Sykes Cottages · UK's Largest Holiday Let Agency · 2022–Present · Programme Design & Scaling

Scaling Experimentation at Sykes

Building three interconnected systems that enabled 300+ annual tests while improving quality — proving velocity and rigour can compound, not compete

300+
Annual tests (top 10%)
+31%
Success rate improvement
£1.5m
Incremental revenue

The Challenge

Velocity without foundations isn't a programme. It's a queue of bets.

When I joined Sykes, there was an active CRO programme in place — but it had stalled in a pattern common to many maturing teams. The ideas being generated were disconnected from evidence. Tests were running, but the programme wasn't learning fast enough from them: success rates had declined year-on-year, hypotheses lacked evidential grounding, and insights from completed tests weren't systematically feeding back into future ideation.

The team was doing the work but not building the compounding system that makes experimentation genuinely scalable — where each test makes the organisation smarter, not just busier.

The challenge I set myself: build the foundations that make high velocity sustainable — not as a trade-off against quality, but as a system where velocity and quality reinforce each other.

The Approach

I worked across three interconnected layers — each building on the last.

Layer 1 — Fixing How Ideas Were Generated: The Momentum Framework

The root cause of declining success rates wasn't execution — it was ideation. Teams were jumping to solutions before understanding problems properly.

I designed the Momentum Framework — a structured two-session process separating evidence-building from ideation, preventing teams from jumping to solutions before problems were properly understood.

Session one was findings-first. Cross-functional participants from design, engineering, product, and marketing reviewed what we'd observed across UX research, experiment analysis, and business insights. From observations, teams drafted assumptions (why are we seeing this?) and early recommendations (how might we address it?). These inputs were combined into pending hypotheses — structured problem statements without solutions attached.

Session two was ideation. With evidential foundations already laid, participants could think creatively. The only guiding question: 'what would tell you your idea had worked?' This anchored creativity to measurable outcomes. After each session, ideas were collated and success metrics assigned — completing the hypothesis architecture: 'We observed X from source Y, and therefore believe if we do Z, we would expect change to metric M.'

The resulting hypothesis architecture had seven components: insight → source → change → users → action → metrics → lag metrics.
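The hypothesis architecture above lends itself to a simple data structure. A minimal sketch in Python — field names mirror the components listed, but are illustrative rather than the actual schema used at Sykes:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    insight: str                # what we observed
    source: str                 # where the evidence came from (UX research, experiment analysis, ...)
    change: str                 # the proposed intervention
    users: str                  # which users are affected
    action: str                 # the behaviour we expect to shift
    metrics: list[str]          # primary/secondary success metrics
    lag_metrics: list[str] = field(default_factory=list)  # slower-moving follow-on metrics

    def statement(self) -> str:
        """Render in the case study's template form:
        'We observed X from source Y, and therefore believe if we do Z,
        we would expect change to metric M.'"""
        return (f"We observed {self.insight} from {self.source}, and therefore "
                f"believe if we {self.change}, we would expect change to "
                f"{', '.join(self.metrics)}.")

# Illustrative example only — not a real Sykes hypothesis.
h = Hypothesis(
    insight="high drop-off on the date-picker step",
    source="funnel analysis",
    change="simplify the date-picker to a single calendar view",
    users="mobile visitors",
    action="more users complete date selection",
    metrics=["booking conversion rate"],
)
print(h.statement())
```

Encoding the hypothesis as structured fields, rather than free text, is what lets completed tests feed back into future ideation: every past experiment carries its evidence source and metrics with it.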

Layer 2 — Fixing How Experiments Were Decided: Pre-Decision Analysis System

As the programme scaled, experiment decision-making had become a bottleneck. Each outcome required bespoke analysis commentary taking 3–5 days per test — and with hundreds running annually, this created friction.

I built a Tableau report that pulled page funnel metrics automatically from the data warehouse using experiment IDs, and paired it with a structured Excel-based analysis layer containing pre-built formulas for confidence intervals, margins of error, and guardrail metric checks. Primary, secondary, and guardrail metrics were defined upfront for every test, removing ambiguity from the decision itself.
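The kind of pre-built checks that analysis layer contained can be sketched in Python, assuming standard two-proportion statistics. The formulas below are the textbook versions; the specific thresholds and the exact Excel formulas used at Sykes are not shown in this case study:

```python
from math import sqrt

def conversion_ci(conversions: int, visitors: int, z: float = 1.96):
    """95% Wald confidence interval for a conversion rate."""
    p = conversions / visitors
    moe = z * sqrt(p * (1 - p) / visitors)   # margin of error
    return p, p - moe, p + moe

def lift_significant(c_a: int, n_a: int, c_b: int, n_b: int, z: float = 1.96) -> bool:
    """Two-proportion z-test: is the difference between control A and variant B significant?"""
    p_a, p_b = c_a / n_a, c_b / n_b
    pooled = (c_a + c_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(p_b - p_a) / se > z

def guardrail_ok(baseline: float, observed: float, max_drop: float = 0.02) -> bool:
    """Pass the guardrail check unless the metric drops more than max_drop (absolute).
    The 2-point threshold here is an illustrative assumption."""
    return (baseline - observed) <= max_drop

# Illustrative numbers, not real Sykes data.
rate, lo, hi = conversion_ci(480, 10_000)
print(f"control: {rate:.2%} (95% CI {lo:.2%} to {hi:.2%})")
print("significant lift:", lift_significant(480, 10_000, 560, 10_000))
```

With metrics and checks like these defined upfront, the analyst's job at decision time is reading a pass/fail sheet rather than writing bespoke commentary.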

The result: retrospective analysis delays were eliminated. Decisions that took 3–5 days now took under an hour. Beyond speed, this had cultural impact — successful experiments shipped faster, which built momentum and confidence in the programme.

Layer 3 — Fixing How Knowledge Persisted: AI Infrastructure & Automated Stakeholder Reporting

A programme running 300+ annual tests generates significant institutional knowledge — but only if that knowledge remains accessible and stakeholders stay aligned without manual overhead.

I built an automated workflow system that monitored experiment status and triggered contextually appropriate updates: stakeholders were alerted when an experiment reached a decision-relevant threshold, not on a schedule.

This eliminated the peeking effect (stakeholders reviewing preliminary results before statistical significance) while scaling awareness of experimentation beyond the core CRO team to UX designers and developers building the variants.
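The gating logic behind threshold-triggered alerts can be sketched as follows. The real system used Power Automate; this Python sketch shows only the decision rule, and the field names and thresholds are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class ExperimentStatus:
    name: str
    visitors_per_arm: int
    required_sample: int      # pre-computed from a power analysis before launch
    days_running: int
    min_days: int = 14        # run full weekly cycles to avoid day-of-week bias

def should_alert(exp: ExperimentStatus) -> bool:
    """Notify stakeholders only once the experiment is decision-ready.
    Gating the alert this way structurally prevents peeking at
    preliminary results before statistical significance is possible."""
    sample_reached = exp.visitors_per_arm >= exp.required_sample
    cycle_complete = exp.days_running >= exp.min_days
    return sample_reached and cycle_complete

# Illustrative example only.
exp = ExperimentStatus("date-picker-v2", visitors_per_arm=12_400,
                       required_sample=11_000, days_running=16)
print(should_alert(exp))  # True: sample size and run-time thresholds both met
```

Because the alert itself is the first time most stakeholders see results, there are no preliminary dashboards to peek at — the threshold check does the self-discipline for them.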

Tableau · Excel (statistical formulas) · Power Automate · Azure OpenAI · HTML reporting

The Results

+31%
Success rate improvement — from better prioritisation, not risk aversion
£1.5m
Incremental revenue generated through the programme
20+
Stakeholders across 5 squads kept informed automatically
300+
Annual tests delivered without proportional overhead increase

The 31% success rate improvement deserves context. A rising success rate can be a warning sign — it can mean teams are playing it safe. In this case, the improvement came from filtering low-conviction ideas faster, not from avoiding bold hypotheses. The Momentum Framework meant ideas entering the test queue were already evidence-anchored, which increased the probability they'd move metrics when shipped.

Across 5 product squads, 20+ stakeholders were kept informed automatically. The peeking effect was structurally reduced, protecting statistical integrity. UX and engineering teams were brought into the experimentation feedback loop for the first time — creating a direct connection between the people building solutions and the results those solutions generated.

What Made This Work

The insight that underpins this work is simple: velocity without foundations creates noise. Foundations without velocity create stagnation. High-performing programmes build both simultaneously.

The sequencing mattered too. Fixing ideation first (Momentum Framework) meant that the experiments entering the queue were higher quality from the start. That improvement cascaded: better hypotheses → better test design → higher success rates → faster decisions → more capacity to onboard additional squads.

If I were building this again, I'd instrument learning velocity — the time from experiment conclusion to insight application in future tests. Speed-to-insight is a better leading indicator of programme health than success rate or test volume alone.

Transferable principle: Experimentation programmes don't fail from lack of tools or talent. They fail from lack of systems. Build the ideation system, the decision system, and the knowledge system — and velocity becomes a compounding asset instead of a coordination tax.

Want to build something similar?

Book a 30-minute conversation →