Sykes Cottages · UK's Largest Holiday Let Agency · 2022–Present · Programme Design & Scaling

Scaling Experimentation at Sykes

Building three interconnected systems that enabled 300+ annual tests while improving quality — proving velocity and rigour can compound, not compete

300+
Annual tests (top 10%)
+31%
Success rate improvement
£1.5m
Incremental revenue

The Challenge

Velocity without foundations isn't a programme. It's a queue of bets.

When I joined Sykes, there was an active CRO programme in place — but it had stalled in a pattern common to many maturing teams. The ideas being generated were disconnected from evidence. Tests were running, but the programme wasn't learning fast enough from them: success rates had declined year-on-year, hypotheses lacked evidential grounding, and insights from completed tests weren't systematically feeding back into future ideation.

The team was doing the work but not building the compounding system that makes experimentation genuinely scalable — where each test makes the organisation smarter, not just busier.

The challenge I set myself: build the foundations that make high velocity sustainable — not as a trade-off against quality, but as a system where velocity and quality reinforce each other.

The Approach

I worked across three interconnected layers — each building on the last.

Layer 1 — Fixing How Ideas Were Generated: The Momentum Framework

The root cause of declining success rates wasn't execution — it was ideation. Teams were jumping to solutions before understanding problems properly.

I designed the Momentum Framework — a structured two-session process separating evidence-building from ideation, preventing teams from jumping to solutions before problems were properly understood.

Session one was findings-first. Cross-functional participants from design, engineering, product, and marketing reviewed what we'd observed across UX research, experiment analysis, and business insights. From observations, teams drafted assumptions (why are we seeing this?) and early recommendations (how might we address it?). These inputs were combined into pending hypotheses — structured problem statements without solutions attached.

Session two was ideation. With evidential foundations already laid, participants could think creatively. The only guiding question: 'what would tell you your idea had worked?' This anchored creativity to measurable outcomes. After each session, ideas were collated and success metrics assigned — completing the hypothesis architecture: 'We observed X from source Y, and therefore believe if we do Z, we would expect change to metric M.'

The resulting hypothesis architecture had seven components: insight → source → change → users → action → metrics → lag metrics.
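The hypothesis architecture above lends itself to a simple data structure. A minimal sketch in Python — field names mirror the components listed, but are illustrative rather than the actual schema used at Sykes:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    insight: str                # what we observed
    source: str                 # where the evidence came from (UX research, experiment analysis, ...)
    change: str                 # the proposed intervention
    users: str                  # which users are affected
    action: str                 # the behaviour we expect to shift
    metrics: list[str]          # primary/secondary success metrics
    lag_metrics: list[str] = field(default_factory=list)  # slower-moving follow-on metrics

    def statement(self) -> str:
        """Render in the case study's template form:
        'We observed X from source Y, and therefore believe if we do Z,
        we would expect change to metric M.'"""
        return (f"We observed {self.insight} from {self.source}, and therefore "
                f"believe if we {self.change}, we would expect change to "
                f"{', '.join(self.metrics)}.")

# Illustrative example only — not a real Sykes hypothesis.
h = Hypothesis(
    insight="high drop-off on the date-picker step",
    source="funnel analysis",
    change="simplify the date-picker to a single calendar view",
    users="mobile visitors",
    action="more users complete date selection",
    metrics=["booking conversion rate"],
)
print(h.statement())
```

Encoding the hypothesis as structured fields, rather than free text, is what lets completed tests feed back into future ideation: every past experiment carries its evidence source and metrics with it.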

Layer 2 — Fixing How Experiments Were Decided: Pre-Decision Analysis System

As the programme scaled, experiment decision-making had become a bottleneck. Each outcome required bespoke analysis commentary taking 3–5 days per test — and with hundreds running annually, this created friction.

I built a Tableau report that pulled page funnel metrics automatically from the data warehouse using experiment IDs, and paired it with a structured Excel-based analysis layer containing pre-built formulas for confidence intervals, margins of error, and guardrail metric checks. Primary, secondary, and guardrail metrics were defined upfront for every test, removing ambiguity from the decision itself.
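The kind of pre-built checks that analysis layer contained can be sketched in Python, assuming standard two-proportion statistics. The formulas below are the textbook versions; the specific thresholds and the exact Excel formulas used at Sykes are not shown in this case study:

```python
from math import sqrt

def conversion_ci(conversions: int, visitors: int, z: float = 1.96):
    """95% Wald confidence interval for a conversion rate."""
    p = conversions / visitors
    moe = z * sqrt(p * (1 - p) / visitors)   # margin of error
    return p, p - moe, p + moe

def lift_significant(c_a: int, n_a: int, c_b: int, n_b: int, z: float = 1.96) -> bool:
    """Two-proportion z-test: is the difference between control A and variant B significant?"""
    p_a, p_b = c_a / n_a, c_b / n_b
    pooled = (c_a + c_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(p_b - p_a) / se > z

def guardrail_ok(baseline: float, observed: float, max_drop: float = 0.02) -> bool:
    """Pass the guardrail check unless the metric drops more than max_drop (absolute).
    The 2-point threshold here is an illustrative assumption."""
    return (baseline - observed) <= max_drop

# Illustrative numbers, not real Sykes data.
rate, lo, hi = conversion_ci(480, 10_000)
print(f"control: {rate:.2%} (95% CI {lo:.2%} to {hi:.2%})")
print("significant lift:", lift_significant(480, 10_000, 560, 10_000))
```

With metrics and checks like these defined upfront, the analyst's job at decision time is reading a pass/fail sheet rather than writing bespoke commentary.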

The result: retrospective analysis delays were eliminated. Decisions that took 3–5 days now took under an hour. Beyond speed, this had cultural impact — successful experiments shipped faster, which built momentum and confidence in the programme.

Layer 3 — Fixing How Knowledge Persisted: AI Infrastructure & Automated Stakeholder Reporting

A programme running 300+ annual tests generates significant institutional knowledge — but only if that knowledge remains accessible and stakeholders stay aligned without manual overhead.

I built an automated workflow system that monitored experiment status and triggered contextually appropriate updates: stakeholders were alerted when an experiment reached a decision-relevant threshold, not on a schedule.

This eliminated the peeking effect (stakeholders reviewing preliminary results before statistical significance) while scaling awareness of experimentation beyond the core CRO team to UX designers and developers building the variants.
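The gating logic behind threshold-triggered alerts can be sketched as follows. The real system used Power Automate; this Python sketch shows only the decision rule, and the field names and thresholds are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class ExperimentStatus:
    name: str
    visitors_per_arm: int
    required_sample: int      # pre-computed from a power analysis before launch
    days_running: int
    min_days: int = 14        # run full weekly cycles to avoid day-of-week bias

def should_alert(exp: ExperimentStatus) -> bool:
    """Notify stakeholders only once the experiment is decision-ready.
    Gating the alert this way structurally prevents peeking at
    preliminary results before statistical significance is possible."""
    sample_reached = exp.visitors_per_arm >= exp.required_sample
    cycle_complete = exp.days_running >= exp.min_days
    return sample_reached and cycle_complete

# Illustrative example only.
exp = ExperimentStatus("date-picker-v2", visitors_per_arm=12_400,
                       required_sample=11_000, days_running=16)
print(should_alert(exp))  # True: sample size and run-time thresholds both met
```

Because the alert itself is the first time most stakeholders see results, there are no preliminary dashboards to peek at — the threshold check does the self-discipline for them.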

Tableau · Excel (statistical formulas) · Power Automate · Azure OpenAI · HTML reporting

The Results

+31%
Success rate improvement — from better prioritisation, not risk aversion
£1.5m
Incremental revenue generated through the programme
20+
Stakeholders across 5 squads kept informed automatically
300+
Annual tests delivered without proportional overhead increase

The 31% success rate improvement deserves context. A rising success rate can be a warning sign — it can mean teams are playing it safe. In this case, the improvement came from filtering low-conviction ideas faster, not from avoiding bold hypotheses. The Momentum Framework meant ideas entering the test queue were already evidence-anchored, which increased the probability they'd move metrics when shipped.

Across 5 product squads, 20+ stakeholders were kept informed automatically. The peeking effect was structurally reduced, protecting statistical integrity. UX and engineering teams were brought into the experimentation feedback loop for the first time — creating a direct connection between the people building solutions and the results those solutions generated.

What Made This Work

The insight that underpins this work is simple: velocity without foundations creates noise. Foundations without velocity create stagnation. High-performing programmes build both simultaneously.

The sequencing mattered too. Fixing ideation first (Momentum Framework) meant that the experiments entering the queue were higher quality from the start. That improvement cascaded: better hypotheses → better test design → higher success rates → faster decisions → more capacity to onboard additional squads.

If I were building this again, I'd instrument learning velocity — the time from experiment conclusion to insight application in future tests. Speed-to-insight is a better leading indicator of programme health than success rate or test volume alone.

Transferable principle: Experimentation programmes don't fail from lack of tools or talent. They fail from lack of systems. Build the ideation system, the decision system, and the knowledge system — and velocity becomes a compounding asset instead of a coordination tax.

Want to build something similar?

Book a 30-minute conversation →