Sykes Cottages · 2023–2024 · AI Infrastructure & Knowledge Management

AI-Powered Insight Hub

Turning scattered organizational knowledge into a searchable, cited intelligence layer — reducing discovery time from 7 days to 20 minutes

7d→20m
Discovery time reduced
1,200+
Experiments synthesized
200
UXR studies indexed

The Challenge

Every experimentation program generates knowledge. The question is whether that knowledge compounds or disappears.

At Sykes, we were running hundreds of experiments annually across multiple brands and product squads. Each test generated insights — some conclusive, some directional, some dead ends that prevented harmful changes. This should have made the organisation smarter with every test. Instead, it created a different problem.

The problem surfaced in a pattern I kept seeing in practice: a product manager would ask 'have we ever tested checkout layout changes?' and answering it meant coordinating across teams, manually searching Slack threads, digging through experiment dashboards, and cross-referencing UX research folders. Each search pass took 30–45 minutes at minimum, assembling the full picture could stretch across days, and even then there was no certainty the complete answer had been found.

This wasn't a knowledge gap. It was a knowledge access problem. Decision-makers weren't acting on what the organisation already knew because finding what we knew took longer than making a gut call.

The stakes: at 300+ annual tests, even a 10% reduction in re-work compounds significantly over time. But the deeper cost was strategic. When institutional knowledge is hard to access, teams either re-test old ideas or avoid bold hypotheses entirely because they can't build confidence from past learnings.

The Approach

I set out to build a single intelligence layer that made the organisation's complete knowledge base accessible through one natural language question.

The core architecture combined Azure OpenAI for synthesis, Power Automate for orchestration, and intelligent multi-source routing. The system drew on 1,200+ experiments and 200 UX research studies, enabling natural language queries across all sources simultaneously.
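To make the multi-source idea concrete, here is a minimal, hypothetical sketch in Python (the record shapes, the keyword scoring, and the `search_sources` helper are illustrative assumptions, not the production design; the Azure OpenAI synthesis call is omitted):

```python
# Hypothetical sketch: retrieve candidate records from multiple sources,
# tagging each hit with its origin so citations can be attached later.
# Record shapes and scoring are illustrative assumptions, not Sykes' actual schema.

EXPERIMENTS = [
    {"id": "EXP-041", "title": "Checkout layout A/B test", "outcome": "No significant lift"},
    {"id": "EXP-112", "title": "Checkout trust badges", "outcome": "+2.1% conversion"},
]
UXR_STUDIES = [
    {"id": "UXR-017", "title": "Checkout usability interviews", "outcome": "Users missed the promo field"},
]

def search_sources(query: str) -> list[dict]:
    """Naive keyword match across both sources; each hit keeps its origin for citation."""
    terms = query.lower().split()
    hits = []
    for source, records in (("experiments", EXPERIMENTS), ("uxr", UXR_STUDIES)):
        for rec in records:
            text = (rec["title"] + " " + rec["outcome"]).lower()
            score = sum(term in text for term in terms)
            if score:
                hits.append({**rec, "source": source, "score": score})
    return sorted(hits, key=lambda h: h["score"], reverse=True)

# The top hits would then be passed to the synthesis model as grounded context.
results = search_sources("checkout layout")
```

In the real system the retrieval step fed Azure OpenAI, which synthesised the hits into a single answer; the point of the sketch is only that every candidate record carries its source with it from the start.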

Two design decisions made the difference in adoption:

1. Outputs were delivered as fully cited HTML reports — not raw data dumps. Every answer included sources, context, and related findings. This meant decision-makers got answers with provenance, not just data fragments they had to interpret.

2. I built a meta-analysis layer alongside the query system. This compared new experiment results against historical patterns to identify trends across themes, areas, and user segments — transforming isolated test learnings into compounding institutional knowledge.
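The first decision, answers with provenance, can be sketched as a tiny rendering step. The data shape and the markup below are assumptions for illustration, not the actual report template:

```python
# Hypothetical sketch of "fully cited HTML reports": every claim carries
# an inline citation back to the experiment or study it came from.

def render_report(question: str, findings: list[dict]) -> str:
    """Render synthesized findings as an HTML fragment with a citation per claim."""
    items = "\n".join(
        f'  <li>{f["claim"]} <cite>[{f["source_id"]}]</cite></li>'
        for f in findings
    )
    return f"<h2>{question}</h2>\n<ul>\n{items}\n</ul>"

html = render_report(
    "Have we tested checkout layout changes?",
    [
        {"claim": "A layout test showed no significant lift.", "source_id": "EXP-041"},
        {"claim": "Trust badges raised conversion by 2.1%.", "source_id": "EXP-112"},
    ],
)
```

The design choice this illustrates: the citation travels with the claim, so a reader can challenge or drill into any single sentence of the answer.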

The implementation ran through a phased rollout: first indexing the existing experiment archive, then integrating UX research, then adding automated meta-analysis as experiments concluded.
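The automated meta-analysis step can be illustrated with a toy comparison of a new result against historical lifts for the same theme. The data, the z-score rule, and the `contextualise` helper are hypothetical stand-ins, not the production logic:

```python
from statistics import mean, stdev

# Hypothetical historical lifts (%) for experiments tagged with the same theme.
HISTORY = {"checkout": [0.4, 2.1, -0.3, 1.2, 0.8]}

def contextualise(theme: str, new_lift: float) -> dict:
    """Place a new result against the theme's historical pattern (illustrative z-score flag)."""
    lifts = HISTORY[theme]
    mu, sigma = mean(lifts), stdev(lifts)
    z = (new_lift - mu) / sigma if sigma else 0.0
    return {
        "theme_mean": round(mu, 2),
        "z_score": round(z, 2),
        "outlier": abs(z) > 2,  # flag results that break the historical pattern
    }

# A 3.5% lift stands well outside this theme's historical range.
summary = contextualise("checkout", 3.5)
```

Whatever the actual statistics used, the principle is the same: each concluded experiment is read against the accumulated record for its theme, not in isolation.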

Azure OpenAI · Power Automate · GPT Synthesis · Multi-source Routing · HTML Reporting · Natural Language Queries

The Results

7 days → 20 min
Discovery time reduced from manual coordination to instant synthesis
1,200+
Experiments made queryable via natural language
200
UXR studies synthesized into knowledge base
80%+
Confidence threshold for query routing

The quantitative impact was immediate: discovery time dropped from 7+ days (coordinating teams, manual collation, pattern analysis) to 15–20 minutes through conversational queries.

Decision-makers stopped asking 'I wonder if we know anything about this' and started asking 'what do we know about this?' — a subtle shift in language that reflected a fundamental shift in how knowledge was treated.

The system also had an unexpected cultural benefit: it reduced the pressure on the CRO team as the institutional memory holder. Product managers, designers, and engineers could now self-serve on historical context, which freed the CRO team to focus on forward-looking work rather than constantly answering 'did we test this before?' questions.

What Made This Work

The technology was table stakes. The insight was in how the system was framed. Framed as an AI search tool for experiments, it would have failed to win adoption. Framed as a conversational intelligence layer that answered strategic questions with cited evidence, it became indispensable.

The 80% confidence threshold before routing was also a hard-won design principle. Early tests with lower thresholds produced too many 'I'm not sure' responses, which eroded trust. Setting the bar at 80% meant the system only answered questions it could genuinely synthesize — and simply said 'no match' when it couldn't, rather than guessing.
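That gating behaviour can be sketched in a few lines. The 0.8 threshold comes from the text; the candidate-scoring shape is a stand-in assumption:

```python
CONFIDENCE_THRESHOLD = 0.8  # from the text: only answer above 80% routing confidence

def answer(query: str, candidates: list[tuple[str, float]]) -> str:
    """Return the best-matching answer only if its confidence clears the bar."""
    if not candidates:
        return "no match"
    best_text, best_conf = max(candidates, key=lambda c: c[1])
    return best_text if best_conf >= CONFIDENCE_THRESHOLD else "no match"

# A strong match is answered; a weak one is refused rather than guessed.
confident = answer("checkout layout", [("EXP-041: no significant lift", 0.92)])
unsure = answer("loyalty emails", [("EXP-077: tangentially related", 0.55)])
```

Refusing below the bar is the trust-preserving behaviour described above: a clean 'no match' is cheaper than a plausible-sounding guess.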

If I were to rebuild this today, I'd instrument adoption tracking earlier — understanding which queries drove the most value would have helped prioritize additional data source integrations faster.

The transferable principle: institutional knowledge is an asset that degrades if it isn't systematically accessible. The value of an experimentation program isn't just in the tests you run — it's in whether each test makes the organization permanently smarter.

Want to build something similar?

Book a 30-minute conversation →