Why Your A/B Test Is Lying to You
You’ve just launched an A/B test. Version B is winning! Time to roll it out to everyone, right?
Not so fast.
The Sample Size Trap
Here’s a question: If version B is actually twice as good as version A, how often will your A/B test correctly identify it as the winner?
100% of the time? 90%? Surely at least 80%?
Try the interactive simulator below to find out. Spoiler alert: you might be shocked.
What This Means for Your Business
When you run A/B tests with small sample sizes (as most startups and many established businesses do), you’re essentially flipping a weighted coin. Even when there’s a real difference, your test might point you in the wrong direction.
This is why:
- Sample size matters more than test duration - Running a test for 3 months doesn’t help if you only get 50 trials per month
- Statistical significance isn’t magic - A “95% confidence” threshold only limits how often you see a false positive; it says nothing about whether you have enough data to detect a real difference
- False winners cost real money - Every time you pick the wrong variant, you’re leaving money on the table
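To make the "weighted coin" concrete, here is a minimal Monte Carlo sketch. It is not the simulator's actual code. It assumes the 50 trials per month apply to each variant, that the conversion rates are 1% for A and 2% for B, and that the "winner" is simply whichever variant ends with more conversions. The function name `simulate_b_win_rate` and its parameters are made up for illustration.

```python
import random


def simulate_b_win_rate(trials_per_variant, rate_a, rate_b, runs=1000, seed=0):
    """Estimate how often B finishes with strictly more conversions than A."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    b_wins = 0
    for _ in range(runs):
        # Each trial converts independently with the variant's true rate.
        conv_a = sum(rng.random() < rate_a for _ in range(trials_per_variant))
        conv_b = sum(rng.random() < rate_b for _ in range(trials_per_variant))
        if conv_b > conv_a:
            b_wins += 1
    return b_wins / runs


# Assumed scenario: three months at 50 trials per month (150 per variant),
# compared with ten times that traffic over the same period.
for n in (150, 1500):
    rate = simulate_b_win_rate(n, rate_a=0.01, rate_b=0.02)
    print(f"{n} trials per variant: B comes out ahead in {rate:.0%} of simulated tests")
```

Ties and outright wins for A make up the rest of the simulated tests, and those are the cases where a genuinely better variant gets passed over.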
The Math (For the Curious)
The simulator uses binomial probability distributions to calculate the exact likelihood of each outcome. For larger sample sizes (more than 1,000 trials), it switches to Monte Carlo simulation for performance.
The key insight: even with a 2x improvement (0.01 → 0.02 conversion rate), you need hundreds of trials before the better version reliably comes out ahead, and more still before the difference reaches statistical significance.
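For the curious, the exact calculation might look something like the sketch below. It is an illustration under stated assumptions, not the simulator's source: both variants get the same number of trials, the winner is the variant with more conversions, and the names `binomial_pmf` and `exact_outcomes` are hypothetical.

```python
def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Binomial(n, p), with 0 < p < 1."""
    pmf = [(1 - p) ** n]
    for k in range(n):
        # Recurrence: P(k + 1) = P(k) * (n - k) / (k + 1) * p / (1 - p)
        pmf.append(pmf[-1] * (n - k) / (k + 1) * p / (1 - p))
    return pmf


def exact_outcomes(n, rate_a, rate_b):
    """Return (P(B wins), P(tie), P(A wins)) with n trials per variant."""
    pmf_a = binomial_pmf(n, rate_a)
    pmf_b = binomial_pmf(n, rate_b)
    a_wins = 0.0
    tie = 0.0
    b_below_k = 0.0  # running total of P(B < k)
    for k in range(n + 1):
        a_wins += pmf_a[k] * b_below_k  # A has k conversions, B has fewer
        tie += pmf_a[k] * pmf_b[k]      # both have exactly k conversions
        b_below_k += pmf_b[k]
    return 1.0 - a_wins - tie, tie, a_wins


for n in (50, 150, 500):
    b, t, a = exact_outcomes(n, 0.01, 0.02)
    print(f"n={n}: B wins {b:.1%}, tie {t:.1%}, A wins {a:.1%}")
```

The recurrence avoids computing very large binomial coefficients, which keeps the arithmetic in ordinary floating point across the sample sizes where an exact answer is practical.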
What You Should Do
- Calculate required sample size before running your test
- Don’t stop early just because one variant is winning
- Consider Bayesian methods for ongoing optimization instead of fixed-duration tests
- Be honest about power - if you don’t have enough traffic, you don’t have enough traffic
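The first item on that list can be sketched with the standard normal-approximation formula for comparing two proportions. This is one common textbook approach, not necessarily what any particular testing tool uses. The function name `required_sample_size` is hypothetical, and the defaults (a two-sided test at 5% significance with 80% power) are conventional choices assumed here.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate trials needed per variant to detect rate_a vs rate_b."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    # Sum of the per-trial variances of the two conversion rates (unpooled).
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return ceil((z_alpha + z_beta) ** 2 * variance / (rate_b - rate_a) ** 2)


# Doubling a 1% conversion rate versus improving it by 10%.
print(required_sample_size(0.01, 0.02))
print(required_sample_size(0.01, 0.011))
```

Under these assumptions, doubling a 1% conversion rate needs a little over 2,300 trials per variant, while a 10% relative improvement needs well over 100,000. If your traffic cannot supply that, the test is underpowered before it starts.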
The simulator above lets you play with different scenarios. Try setting version B to be just 10% better than A, or reducing your monthly traffic. You’ll quickly see why so many A/B tests lead to wrong conclusions.
Remember: in A/B testing, like in life, the most dangerous mistake is the one you don’t know you’re making.