A/B Test Sample Size Calculator
Calculate the required sample size for statistically significant A/B test results. Get accurate test duration estimates and power analysis.
Quick Start Presets
Test Configuration
Current conversion rate of your control group
Relative improvement you want to detect (e.g., 20% = 5% → 6%)
Including control (2 = A/B test, 3 = A/B/C test)
Statistical Parameters
Probability of false positive (Type I error)
Probability of detecting real effect (1 - Type II error)
Traffic Information (Optional)
Percentage of traffic included in the test
Expected Results
Error Rates
Sample Collection Timeline
Recommendations
Sample Size vs Statistical Power
Sample Size vs Minimum Detectable Effect
How Sample Size is Calculated
Sample size calculation for A/B testing uses statistical formulas to determine how many observations you need to detect a meaningful difference between variations with confidence:
n = [(Zα/2 + Zβ)² × 2p(1-p)] / (p1 - p2)²

Where:
- Zα/2 = Z-score for significance level (e.g., 1.96 for 95%)
- Zβ = Z-score for statistical power (e.g., 0.842 for 80%)
- p = pooled conversion rate
- p1, p2 = baseline and expected conversion rates
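As a concrete illustration, here is a minimal Python sketch of this formula. The function name and defaults are illustrative, and it uses scipy's normal quantile function for the Z-scores:

```python
import math
from scipy.stats import norm

def sample_size_per_variation(baseline_rate, mde_relative,
                              alpha=0.05, power=0.80):
    """Per-variation sample size for a two-proportion test
    (pooled-variance approximation, as in the formula above)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)  # expected rate under the MDE
    p = (p1 + p2) / 2                        # pooled conversion rate
    z_alpha = norm.ppf(1 - alpha / 2)        # e.g., 1.96 for a 95% confidence level
    z_beta = norm.ppf(power)                 # e.g., 0.842 for 80% power
    n = ((z_alpha + z_beta) ** 2 * 2 * p * (1 - p)) / (p1 - p2) ** 2
    return math.ceil(n)
```

For example, a 5% baseline with a 20% relative MDE (5% → 6%) at 95% confidence and 80% power works out to roughly 8,160 samples per variation.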
Key Concepts Explained
Type I Error (False Positive)
The probability of concluding there's a difference when there isn't one. Typically set to 5% (95% confidence level).
Type II Error (False Negative)
The probability of missing a real difference. With 80% power, this is 20%.
Statistical Power
The probability of detecting a real effect. Higher power (80-90%) reduces false negatives but requires more samples.
Minimum Detectable Effect
The smallest improvement you want to reliably detect. Smaller effects require larger sample sizes.
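To see how steeply this scales, the short loop below reuses the illustrative sample_size_per_variation helper from the formula section. Because n grows roughly as 1/MDE², halving the effect roughly quadruples the sample:

```python
# Reuses sample_size_per_variation from the formula sketch above.
for mde in (0.20, 0.10, 0.05):
    n = sample_size_per_variation(0.05, mde)
    print(f"{mde:.0%} relative MDE: ~{n:,} samples per variation")
# At a 5% baseline this prints roughly 8,160 / 31,200 / 122,100.
```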
Frequently Asked Questions
What sample size do I need for an A/B test?
It depends on your baseline conversion rate, the minimum improvement you want to detect, and your desired confidence and power; there is no universal number. For example, a 20% baseline with a 20% relative MDE needs roughly 1,700 samples per variation, while a 5% baseline with the same MDE needs over 8,000 (see the sketch below).
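A rough illustration of that spread, again reusing the illustrative helper from the formula section:

```python
# Same 20% relative MDE, different baseline rates:
for baseline in (0.02, 0.05, 0.20):
    n = sample_size_per_variation(baseline, 0.20)
    print(f"{baseline:.0%} baseline: ~{n:,} samples per variation")
# Roughly 21,100 / 8,200 / 1,700: lower baselines need far more data.
```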
What is statistical power and why does it matter?
Statistical power (typically 80%) is the probability of detecting a real improvement when it exists. Higher power reduces false negatives but requires more samples.
How long should I run my A/B test?
Run your test for at least one to two full weeks to capture day-of-week patterns, and keep it running until you reach the required sample size. Stopping early, especially stopping the moment results look significant, inflates the false positive rate. A rough duration estimate is sketched below.
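A back-of-the-envelope duration estimate, with made-up traffic numbers purely for illustration:

```python
import math

n_per_variation = 8160    # e.g., 5% baseline, 20% relative MDE (see formula section)
variations = 2            # control + one treatment
daily_visitors = 2000     # hypothetical site traffic
traffic_fraction = 0.5    # hypothetical share of visitors entering the test

days = math.ceil(n_per_variation * variations / (daily_visitors * traffic_fraction))
print(f"Minimum run time: {max(days, 14)} days")  # floor at two weeks for weekly cycles
```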
What if I don't have enough traffic?
You have three options: (1) increase the minimum detectable effect (test larger changes), (2) reduce statistical power slightly (say 70% instead of 80%, accepting a higher chance of missing a real effect), or (3) run the test longer. The sketch below puts numbers on option (2).
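Reusing the illustrative helper from the formula section:

```python
n_80 = sample_size_per_variation(0.05, 0.20, power=0.80)
n_70 = sample_size_per_variation(0.05, 0.20, power=0.70)
print(f"80% power: ~{n_80:,} per variation; 70% power: ~{n_70:,}")
# Dropping from 80% to 70% power cuts the requirement by roughly a fifth
# (about 8,160 vs 6,420 here), at the cost of more missed real effects.
```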
Should I test multiple variations at once?
Testing multiple variations (A/B/C) splits your traffic across more arms and adds extra comparisons against control, so it needs substantially more total traffic. Stick to A/B tests unless you have very high traffic, or use sequential testing. One common adjustment is sketched below.
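One common (and conservative) way to account for the extra comparisons is a Bonferroni correction, splitting the significance level across the treatment-vs-control comparisons. This is a sketch of one possible approach, not the only valid one, again reusing the illustrative helper:

```python
comparisons = 2                       # A/B/C test: two treatments vs control
adjusted_alpha = 0.05 / comparisons   # Bonferroni: 0.025 per comparison
n = sample_size_per_variation(0.05, 0.20, alpha=adjusted_alpha)
print(f"~{n:,} samples per variation at alpha = {adjusted_alpha}")
# Roughly 9,880 here, vs ~8,160 for a plain A/B test at alpha = 0.05.
```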
What is a good minimum detectable effect (MDE)?
For most tests, 10-20% relative improvement is realistic. Smaller MDEs (5%) require massive sample sizes. Larger MDEs (30%+) are easier to detect but may miss smaller wins.
Related Calculators
Maximize your insights by using these complementary calculators together