A/B Test Sample Size Calculator

Calculate the required sample size for statistically significant A/B test results. Get accurate test duration estimates and power analysis.

Test Configuration

  • Baseline conversion rate: current conversion rate of your control group
  • Minimum detectable effect: relative improvement you want to detect (e.g., 20% = 5% → 6%)
  • Number of variations: including control (2 = A/B test, 3 = A/B/C test)

Statistical Parameters

  • Significance level (α): probability of a false positive (Type I error)
  • Statistical power: probability of detecting a real effect (1 − Type II error)

Traffic Information (Optional)

  • Traffic allocation: percentage of traffic included in the test

Sample Size Per Variation

8,162 per variation (total: 16,324 samples)
Estimated duration: 17 days (rounded up: 3 weeks, or 1 month)

Expected Results

Baseline CR: 5.00%
Expected New CR: 6.00%
Absolute Improvement: +1.00 pp
Relative Improvement: +20%

Error Rates

Type I Error (α): 5.0%
Type II Error (β): 20.0%
Statistical Power: 80.0%

[Chart: Sample Collection Timeline]

Recommendations

The estimated duration falls within the optimal 2-4 week range, long enough to capture weekly traffic patterns while maintaining statistical rigor.
The large sample size gives high confidence in the results, making this setup suitable for critical business decisions.

[Chart: Sample Size vs Statistical Power]

[Chart: Sample Size vs Minimum Detectable Effect]

How Sample Size is Calculated

Sample size calculation for A/B testing uses statistical formulas to determine how many observations you need to detect a meaningful difference between variations with confidence:

n = [(Zα/2 + Zβ)² × 2p(1−p)] / (p1 − p2)²

Where:

  • Zα/2 = Z-score for significance level (e.g., 1.96 for 95%)
  • Zβ = Z-score for statistical power (e.g., 0.842 for 80%)
  • p = pooled conversion rate
  • p1, p2 = baseline and expected conversion rates
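
As a sanity check, the formula can be computed directly with Python's standard library (statistics.NormalDist). This is a sketch of the pooled-variance form shown above; calculators that use the unpooled variance will differ by a few samples:

```python
import math
from statistics import NormalDist

def sample_size_per_variation(p1, relative_mde, alpha=0.05, power=0.80):
    """Per-variation sample size using the pooled-variance formula above."""
    p2 = p1 * (1 + relative_mde)                   # expected conversion rate
    p = (p1 + p2) / 2                              # pooled conversion rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.960 for alpha = 5%
    z_beta = NormalDist().inv_cdf(power)           # 0.842 for 80% power
    n = (z_alpha + z_beta) ** 2 * 2 * p * (1 - p) / (p1 - p2) ** 2
    return math.ceil(n)

# This page's example: 5% baseline, 20% relative MDE, alpha = 5%, power = 80%
print(sample_size_per_variation(0.05, 0.20))  # ~8,159; the page shows 8,162
```

Raising power to 90% with the same inputs lifts the requirement to roughly 10,900 per variation, which is the traffic cost of fewer false negatives.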

Key Concepts Explained

Type I Error (False Positive)

The probability of concluding there's a difference when there isn't one. Typically set to 5% (95% confidence level).

Type II Error (False Negative)

The probability of missing a real difference. With 80% power, this is 20%.

Statistical Power

The probability of detecting a real effect. Higher power (80-90%) reduces false negatives but requires more samples.

Minimum Detectable Effect

The smallest improvement you want to reliably detect. Smaller effects require larger sample sizes.
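
Because the difference p1 − p2 is squared in the denominator of the formula above, halving the relative MDE roughly quadruples the required sample size. A quick sketch with this page's 5% baseline (pooled-variance formula; the exact figures are illustrative):

```python
import math
from statistics import NormalDist

def n_per_variation(p1, relative_mde, alpha=0.05, power=0.80):
    """Pooled-variance sample size per variation."""
    p2 = p1 * (1 + relative_mde)
    p = (p1 + p2) / 2
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z**2 * 2 * p * (1 - p) / (p1 - p2) ** 2)

for mde in (0.30, 0.20, 0.10, 0.05):
    # smaller detectable effects blow up the requirement roughly quadratically
    print(f"{mde:.0%} relative MDE -> {n_per_variation(0.05, mde):,} per variation")
```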

Frequently Asked Questions

What sample size do I need for an A/B test?

It depends on your baseline conversion rate, the minimum improvement you want to detect, and your desired confidence level and power. For typical conversion rates and a 10-20% relative MDE, expect several thousand to tens of thousands of samples per variation.

What is statistical power and why does it matter?

Statistical power (typically 80%) is the probability of detecting a real improvement when it exists. Higher power reduces false negatives but requires more samples.

How long should I run my A/B test?

Run your test for at least 1-2 weeks to capture weekly patterns, and until you reach the required sample size. Ending early increases false positive risk.

What if I don't have enough traffic?

You have three options: (1) increase the minimum detectable effect (test larger changes), (2) reduce statistical power slightly (70% instead of 80%), or (3) run the test longer.

Should I test multiple variations at once?

Testing multiple variations (A/B/C) requires more traffic. Stick to A/B tests unless you have very high traffic, or use sequential testing.
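
One common (and conservative) way to account for the extra comparisons in an A/B/C test is a Bonferroni correction: with k variations there are k − 1 comparisons against control, so α is divided by k − 1. This is a sketch of that approach, not necessarily what this calculator does:

```python
import math
from statistics import NormalDist

def n_per_variation(p1, p2, alpha=0.05, power=0.80):
    """Pooled-variance sample size per variation (same formula as above)."""
    p = (p1 + p2) / 2
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z**2 * 2 * p * (1 - p) / (p1 - p2) ** 2)

def n_with_bonferroni(p1, p2, variations, alpha=0.05, power=0.80):
    """Split alpha across the comparisons against control."""
    comparisons = max(variations - 1, 1)
    return n_per_variation(p1, p2, alpha / comparisons, power)

ab = n_with_bonferroni(0.05, 0.06, variations=2)   # plain A/B test
abc = n_with_bonferroni(0.05, 0.06, variations=3)  # A/B/C: stricter alpha
print(ab, abc)  # per-variation n grows, and total traffic = n x variations
```

The total traffic cost compounds twice: each variation needs more samples, and there is one more variation to fill.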

What is a good minimum detectable effect (MDE)?

For most tests, 10-20% relative improvement is realistic. Smaller MDEs (5%) require massive sample sizes. Larger MDEs (30%+) are easier to detect but may miss smaller wins.