The Complete Guide to A/B Testing Sample Size (2026)

Running A/B tests without the right sample size is like flipping a coin twice and declaring the result meaningful. You need enough data to confidently know if your variation actually performs better—or if you're just seeing random noise.

This complete guide will teach you everything you need to know about sample size calculation, from basic concepts to advanced techniques. By the end, you'll be able to confidently determine exactly how many visitors you need for reliable test results.

Why Sample Size Matters

The Cost of Getting It Wrong

Too Small: Stop your test too early, and you might:

Implement a "winning" variation that actually performs worse
Miss detecting a real improvement
Waste development time on false positives
Make decisions based on random chance, not real data

Too Large: Run your test too long, and you'll:

Waste weeks waiting for unnecessary data
Delay shipping improvements
Incur opportunity costs
Lose to faster competitors

The sweet spot? Just enough samples to detect meaningful differences with statistical confidence.

Key Statistical Concepts

Statistical Significance (α)

Statistical significance is your acceptable risk of a false positive—declaring a winner when there isn't one.

Standard: 95% confidence (α = 0.05)

This means:

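If there is no real difference, you have a 5% chance of declaring a winner anyway
It is not the probability that a significant result is real; that also depends on power and on how often your test ideas truly work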
Industry standard for most A/B tests

When to adjust:

99% confidence (α = 0.01): Critical changes (checkout flow, pricing)
90% confidence (α = 0.10): Exploratory tests, smaller decisions

Higher confidence requires larger sample sizes but reduces false positives.

Statistical Power (1-β)

Power is your ability to detect a real difference when one exists—avoiding false negatives.

Standard: 80% power (β = 0.20)

This means:

80% chance of detecting a real improvement
20% risk of missing a genuine effect
Widely accepted in scientific research

When to increase:

90% power: Important business decisions
95% power: Mission-critical changes

Higher power requires more samples but ensures you don't miss real improvements.

Minimum Detectable Effect (MDE)

MDE is the smallest improvement you want to detect. It's often the most overlooked—and most important—input.

Setting realistic MDE:

10-20% relative improvement: Most common and achievable
5-10%: Requires very large samples
20%+: Easier to detect, might miss smaller wins

Example:

Baseline: 2.5% conversion rate
20% relative improvement: 3.0% (2.5% × 1.20)
Absolute improvement: 0.5 percentage points

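Be realistic. An MDE of 50%+ yields a tiny required sample, so the test is underpowered for the 5-20% lifts that are far more common. An MDE that is too small has the opposite problem: the test runs for months.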

Baseline Conversion Rate

Your current conversion rate determines required sample size. Lower baselines need more samples.

Why it matters:

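1% → 1.2% (20% improvement): ~38,800 samples per variation
5% → 6% (20% improvement): ~7,400 samples per variation
10% → 12% (20% improvement): ~3,500 samples per variation

(Approximate values at 95% confidence and 80% power, from the formula in the next section.)

At a lower baseline, the same relative lift is a much smaller absolute difference. The noise shrinks too, but more slowly than the difference does, so you need larger samples to detect the change confidently.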

The Sample Size Formula

Here's the complete formula for calculating sample size:

n = [Z_α/2 + Z_β]² × 2p(1-p) / (p₁ - p₂)²

Where:

n = Required sample size per variation
Z_α/2 = Z-score for significance level (1.96 for 95% confidence)
Z_β = Z-score for power (0.84 for 80% power)
p = Baseline conversion rate
p₁ - p₂ = Absolute difference (MDE)

Z-Score Table:

Confidence	α	Z_α/2
90%	0.10	1.645
95%	0.05	1.960
99%	0.01	2.576

Power	β	Z_β
70%	0.30	0.524
80%	0.20	0.842
90%	0.10	1.282
95%	0.05	1.645
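
The table values can be reproduced with Python's standard library. This is a minimal sketch; the helper names z_alpha_half and z_beta are chosen here for illustration and are not part of any calculator.

```python
from statistics import NormalDist

def z_alpha_half(confidence: float) -> float:
    """Two-sided critical value Z_alpha/2 for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def z_beta(power: float) -> float:
    """Critical value Z_beta for a power level such as 0.80."""
    return NormalDist().inv_cdf(power)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: Z_alpha/2 = {z_alpha_half(confidence):.3f}")
for power in (0.70, 0.80, 0.90, 0.95):
    print(f"{power:.0%} power: Z_beta = {z_beta(power):.3f}")
```

This is handy when you need a level that is not in the table, such as the 98.3% confidence that a Bonferroni correction can produce.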

Example Calculation

Inputs:

Baseline conversion: 2.5%
MDE: 20% relative (target 3.0%, an absolute lift of 0.5 percentage points)
Confidence: 95% (Z = 1.96)
Power: 80% (Z = 0.84)

Calculation:

p = 0.025
p₁ - p₂ = 0.005
n = [1.96 + 0.84]² × 2(0.025)(0.975) / (0.005)²
n = [2.8]² × 0.04875 / 0.000025
n = 7.84 × 0.04875 / 0.000025
n = 15,288 samples per variation

Total needed: 30,576 samples (both variations)
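
The same calculation can be sketched in Python. This is an illustration of the formula above, not any calculator's actual implementation; the function name sample_size_per_variation and its parameters are assumptions. It uses unrounded Z-scores, so it lands slightly above the hand calculation (roughly 15,300 rather than 15,288).

```python
import math
from statistics import NormalDist

def sample_size_per_variation(baseline, relative_mde, confidence=0.95, power=0.80):
    """Samples per variation for a two-sided test, using the formula above."""
    alpha = 1.0 - confidence
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # Z_alpha/2
    z_b = NormalDist().inv_cdf(power)              # Z_beta
    delta = baseline * relative_mde                # absolute difference (MDE)
    variance_term = 2.0 * baseline * (1.0 - baseline)
    return math.ceil((z_a + z_b) ** 2 * variance_term / delta ** 2)

n = sample_size_per_variation(baseline=0.025, relative_mde=0.20)
print(f"Per variation: {n:,}")
print(f"Total for two variations: {2 * n:,}")
```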

Or just use our calculator and skip the math: Sample Size Calculator →

Step-by-Step: Calculate Your Sample Size

Step 1: Measure Baseline Conversion Rate

Use at least 1 week of historical data:

Account for day-of-week patterns
Avoid holiday periods (unless testing seasonal changes)
Ensure traffic is stable and representative

Example: 250 conversions / 10,000 visitors = 2.5% baseline

Step 2: Set Your Minimum Detectable Effect

Ask yourself: "What improvement would be worth implementing?"

Guidelines:

New test hypothesis: Start with 10-20% relative improvement
Refinement test: Can detect smaller (5-10%)
Revolutionary change: Might see 30%+ improvement

Example: 20% improvement on 2.5% = 3.0% target (0.5pp absolute)

Step 3: Choose Significance Level

Standard: 95% confidence

Adjust based on risk tolerance:

Critical path (checkout, pricing): 99%
Standard test (copy, design): 95%
Exploration (minor tweaks): 90%

Step 4: Set Statistical Power

Standard: 80% power

Increase for important tests:

Business-critical decisions: 90%
Major investments: 95%

Higher power = more samples but less risk of missing real effects.

Step 5: Account for Multiple Variations

Testing more than 2 variations? Apply Bonferroni correction to maintain overall significance:

Adjusted α = α / number of comparisons

Example: 3 variations (A vs B, A vs C, B vs C = 3 comparisons)

Original α: 0.05
Adjusted α: 0.05 / 3 = 0.0167
Use 98.3% confidence instead of 95%

This increases required sample size but prevents false positives from multiple testing.
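
As a small sketch of the correction, assuming every pair of variations is compared (the function names are illustrative):

```python
from math import comb

def pairwise_comparisons(variations: int) -> int:
    """Number of pairwise comparisons among the variations: k(k-1)/2."""
    return comb(variations, 2)

def bonferroni_alpha(alpha: float, variations: int) -> float:
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / pairwise_comparisons(variations)

adjusted = bonferroni_alpha(0.05, variations=3)
print(f"Adjusted alpha: {adjusted:.4f}")
print(f"Adjusted confidence: {1 - adjusted:.1%}")
```

If you only compare each variation against the control, the number of comparisons is smaller (variations minus one), and the correction is less severe.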

Step 6: Calculate Test Duration

Test Duration = (Sample Size × Variations) / Daily Traffic

Example:

Sample size needed: 15,288 per variation
Variations: 2
Daily traffic: 500 visitors
Duration: (15,288 × 2) / 500 = 61.2, rounded up to 62 days, or 63 days as 9 complete weeks

Important: Round up and run for complete weeks (14, 21, 28 days) to account for weekly patterns.
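
The duration rule can be sketched as follows. The function name duration_in_days is an assumption for illustration; it rounds partial days up and then rounds up again to complete weeks.

```python
import math

def duration_in_days(sample_per_variation, variations, daily_traffic):
    """Return (raw days, days rounded up to complete weeks)."""
    raw_days = math.ceil(sample_per_variation * variations / daily_traffic)
    full_week_days = math.ceil(raw_days / 7) * 7
    return raw_days, full_week_days

raw, weekly = duration_in_days(15_288, variations=2, daily_traffic=500)
print(f"Raw duration: {raw} days")
print(f"Run for: {weekly} days ({weekly // 7} weeks)")
```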

Real-World Examples

Example 1: E-commerce Product Page

Scenario: Testing new product page layout

Inputs:

Current conversion: 3.2%
MDE: 15% relative improvement (3.68%)
Confidence: 95%
Power: 80%
Variations: 2 (A/B test)
Daily traffic: 800 visitors

Results:

Sample size: ~21,100 per variation
Total needed: ~42,200 visitors
Test duration: 53 days (at 800/day)

Decision: Run test for 8 full weeks (56 days).

Example 2: SaaS Signup Flow

Scenario: Streamlining multi-step signup

Inputs:

Current conversion: 8.5%
MDE: 12% relative improvement (9.52%)
Confidence: 95%
Power: 90% (important change)
Variations: 2
Daily traffic: 300 visitors

Results:

Sample size: ~15,700 per variation
Total needed: ~31,400 visitors
Test duration: 105 days

Decision: Run for 15 full weeks, or accept a larger MDE if that is too long to keep the test free of outside changes.

Example 3: Email Subject Line

Scenario: A/B/C test of 3 subject lines

Inputs:

Current open rate: 22%
MDE: 10% relative (24.2%)
Confidence: 95%
Power: 80%
Variations: 3 (A/B/C)
List size: 50,000

With Bonferroni correction:

Adjusted confidence: 98.3% (α = 0.0167)
Sample size: ~7,400 per variation
Total needed: ~22,300 emails

Decision: Send to a random subset of about 22,500, analyze after 24 hours.

Special Considerations

Multiple Variations (Bonferroni Correction)

When testing 3+ variations, you make multiple comparisons:

3 variations = 3 comparisons (A vs B, A vs C, B vs C)
4 variations = 6 comparisons
5 variations = 10 comparisons

Formula: k(k-1)/2 where k = number of variations

Each comparison increases false positive risk. Bonferroni correction divides your significance level by the number of comparisons.

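Impact: With all pairwise comparisons corrected, testing 4 variations instead of 2 raises the required sample size per variation by about 50% (at 95% confidence and 80% power). Total traffic roughly triples, because there are also twice as many variations to fill.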

Recommendation: Limit to 2-3 variations per test. Run sequential tests instead of testing everything at once.
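
To see how the correction feeds into sample size, here is a rough sketch that compares the (Z_α/2 + Z_β)² term with and without Bonferroni. It assumes all pairwise comparisons are corrected and that everything else in the formula stays fixed; the function name is illustrative.

```python
from math import comb
from statistics import NormalDist

def per_variation_inflation(variations, alpha=0.05, power=0.80):
    """Ratio of corrected to uncorrected sample size per variation."""
    norm = NormalDist()
    z_b = norm.inv_cdf(power)
    base = (norm.inv_cdf(1 - alpha / 2) + z_b) ** 2
    adjusted_alpha = alpha / comb(variations, 2)  # Bonferroni, all pairs
    corrected = (norm.inv_cdf(1 - adjusted_alpha / 2) + z_b) ** 2
    return corrected / base

for k in (2, 3, 4, 5):
    factor = per_variation_inflation(k)
    # Total traffic scales with both the factor and the number of variations.
    print(f"{k} variations: x{factor:.2f} per variation, x{factor * k / 2:.2f} total vs. A/B")
```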

Low-Traffic Websites

Problem: Required sample size might mean 6+ month test duration.

Solutions:

1. Increase MDE: Accept detecting larger effects only

Instead of 10% improvement, test for 25%
Reduces sample size by ~85%

2. Test higher-traffic pages:

Homepage instead of specific product page
Signup flow instead of settings page

3. Combine traffic sources:

Test across multiple similar pages
Aggregate data carefully

4. Use sequential testing:

Check results at predetermined intervals
Stop early if strong signal emerges
Requires specialized statistical methods

5. Accept longer test durations:

Ensure test runs full weeks (1-2 minimum)
Monitor for external changes during test
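
The saving in solution 1 follows from sample size scaling with 1/MDE² when everything else is held constant. A short sketch (the function name is illustrative):

```python
def sample_size_reduction(old_mde: float, new_mde: float) -> float:
    """Fraction of the sample saved when the MDE grows from old_mde to new_mde."""
    return 1.0 - (old_mde / new_mde) ** 2

saving = sample_size_reduction(0.10, 0.25)
print(f"Moving from a 10% to a 25% MDE saves about {saving:.0%} of the sample")
```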

Multi-Page Funnels

Testing a funnel with multiple steps requires a different approach:

Problem: Conversion rate decreases at each step

Landing page: 10,000 visitors
Step 2: 3,000 (30% conversion)
Step 3: 1,200 (40% of step 2)
Purchase: 300 (25% of step 3)
Overall: 3% conversion

Solution: Calculate based on the final conversion rate (3%), not individual steps.

Because the metric is the overall 3% conversion, sample size is counted in landing-page visitors, so you need more top-funnel traffic than the later steps alone would suggest.
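
The funnel arithmetic above can be sketched like this, using the step rates from the example (the function name funnel_counts is an assumption):

```python
import math

def funnel_counts(visitors, step_rates):
    """Visitors remaining at each stage, starting from the landing page."""
    counts = [visitors]
    for rate in step_rates:
        counts.append(round(counts[-1] * rate))
    return counts

step_rates = [0.30, 0.40, 0.25]           # landing -> step 2 -> step 3 -> purchase
overall_rate = math.prod(step_rates)      # baseline to use in the sample size formula
counts = funnel_counts(10_000, step_rates)

print(f"Stage counts: {counts}")
print(f"Overall conversion: {overall_rate:.1%}")
```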

Mobile vs. Desktop

Different devices often have different conversion rates and user behavior:

Approach 1: Segment analysis

Run test on all traffic
Analyze mobile and desktop separately post-test
Requires larger sample size

Approach 2: Separate tests

Run mobile-specific and desktop-specific tests
Allows device-optimized variations
Doubles traffic requirements

Recommendation: Start with a combined test plus segment analysis. Run separate tests only if device behavior differs dramatically.

Common Mistakes to Avoid

1. The Peeking Problem

Mistake: Checking results daily and stopping when p-value < 0.05

Why it's wrong: The more you check, the higher your false positive rate. You'll eventually see p < 0.05 by random chance.

Fix:

Commit to sample size upfront
Check results only at predetermined milestones
Use sequential testing methods if you must peek
Wait for full sample size before declaring winner
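
A small simulation makes the peeking problem concrete. This sketch runs A/A tests (both arms share the same true rate, so every "winner" is a false positive) and compares stopping at the first significant look against checking only once at the end. The run count, sample size, and number of looks are arbitrary choices for illustration.

```python
import math
import random

Z_CRIT = 1.96  # two-sided 95% threshold

def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic with equal group sizes n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b / n - conv_a / n) / se

def simulate(runs=400, n=2000, looks=10, rate=0.10, seed=1):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    step = n // looks
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = abs(z_stat(conv_a, conv_b, look * step)) > Z_CRIT
            flagged = flagged or significant
        peeking_hits += flagged        # significant at any look
        fixed_hits += significant      # significant at the final look only
    return peeking_hits / runs, fixed_hits / runs

peeking_rate, fixed_rate = simulate()
print(f"False positives with peeking: {peeking_rate:.1%}")
print(f"False positives with one final check: {fixed_rate:.1%}")
```

The single final check should stay near the nominal 5%, while the peeking rate is typically several times higher.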

2. Ignoring Weekly Patterns

Mistake: Running test for exactly 10 days or stopping mid-week

Why it's wrong:

Weekend traffic behaves differently than weekday
Your test might catch 2 Saturdays but only 1 Friday
Creates sampling bias

Fix:

Always run tests in complete weeks (7, 14, 21, 28 days minimum)
Start tests on the same day of week you plan to end them
Minimum 1-2 full weeks even if you hit sample size sooner

3. Testing Too Many Variations

Mistake: Running A/B/C/D/E test with 5 variations

Why it's wrong:

Requires Bonferroni correction
Massively increases sample size (often 3-4x)
Splits traffic too thinly
Each variation gets fewer samples

Fix:

Limit to 2-3 variations maximum
Run sequential tests instead
Focus on testing one hypothesis at a time
Use multivariate testing only when necessary and with huge traffic

4. Setting Unrealistic MDE

Mistake: Expecting 50%+ improvements as minimum

Why it's wrong:

Yields a tiny required sample size
Leaves the test underpowered for realistic effects
Might miss smaller but still valuable 10-15% wins

Fix:

Be realistic: most winning tests improve 5-20%
Set MDE based on business value, not wishful thinking
Consider: would a 10% improvement be worth implementing?

5. Stopping Tests Early

Mistake: Stopping at 80% of required sample because results look good

Why it's wrong:

Underpowered test
Higher false positive rate
Results likely to regress to mean

Fix:

Commit to full sample size before starting
Set calendar reminders for proper end date
If you must end early, acknowledge increased error risk
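
The "underpowered" point can be quantified. This sketch uses the same normal approximation as the sample size formula, with the 2.5% baseline and 20% relative MDE from the worked example; the function name power_at is an assumption.

```python
from statistics import NormalDist

def power_at(n, baseline, relative_mde, alpha=0.05):
    """Approximate power of a two-sided test with n samples per variation."""
    norm = NormalDist()
    delta = baseline * relative_mde
    se = (2 * baseline * (1 - baseline) / n) ** 0.5
    return norm.cdf(delta / se - norm.inv_cdf(1 - alpha / 2))

full_n = 15_288
power_full = power_at(full_n, 0.025, 0.20)
power_early = power_at(int(full_n * 0.8), 0.025, 0.20)

print(f"Power at full sample: {power_full:.0%}")
print(f"Power at 80% of sample: {power_early:.0%}")
```

Stopping at 80% of the planned sample drops power from about 80% to roughly 70%, before counting the extra false positive risk from choosing the stopping point after seeing the data.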

Sample Size Benchmarks by Industry

Based on analysis of thousands of A/B tests:

E-commerce

Metric	Typical Range	Required Sample	Test Duration*
Product page CR	2-5%	10,000-20,000	14-30 days
Add to cart rate	10-20%	3,000-6,000	7-14 days
Checkout completion	40-60%	1,000-2,000	3-7 days

*Assuming 1,000 visitors/day

SaaS

Metric	Typical Range	Required Sample	Test Duration*
Signup conversion	5-15%	3,000-8,000	10-20 days
Free to paid	2-5%	10,000-20,000	30-60 days
Feature adoption	20-40%	1,500-3,000	5-10 days

*Assuming 500 visitors/day

B2B/Lead Gen

Metric	Typical Range	Required Sample	Test Duration*
Form submission	3-8%	5,000-15,000	20-40 days
Demo request	1-3%	15,000-40,000	60-120 days
Content download	10-25%	2,000-5,000	7-15 days

*Assuming 300 visitors/day

Media/Publishing

Metric	Typical Range	Required Sample	Test Duration*
Click-through rate	5-15%	3,000-8,000	2-5 days
Email signup	2-6%	8,000-20,000	5-12 days
Video completion	30-60%	1,000-2,000	1-3 days

*Assuming 2,500 visitors/day

Note: These are starting points. Your actual requirements depend on your specific MDE, confidence, and power settings.

Advanced Topics

Sequential Testing (SPRT)

What it is: Statistical method allowing you to peek at results and stop early while maintaining proper error rates.

How it works:

Set error boundaries that tighten over time
Can stop when result crosses boundary
Maintains alpha and beta levels

Benefits:

Can reduce test duration by 20-50%
Safe to monitor continuously
Faster decisions on clear winners/losers

Tradeoffs:

More complex to implement
May run longer if results are ambiguous
Requires specialized tools

Tools: Optimizely, VWO, and custom implementations
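
For intuition, here is a minimal sketch of Wald's classic SPRT on a single stream of conversions, testing a baseline rate p0 against an improved rate p1. This is a simplification: real A/B platforms compare two arms and may use different sequential methods, so treat the function and its parameters as illustrative assumptions.

```python
import math

def sprt_decision(outcomes, p0, p1, alpha=0.05, beta=0.20):
    """Walk through 0/1 outcomes and stop when a boundary is crossed."""
    upper = math.log((1 - beta) / alpha)   # cross above: evidence for p1
    lower = math.log(beta / (1 - alpha))   # cross below: evidence for p0
    llr = 0.0                              # running log-likelihood ratio
    for i, converted in enumerate(outcomes, start=1):
        if converted:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", i
        if llr <= lower:
            return "accept H0", i
    return "continue", len(outcomes)

# Extreme streams, just to show both boundaries being hit.
print(sprt_decision([1] * 50, p0=0.025, p1=0.03))
print(sprt_decision([0] * 1000, p0=0.025, p1=0.03))
```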

Bayesian A/B Testing

Difference from frequentist: Instead of "Is there a difference?", asks "What's the probability variation B is better?"

Benefits:

More intuitive interpretation
Can incorporate prior knowledge
Often used with continuous monitoring, though stopping the moment a threshold is crossed can still inflate false positives
Direct probability statements

Tradeoffs:

Requires setting priors
Can be seen as subjective
Less standardized than frequentist

Sample sizes: Generally similar to frequentist, sometimes smaller with strong priors.
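
The "probability variation B is better" can be estimated with a Beta-Binomial model and Monte Carlo sampling. This sketch assumes a uniform Beta(1, 1) prior for both variations, and the conversion counts are made up for illustration.

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Estimate P(rate_B > rate_A) from Beta posteriors with uniform priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: 250/10,000 conversions for A, 300/10,000 for B.
probability = prob_b_beats_a(250, 10_000, 300, 10_000)
print(f"P(B beats A) is roughly {probability:.1%}")
```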

Multi-Armed Bandit

What it is: Adaptive testing that automatically shifts traffic to better-performing variations.

How it works:

Starts with even split
Gradually allocates more traffic to winners
Minimizes exposure to losers

When to use:

Testing many variations (3+)
Cost of showing losing variation is high
Traffic is very high
Willing to sacrifice some statistical rigor for practical gains

Not recommended for:

Low traffic sites
Testing two variations
When you need definitive answers
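
One common way to implement a bandit is Thompson sampling, sketched below. This is one technique among several, not necessarily what any particular platform uses; the true conversion rates are invented so the simulation has something to learn.

```python
import random

def thompson_sampling(true_rates, rounds=5_000, seed=0):
    """Simulate Thompson sampling and return how often each arm was shown."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible rate for each arm from its Beta posterior.
        samples = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        arm = samples.index(max(samples))      # show the arm with the best draw
        if rng.random() < true_rates[arm]:     # simulated visitor converts
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]

pulls = thompson_sampling([0.05, 0.10])  # hypothetical true rates for A and B
print(f"Traffic shown to A: {pulls[0]}, to B: {pulls[1]}")
```

Early on, both arms get similar traffic. As evidence accumulates, most visitors go to the stronger arm, which is why the weaker arm ends up with too little data for a clean significance test.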

Tools & Resources

Sample Size Calculators

Our recommendation: WMMW Sample Size Calculator

Clean interface
All standard options
Power analysis charts
Test duration estimates
Multiple variation support

Alternative options:

Evan Miller's calculator (simple, accurate)
Optimizely's calculator (built-in if using their platform)
Google Optimize calculator (deprecated but still referenced)

A/B Testing Platforms

Enterprise:

Optimizely: Full-featured, expensive, great for large teams
VWO: Mid-market, good features, reasonable pricing
Adobe Target: Enterprise-only, deep Analytics integration

Mid-Market:

Convert: Privacy-focused, good for GDPR compliance
AB Tasty: French company, strong in Europe
Kameleoon: AI-powered, advanced segmentation

Budget/Small Teams:

Google Optimize (discontinued)
Microsoft Clarity + custom implementation
Open source: GrowthBook, Unleash

Statistical Resources

Books:

"Trustworthy Online Controlled Experiments" by Kohavi, Tang, Xu
"A/B Testing: The Most Powerful Way to Turn Clicks Into Customers" by Siroker & Koomen

Online Courses:

Udacity: A/B Testing by Google
CXL: Advanced A/B Testing & Experimentation

Conclusion

Sample size calculation isn't guesswork—it's statistics. And getting it right is the difference between making data-driven decisions and random ones.

Key Takeaways:

Don't wing it: Calculate required sample size before starting any test
Be realistic: Set achievable MDEs (10-20% improvements)
Commit to the number: Don't stop early or peek continuously
Account for patterns: Run tests in complete weeks (minimum 1-2)
Use tools: Let calculators do the math for you

Stop guessing. Start testing with confidence.
