
Statistical significance in eCommerce testing: Why your A/B tests might be wrong

By Dan Bond
July 15, 2025
4 mins

Are you stopping your A/B tests too early? Are you calling winners when you shouldn't?

Do you actually understand what 95% confidence means?

Are you making business decisions based on mathematical theater?

Most eCommerce marketers get statistical significance wrong. And it's costing them money.

What is statistical significance?

Statistical significance tells you one thing: whether your results are likely due to chance or something real. That's it.

It doesn't tell you:

  • How important the result is
  • Whether you should implement the change
  • If the effect will last
  • Whether it's worth the effort

This matters for eCommerce CRO because you're not just testing random changes. You're testing discounts and promotions that directly impact revenue. You're optimizing shopping cart flows and customer experience elements that affect your bottom line.

Getting significance wrong means implementing changes that don't actually improve conversions. (Or worse, missing improvements that actually work.)

The 95% confidence myth

95% confidence doesn't mean you're 95% sure your result is correct.

It means that if there were truly no difference between your variations, only about 5 out of 100 identical tests would show a result this extreme just by chance.

(Yes, that's confusing. No, most people don't understand this.)

Reality check: Your "winning" variation might still be worse than your control.

False positives: The expensive mistake

False positives are also known as Type I errors.

They happen when you wrongly reject the null hypothesis: you saw a connection between your change and the result when there was none.

What causes false positives:

  • Ending your test too early
  • Peeking at results repeatedly
  • Using too lenient a significance threshold (for example, 90% confidence instead of 95%)
  • Testing during unusual traffic periods

Each of these inflates your false positive rate well beyond the 5% you think you're accepting.

When you're testing promotions that incentivize purchases, this false positive rate becomes expensive. You might roll out a discount strategy that actually reduces profit margins without improving conversions.
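
To see why peeking is so dangerous, here's a minimal simulation sketch in Python (using NumPy and SciPy; the 2% conversion rate, visitor counts, and peeking schedule are made-up numbers). It runs A/A tests, where both "variations" are identical, and stops at the first peek that shows p < 0.05. If significance worked the way most people assume, only about 5% of these tests would ever declare a winner.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def peeking_false_positive_rate(n_tests=1000, max_visitors=20_000,
                                peek_every=1_000, conv_rate=0.02):
    """Simulate A/A tests (no real difference between variations) and
    count how often peeking at interim results declares a 'winner'."""
    false_positives = 0
    for _ in range(n_tests):
        a = rng.random(max_visitors) < conv_rate   # control: did each visitor convert?
        b = rng.random(max_visitors) < conv_rate   # identical "variation"
        for n in range(peek_every, max_visitors + 1, peek_every):
            rate_a, rate_b = a[:n].mean(), b[:n].mean()
            pooled = (a[:n].sum() + b[:n].sum()) / (2 * n)
            se = np.sqrt(2 * pooled * (1 - pooled) / n)
            if se == 0:
                continue  # no conversions yet, nothing to test
            z = (rate_b - rate_a) / se
            p_value = 2 * stats.norm.sf(abs(z))
            if p_value < 0.05:        # looks "significant" at this peek...
                false_positives += 1
                break                 # ...so we stop early and call a winner
    return false_positives / n_tests

print(peeking_false_positive_rate())  # well above the 5% you expect
```

Run it and the false positive rate typically comes out several times higher than 5%, because every extra peek is another chance for noise to cross the threshold.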

How sample size destroys most tests

Most eCommerce tests are underpowered.

If your conversion rate is 2% and you want to detect a 10% relative improvement (from 2% to 2.2%):

  • You need roughly 80,000 visitors per variation
  • For 95% confidence and 80% power
  • That's around 160,000 total visitors

Your 500-visitor test isn't telling you anything useful.

The math that matters:

Sample size requirements depend on:

  • Expected conversion rate
  • Minimum detectable effect
  • Significance level (usually 5%)
  • Statistical power (usually 80%)
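
If you want to run the numbers yourself, here's a minimal sketch using Python and statsmodels (the 2% baseline and 10% relative lift are the figures from the example above; everything else is the standard 95% confidence / 80% power setup, and the exact output depends on which approximation your calculator uses):

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.02                     # current conversion rate
mde = 0.10                          # minimum detectable effect (10% relative lift)
variant = baseline * (1 + mde)      # 2.2%

# Cohen's h effect size for the two proportions
effect_size = proportion_effectsize(variant, baseline)

# Visitors needed per variation for alpha = 0.05 and 80% power
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="two-sided",
)

print(round(n_per_variation))       # on the order of 80,000 per variation
```

Shrink the minimum detectable effect and the required sample size grows fast, which is why chasing tiny lifts on a low-traffic store rarely works.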

This is particularly problematic for retail businesses testing personalization features or promotion optimization. Small improvements in these areas can generate significant revenue increases, but they require massive sample sizes to prove statistical significance.

Common mistakes that kill results

What works:

  • Wait for sufficient sample size
  • Pre-determine test duration
  • Account for seasonality
  • Test one variable at a time

What doesn't work:

  • Peeking at results daily
  • Stopping tests early because you see significance
  • Ignoring practical significance
  • Testing during major sales events

The biggest mistake? Stopping tests the moment you hit 95% confidence.

This matters for eCommerce platforms because customer behavior changes throughout the week, month, and season. A promotion that appears to work on Tuesday might fail on Friday when different customer segments visit your site.

When to ignore statistical significance

The business trumps the math.

Stop your test if:

  • You're losing significant revenue
  • The result is practically meaningless
  • External factors have changed
  • You've reached your pre-planned duration

Continue testing if:

  • Results are trending toward significance
  • Business impact is meaningful
  • You haven't hit minimum sample size
  • Test duration was too short

Real example: An eCommerce company tested a new shopping cart abandonment email sequence. After two weeks, the test showed 94% confidence (not quite significant). But the new sequence was generating $50,000 more revenue per week.

They implemented it anyway. (Good call.)

Practical significance vs statistical significance

Statistical significance: The math says it's real.

Practical significance: The business says it matters.

A 0.01% improvement in conversion rate might be:

  • Statistically significant with enough traffic
  • Practically worthless for your business

The question that matters: Is this improvement worth implementing?

For eCommerce businesses, this calculation depends on:

  • Revenue impact per conversion
  • Implementation costs
  • Opportunity cost of not testing something else
  • Long-term customer value implications
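
Here's a back-of-the-envelope version of that calculation as a minimal Python sketch (the traffic, order value, and cost figures are hypothetical, and the 0.01% improvement from above is treated as an absolute 0.01-percentage-point lift):

```python
# Hypothetical inputs -- swap in your own numbers
monthly_visitors = 100_000
lift_absolute = 0.0001             # the "0.01% improvement" from above
average_order_value = 60.00        # revenue per conversion
implementation_cost = 15_000       # one-off cost to build and roll out

extra_orders_per_month = monthly_visitors * lift_absolute
extra_revenue_per_year = extra_orders_per_month * average_order_value * 12

print(f"Extra orders per month: {extra_orders_per_month:.0f}")
print(f"Extra revenue per year: ${extra_revenue_per_year:,.0f}")
print(f"Pays back implementation cost: {extra_revenue_per_year > implementation_cost}")
```

With those hypothetical numbers, the change is a statistically real but commercially pointless win: about ten extra orders a month that never pay back the build cost.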

What to do with inconclusive results

Not every test is a winner. That's why they're called tests.

When your test doesn't reach significance:

  • Document what you learned
  • Analyze why it failed
  • Consider testing a bigger change
  • Move on to the next test

Important: Inconclusive results aren't failures. They're data points that prevent you from implementing changes that don't work.

Building a testing culture that works

Beyond the numbers, successful eCommerce CRO programs focus on learning, not just winning.

What successful teams do:

  • Set clear success metrics upfront
  • Plan test duration in advance
  • Document all results (winners and losers)
  • Focus on learning, not just winning

What kills testing programs:

  • Obsessing over P-values
  • Stopping tests for convenience
  • Testing too many variations
  • Ignoring practical implications

What P-values actually mean

P-values tell you the probability of seeing your results (or more extreme results) if there's actually no difference between your variations.

A p-value of 0.05 means that, if nothing had actually changed, there would only be a 5% chance of seeing results like yours.

(Not that there's a 5% chance your hypothesis is wrong. That's different.)

Reality check:

  • P-value of 0.03 = 3% chance of seeing this result if nothing changed
  • P-value of 0.08 = 8% chance of seeing this result if nothing changed
  • P-value of 0.50 = 50% chance of seeing this result if nothing changed

Lower p-values suggest your change actually did something.

Higher p-values suggest you're probably looking at noise.

The magic number is usually 0.05, but that's just convention. (Not mathematical law.)
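
If you want to see where that number actually comes from, here's a minimal sketch of a standard two-proportion z-test in Python with statsmodels (the conversion counts are hypothetical):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: control vs variation
conversions = [200, 245]      # converted visitors in each group
visitors = [10_000, 10_000]   # total visitors in each group

# Two-sided z-test: how surprising is a gap this large if nothing changed?
z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors,
                                    alternative="two-sided")

print(f"z = {z_stat:.2f}, p = {p_value:.3f}")
# p < 0.05 -> unlikely to be pure noise; p >= 0.05 -> could easily be noise
```

The p-value answers exactly one question: how often would a gap this big appear if both variations truly converted at the same rate?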

Industry-specific considerations

eCommerce testing has unique challenges:

  • Seasonality affects significance: Black Friday traffic behaves differently than January traffic. Your promotion optimization tests need to account for these patterns.
  • Cart abandonment introduces bias: If you're testing checkout flow improvements, cart abandonment rates can skew results. A test that appears to improve conversions might just be capturing customers who would have completed purchases anyway.
  • Personalization complexity: Testing personalized experiences requires segmented analysis. Overall statistical significance might hide the fact that your changes only work for specific customer groups.
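
One way to sketch that segmented analysis in Python (with pandas and statsmodels; the DataFrame layout and the column names `segment`, `variant`, and `converted` are assumptions for illustration) is to run the same significance test separately for each customer segment:

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

def significance_by_segment(df: pd.DataFrame) -> pd.DataFrame:
    """Run a two-proportion z-test per customer segment.

    Assumes columns: 'segment', 'variant' ('control' or 'treatment'),
    and 'converted' (0/1). Column names are illustrative only."""
    rows = []
    for segment, group in df.groupby("segment"):
        control = group[group["variant"] == "control"]["converted"]
        treatment = group[group["variant"] == "treatment"]["converted"]
        if len(control) == 0 or len(treatment) == 0:
            continue  # segment missing one side of the test
        _, p_value = proportions_ztest(
            count=[treatment.sum(), control.sum()],
            nobs=[len(treatment), len(control)],
        )
        rows.append({
            "segment": segment,
            "control_rate": control.mean(),
            "treatment_rate": treatment.mean(),
            "p_value": p_value,
        })
    return pd.DataFrame(rows)
```

The trade-off: every segment needs its own adequate sample size, so segment-level conclusions usually mean running the test longer than an overall read would require.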

Key takeaways

Statistical significance is a tool, not a goal. Sample size matters more than you think.

Business impact trumps mathematical purity. Most tests should run longer than you want.

(And yes, you probably need more traffic than you have.)

The goal isn't to achieve statistical significance. The goal is to improve your business through better customer experiences, optimized promotions, and increased conversions.

Statistical significance just helps you know when you've found something real.

Use it correctly, and your eCommerce CRO efforts will generate more revenue with less wasted effort. Use it incorrectly, and you'll implement changes that don't actually work.

The math is important. The business impact is what matters.
