
The 100-year-old mistake we are still making in experiments

Mar 5, 2024

Statistical significance at P < 0.05 is an arbitrary standard from 1925 that's killing winning marketing experiments. Learn why treating p-values as binary instead of as a spectrum is costing you conversions, and how to make smarter data-driven decisions.

Statistical significance is very often misused in marketing experiments. Almost 100 years ago someone wrote that we should all follow the magical P-value of <0.05, and we are all still rolling with it. Yes, really. Ronald Fisher wrote in his 1925 book "Statistical Methods for Research Workers":

"Personally, the writer prefers to set a low standard of significance at the 5 percent point, and ignore entirely all results which fail to reach this level."

And that's it! Some gentleman in 1925 liked <0.05 (less than 5%) as the ideal cut-off and told us to ignore any result that doesn't meet that threshold.

How marketing got trapped by a century-old preference

I've seen tons of marketing hypotheses rejected because the data was "insignificant." In other words, not meeting that golden standard of P < 0.05. It's even built in as a standard for experiments in ad platforms like Google Ads.

The problem is that it has become binary, and binary around a quite arbitrary cut-off point. Instead, it should be a spectrum of how confident you are in your results. Echoing the American Statistical Association's criticism: would you really refuse to deploy a winning experiment because there's a 6% chance (not stat sig) instead of a 4% chance (stat sig) of seeing a result that strong when your treatment has no effect at all?

The reality is that there's very little difference between P = 0.04 and P = 0.06. Yet good experiment outcomes get killed in marketing because they don't meet P < 0.05.
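
To make that concrete, here's a minimal sketch using hypothetical conversion numbers and a standard two-proportion z-test (proportions_ztest from statsmodels). It shows how thin the line really is: a handful of extra conversions can move an otherwise identical experiment from "insignificant" to "significant".

```python
# pip install statsmodels
from statsmodels.stats.proportion import proportions_ztest

visitors = 10_000  # visitors per arm (hypothetical numbers)

# Scenario A: control converts 500 times, variant 560 times
stat_a, p_a = proportions_ztest(count=[560, 500], nobs=[visitors, visitors])

# Scenario B: identical setup, but the variant gets 5 extra conversions
stat_b, p_b = proportions_ztest(count=[565, 500], nobs=[visitors, visitors])

print(f"Scenario A: p = {p_a:.3f}")  # lands just above 0.05 (~0.06): "not significant"
print(f"Scenario B: p = {p_b:.3f}")  # lands just below 0.05 (~0.04): "significant"
```

Roughly five conversions out of ten thousand visitors separate a "failed" test from a "winning" one, which is exactly why a hard cut-off makes a poor decision rule on its own.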

Context matters more than rigid thresholds

Sure, you want to be really, really certain when you're developing a new medicine that it isn't going to actually kill people. But senator, we run ads.

It seems that even the statistical community can't agree on how to explain what a p-value means, so I'll borrow from the American Statistical Association: the p-value is the probability of seeing a result at least as extreme as yours if your treatment actually had no effect. That "no effect" scenario is what scientific jargon calls the null hypothesis, and a p-value below 0.05 is treated as permission to reject it. So a p-value of 0.1 means there's a 10% chance that pure noise would produce a result this strong even if your treatment did nothing.
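
If that definition feels abstract, here's a small simulation sketch (hypothetical numbers again) that computes a p-value the brute-force way: assume the treatment does nothing, replay the experiment many times, and count how often chance alone produces a lift as big as the one you observed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical experiment: 10,000 visitors per arm
n = 10_000
control_conversions, variant_conversions = 500, 560
observed_lift = (variant_conversions - control_conversions) / n

# Null hypothesis: the treatment has no effect, so both arms share one
# underlying conversion rate. Estimate it from the pooled data.
null_rate = (control_conversions + variant_conversions) / (2 * n)

# Replay the experiment 100,000 times in that "no effect" world and
# count how often random noise produces a lift at least this large.
simulations = 100_000
fake_control = rng.binomial(n, null_rate, simulations) / n
fake_variant = rng.binomial(n, null_rate, simulations) / n
p_value = np.mean(np.abs(fake_variant - fake_control) >= abs(observed_lift))

print(f"Simulated two-sided p-value: {p_value:.3f}")
```

The p-value that comes out is simply the share of those no-effect worlds that still look as impressive as your real result, nothing more and nothing less.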

The publication bias problem

Scientific journals, for example, are much more likely to publish statistically significant results. In medical research, billions of dollars in sales may ride on whether a drug shows a statistically significant benefit. A result that doesn't reach the required significance can ruin months or years of work, and can inspire desperate attempts to 'encourage' the desired outcome.

Fisher never meant it as gospel

The irony is that Fisher himself viewed P < 0.05 as a rough guideline for further investigation, not a hard decision threshold. The marketing world has turned his personal preference into gospel, often without considering effect size, business impact, or the broader context of the experiment.