The history of this standard dates back to Ronald Fisher, an English statistician, evolutionary biologist, mathematician, geneticist, and eugenicist. He developed important techniques and concepts, some of which you may use daily: ANOVA, the F-distribution, Fisher’s method for meta-analyses, permutation testing, and, yes, the p<0.05 standard for statistical significance. (He also wrote extensively about inverse, or Bayesian, probability, though mostly to argue against it.) In addition to his contributions to statistics, Fisher worked on concepts as diverse as allele dominance in genetics, heterozygote advantage, and the Sexy Son Hypothesis (look that one up!). Though many other brilliant mathematicians and statisticians also contributed to these concepts, Fisher’s work has likely had the greatest influence on our modern research practices.
Fisher established the use of p<0.05 in his 1925 book, Statistical Methods for Research Workers (SMRW). If you are interested in a little light reading, the full text is worth checking out. If you don’t have the weekend to devour the entire book, this is the passage where he introduces the infamous 0.05:
“The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty.” – Ronald Fisher, 1925
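If you want to check Fisher’s arithmetic yourself, here is a minimal sketch using only Python’s standard library. It assumes a standard normal distribution and a two-sided test, which is how the quote reads; the names `std_normal` and `two_sided_p` are mine, not Fisher’s.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

def two_sided_p(z):
    """Probability of a deviation at least |z| standard deviations from the mean."""
    return 2 * (1 - std_normal.cdf(abs(z)))

# Deviation (in standard deviations) at which the two-sided P equals 0.05.
z_at_05 = std_normal.inv_cdf(1 - 0.05 / 2)

# Two-sided P for Fisher's rounded cutoff of "twice the standard deviation".
p_at_2sd = two_sided_p(2.0)

print(round(z_at_05, 2), round(p_at_2sd, 4), round(1 / p_at_2sd))
# prints: 1.96 0.0455 22
```

So the “1 in 20” belongs to 1.96 standard deviations, while the “once in 22 trials” comes from rounding the cutoff up to exactly 2.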
I think Fisher actually meant for us to take p=0.05 as a convenient indicator: Look more here! There might be something interesting! [I like to picture little cells with faces, waving hi!…] Instead, many researchers use this benchmark to assess a study’s value to the field.
I completely know what she means. Is the data normally distributed? Who cares, I just want to see if it “worked”!!
Importantly, Maria noted that, “There is a distinction that should be made between ‘statistical significance’ and ‘practical significance’ in any statistical analysis result, as well as the ‘effect size,’ a unit-less metric that reflects the strength of association between two variables. A good scientist will always consider all of these, because you can achieve statistical significance with an effect size that is too small to be of practical significance, simply as a result of having an enormous sample.”
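To make that concrete, here is a rough sketch of how sample size alone can move a p-value. It is my own illustration, not Maria’s method: it assumes two equal-sized groups, a standardized mean difference (Cohen’s d) as the effect size, and a normal approximation in place of a proper t-test. The function name `p_value_for_effect` and all the numbers are made up for the example.

```python
import math
from statistics import NormalDist

def p_value_for_effect(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d
    between two groups of n_per_group subjects each (normal approximation)."""
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A negligible effect (d = 0.02) in a modest sample and in an enormous one.
tiny_effect_small_n = p_value_for_effect(0.02, 100)
tiny_effect_huge_n = p_value_for_effect(0.02, 1_000_000)

# A large effect (d = 0.8) in only five subjects per group.
large_effect_tiny_n = p_value_for_effect(0.8, 5)

for label, p in [
    ("d=0.02, n=100 per group", tiny_effect_small_n),
    ("d=0.02, n=1,000,000 per group", tiny_effect_huge_n),
    ("d=0.8,  n=5 per group", large_effect_tiny_n),
]:
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{label}: p = {p:.4f} ({verdict})")
```

The same negligible effect goes from nowhere near significant to overwhelmingly significant just by adding subjects, while the large effect in the tiny sample does not clear the bar. With only five subjects per group a real t-test would be even less generous than this approximation.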
On the flip side, she said that it is possible to have a clinically important effect size, but the study may have too few subjects to reach statistical significance. With these pitfalls in mind, Maria provided me with helpful guidelines for any scientist performing experiments and analyzing results. So take a breath, press pause on the t-test function (I know, it’s hard!), and take the following steps:
1. Prior to beginning the study,
2. Prior to the study and during the study,
3. When you first obtain the data,
- For that odd data point that just confuses you: “The lone outlier data point might be your only representative data point of a more rare, but still possible and perhaps important finding.”
- For the data that is clustered around two values: “Does it follow a normal distribution or does it maybe need a log transformation in order to correct for skewness?”
- For the empty rows in your Excel spreadsheet: “A non-random pattern of missingness poses problems in the interpretation of the statistical parameter estimates you obtain, and the scientist must carefully decide what to do about it.” This is particularly true for surveys and clinical procedures.
4. When you have your final results,
5. When sharing the study’s findings with others,
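The first-look questions from step 3 (odd data points, skewness, empty rows) can be sketched in a few lines of Python before any test gets run. This is only an illustration under my own assumptions: the data are invented, `first_look` and `skewness` are hypothetical helper names, missing values are represented as `None`, and the 1.5 × IQR fence is one common rule of thumb for flagging points, not something Maria prescribed. Note that it flags unusual points for a closer look rather than deleting them.

```python
import math
import statistics

def skewness(values):
    """Population skewness: the mean cubed deviation in standard-deviation units."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def first_look(raw):
    """Summarize missing values, flagged points, and skewness (raw vs. log scale).
    Assumes every observed value is positive so the log is defined."""
    observed = [v for v in raw if v is not None]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n_missing": len(raw) - len(observed),
        "flagged": [v for v in observed if v < low or v > high],
        "skew_raw": skewness(observed),
        "skew_log": skewness([math.log(v) for v in observed]),
    }

# Invented measurements; None marks an empty spreadsheet cell.
sample = [1.2, 1.5, None, 1.9, 2.4, 3.1, None, 4.0, 5.2, 48.0]
report = first_look(sample)
for key, value in report.items():
    print(key, value)
```

Counting the empty cells is only the start: whether they are missing at random is a question about how the data were collected, and no summary statistic will answer it for you.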
Maria’s final words of advice may be the most essential. “A good scientist will spend some time thinking about her personal biases that affect what she believes about her research, her scientific theory, how the results should turn out, and what conclusions she should draw from her findings.”
My final take-away: focusing less on reaching statistical significance and more on the process is essential for high-quality science. Your results may just surprise you.