Is the Threshold for Statistical Significance Too High?

Science has begun to lean on P values as the gold standard for study results, which has led to the misconception that statistical significance equals truth.

John P. A. Ioannidis, MD, DSc

As scientific data become more complex, interpreting them becomes more difficult for the scientific community as well.

In a recently published editorial, John P. A. Ioannidis, MD, DSc, the C.F. Rehnborg Chair in Disease Prevention and a professor of medicine and of Health Research and Policy at Stanford University, made the case for lowering the threshold for statistical significance from P < .05 to P < .005 to improve the research used to make scientific decisions.

He did, however, caution that it is not a permanent solution. The problems associated with the current threshold for P values are widely agreed upon, he said, but the solution is not an easy one to find.

“P values are widely misinterpreted, overtrusted, and misused, so it is a compounded problem with multiple layers, rather than just one issue,” Ioannidis told MD Magazine. “If I had to pick the biggest issue within this conundrum, it is probably the wide misconception that ‘P <0.05’ is equated with ‘true’,” he said.

Lowering the threshold would stem the flood of results labeled statistically significant, which in turn could promote better science with more durable conclusions.

“I think it will do more good than harm. It is not a panacea and it is only a temporizing measure, but it will shift millions of results from ‘significant’ to merely ‘suggestive.’ Most, not all of these shifts, would be desirable,” he said.
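The relabeling Ioannidis describes can be sketched in a few lines of Python. This is only an illustration: the function name `classify`, the label strings, and the sample P values are assumptions made for the example, not part of the editorial.

```python
# Illustrative sketch only: the labels and cutoffs mirror the article's
# wording, and the P values below are made up for demonstration.

def classify(p_value, strict=0.005, conventional=0.05):
    """Label a P value under the proposed (strict) and current thresholds."""
    if p_value < strict:
        return "significant"
    if p_value < conventional:
        return "suggestive"  # significant today, only suggestive under .005
    return "not significant"


if __name__ == "__main__":
    hypothetical_p_values = [0.001, 0.004, 0.02, 0.049, 0.2]
    for p in hypothetical_p_values:
        print(f"P = {p}: {classify(p)}")
```

In this sketch, a result such as P = .02 would count as significant under the current threshold but only as suggestive under the proposed one.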

One of the major issues, Ioannidis said, is that most modern study design is suboptimal and can allow those with a bias to build trials that inevitably lead to the desired result. This, he said, stems from the growing reliance of research on P values: studies are now designed to achieve statistically significant results, which are not always useful results.

“This process leads to distorted design and bias and even lack of any design at all,” he said. “Most studies are very small and thus underpowered, although we increasingly see also very large studies—big data—where the problem is that the quality of the data is very poor, and P values would be very misleading.”

Ioannidis pointed to the American Statistical Association’s (ASA) 2016 statement on P values, which he credited with enabling “the dissection of these 3 problems”: misinterpretation, overtrust, and misuse.

The ASA statement proposed these 6 principles:

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a P-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A P-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a P-value does not provide a good measure of evidence regarding a model or hypothesis.
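The second ASA principle, that a P-value is not the probability that a hypothesis is true, can be illustrated with a rough calculation. The Python sketch below estimates what share of "significant" findings would be false positives. The inputs are assumptions chosen for illustration (1 in 10 tested hypotheses is a real effect, and studies have 80% power); neither figure comes from the editorial or the ASA statement.

```python
# Illustrative sketch only: prior, power, and alpha are assumed inputs,
# not figures from the editorial or the ASA statement.

def false_positive_share(alpha, power, prior_true):
    """Expected fraction of 'significant' findings that are false positives.

    alpha: significance threshold (false positive rate when no effect exists)
    power: chance of detecting an effect that really exists
    prior_true: assumed fraction of tested hypotheses that are real effects
    """
    false_positives = alpha * (1 - prior_true)
    true_positives = power * prior_true
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    for alpha in (0.05, 0.005):
        share = false_positive_share(alpha, power=0.8, prior_true=0.1)
        print(f"alpha = {alpha}: {share:.1%} of significant results are false positives")
```

Under these assumed inputs, about 36% of results significant at .05 would be false positives, compared with about 5% at .005. Holding power fixed is a simplification, since a stricter threshold reduces power when sample size is unchanged.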

In an editorial comment accompanying Ioannidis’s, Demetrios N. Kyriacou, MD, PhD, agreed, noting that “[Ronald] Fisher’s purpose was not to use the P value as a decision-making instrument but to provide researchers with a flexible measure of statistical inference within the complex process of scientific inference,” a point that the ASA attempted to drive home in the third of its 6 principles.

The value of P values has been overstated in modern research, and this is causing “major trouble,” Ioannidis said. For the change to take root, it will need to be adopted across multiple fields. Ioannidis also recommended using other inferential tools, notably Bayesian statistics, when appropriate, to limit the dependence on P values.

While the change may be met with little enthusiasm at first, and possibly some pessimism, the shift to best practices is “always possible,” Ioannidis wrote. It may take help from major journals in adopting the new standard and from institutions in training researchers, but the change may be necessary to achieve widespread, improved science.

The editorial, “The Proposal to Lower P Value Thresholds to .005,” was published in JAMA.