Manipulating the Alpha Level Cannot Cure Significance Testing
When evaluating the strength of the evidence, we should consider auxiliary assumptions, the strength of the experimental design, and implications for applications. To boil all this down to a binary decision based on a p-value threshold is not acceptable.
John Ioannidis discusses the potential effects on clinical research of a 2017 proposal to lower the default P value threshold for statistical significance from .05 to .005 as a means to reduce false-positive findings.
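To see why such a change matters, here is a minimal sketch (not from the article) of how the share of false positives among "significant" results shifts when the threshold drops from .05 to .005, assuming illustrative values for statistical power and for the prior probability that a tested hypothesis is true:

```python
# Minimal sketch: fraction of significant results that are false positives
# under two alpha thresholds. Power and prior probability are assumptions
# chosen for illustration, not figures from the article.

def false_positive_share(alpha, power=0.8, prior_true=0.1):
    """Fraction of significant results that are false positives, given the
    significance threshold (alpha), statistical power, and the assumed prior
    probability that a tested hypothesis is true."""
    true_pos = power * prior_true          # true effects correctly detected
    false_pos = alpha * (1 - prior_true)   # null hypotheses wrongly rejected
    return false_pos / (true_pos + false_pos)

for alpha in (0.05, 0.005):
    share = false_positive_share(alpha)
    print(f"alpha = {alpha}: {share:.1%} of significant results are false positives")
```

Under these assumed numbers, roughly a third of significant findings at .05 are false positives versus about 5% at .005; the exact figures depend entirely on the assumed power and prior.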
A study has revealed a high prevalence of inconsistencies in reported statistical test results. Such inconsistencies make results unreliable, because the reported p-values cannot be reproduced from the other reported statistics, and they ultimately erode trust in scientific reporting.
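The kind of consistency check behind such studies can be illustrated with a short sketch, in the spirit of tools like statcheck: recompute the p-value implied by a reported test statistic and degrees of freedom, then compare it with the p-value the authors reported. The reported values below are invented for illustration:

```python
# Minimal sketch of a reporting-consistency check: recompute a two-sided
# p-value from a reported t statistic and degrees of freedom and compare it
# with the reported p-value. The example report is made up.
from scipy import stats

def check_t_report(t, df, reported_p, tol=0.005):
    recomputed_p = 2 * stats.t.sf(abs(t), df)  # two-sided p-value from t and df
    consistent = abs(recomputed_p - reported_p) <= tol
    return recomputed_p, consistent

# e.g. a paper reports "t(28) = 2.20, p = .04"
p, ok = check_t_report(t=2.20, df=28, reported_p=0.04)
print(f"recomputed p = {p:.3f}, consistent with report: {ok}")
```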
Reproducible research includes sharing data and code. The reproducibility policy at the journal Biostatistics rewards articles with badges for data and code sharing. This study investigates the effect of these badges on reproducible research, specifically data and code sharing, at Biostatistics.
Nearly 100 Scientists Spent 2 Months on Google Docs to Redefine the P-Value
A new paper recommends that the label “statistically significant” be dropped altogether; instead, researchers should describe and justify their decisions about study design and interpretation of the data, including any statistical threshold they use.
The Distribution of P-values in Medical Research Articles Suggested Selective Reporting Associated with Statistical Significance
Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias.
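As a rough illustration of this idea (not the study's actual method), the sketch below simulates p-values from a mixture of true null and true alternative hypotheses and compares the counts just below and just above .05; in a published literature, an excess just below the threshold is the kind of irregularity that suggests selective reporting. The mixture weight, effect size, and sample size are assumptions:

```python
# Minimal sketch: simulate p-values from a mix of null and alternative
# hypotheses, then compare the share of p-values just below and just above
# .05. All simulation parameters are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_per_group, effect = 10_000, 30, 0.5   # assumed study parameters
null_true = rng.random(n_tests) < 0.7            # assume 70% of hypotheses are null

p_values = np.empty(n_tests)
for i, is_null in enumerate(null_true):
    delta = 0.0 if is_null else effect           # true mean difference
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(delta, 1.0, n_per_group)
    p_values[i] = stats.ttest_ind(a, b).pvalue

just_below = np.mean((p_values > 0.04) & (p_values <= 0.05))
just_above = np.mean((p_values > 0.05) & (p_values <= 0.06))
print(f"share in (.04, .05]: {just_below:.4f}, share in (.05, .06]: {just_above:.4f}")
# In an honest mixture the two bins are similar; a literature showing a spike
# in the lower bin would suggest results near .05 are selectively reported.
```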