Distinguishing Between Models and Hypotheses: Implications for Significance Testing
DOI:
https://doi.org/10.15626/MP.2021.2957
Keywords:
null hypothesis significance testing, null hypothesis, test hypothesis, model, statistical model, education, temptation
Abstract
In the debate about the merits or demerits of null hypothesis significance testing (NHST), authorities on both sides assume that the p value that a researcher computes is based on the null hypothesis or test hypothesis. If the assumption is true, it suggests that there are proper uses for NHST, such as distinguishing between competing directional hypotheses. And once it is admitted that there are proper uses for NHST, it makes sense to educate substantive researchers about how to use NHST properly and avoid using it improperly. From this perspective, the conclusion would be that researchers in the business and social sciences could benefit from better education pertaining to NHST. In contrast, my goal is to demonstrate that the p value that a researcher computes is not based on a hypothesis, but on a model in which the hypothesis is embedded. In turn, the distinction between hypotheses and models indicates that NHST cannot soundly be used to distinguish between competing directional hypotheses or to draw any conclusions about directional hypotheses whatsoever. Therefore, it is not clear that better education is likely to prove satisfactory. It is the temptation issue, not the education issue, that deserves to be in the forefront of NHST discussions.
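The claim that a p value rests on a whole model, not on the test hypothesis alone, can be illustrated with a small simulation. The sketch below is not taken from the article. It is a minimal illustration under stated assumptions: the test hypothesis (population mean of zero) is true in both scenarios, and the p value comes from a large-sample normal approximation to a one-sample test. The function names, sample sizes, and the clustered data-generating process are all hypothetical choices made for this example. The only thing that differs between scenarios is an auxiliary model assumption, independence of observations.

```python
import random
import statistics
from statistics import NormalDist


def p_value_one_sample(data, mu0=0.0):
    """Two-sided p value for H: mean == mu0, using a normal approximation.

    The calculation silently assumes independent observations; that
    assumption belongs to the model, not to the hypothesis itself.
    """
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    z = (mean - mu0) / (sd / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def independent_sample(rng, n):
    """Hypothetical scenario A: n independent draws with true mean 0."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def clustered_sample(rng, n_clusters, cluster_size):
    """Hypothetical scenario B: true mean is still 0, but observations
    within a cluster share a random effect, so independence fails."""
    data = []
    for _ in range(n_clusters):
        shared = rng.gauss(0.0, 1.0)
        data.extend(shared + rng.gauss(0.0, 1.0) for _ in range(cluster_size))
    return data


def rejection_rate(make_sample, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated studies with p < alpha."""
    rng = random.Random(seed)
    rejections = sum(
        p_value_one_sample(make_sample(rng)) < alpha for _ in range(n_sims)
    )
    return rejections / n_sims


# The test hypothesis (mean == 0) is true in both scenarios.
rate_ok = rejection_rate(lambda rng: independent_sample(rng, 100))
rate_bad = rejection_rate(lambda rng: clustered_sample(rng, 20, 5))

print(f"rejection rate, model assumptions hold:   {rate_ok:.3f}")
print(f"rejection rate, independence is violated: {rate_bad:.3f}")
```

In this sketch the second rejection rate should come out well above the nominal 0.05 even though the hypothesis is true. A small p value therefore signals a problem somewhere in the model and cannot, by itself, be attributed to the hypothesis.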
License
Copyright (c) 2024 David Trafimow
This work is licensed under a Creative Commons Attribution 4.0 International License.