Replication value as a function of citation impact and sample size

DOI:

https://doi.org/10.15626/MP.2022.3300

Keywords:

replication value, replication, study selection, expected utility, RVCn

Abstract

Researchers seeking to replicate original research often need to decide which of several relevant candidates to select for replication. Several strategies for study selection have been proposed, utilizing a variety of observed indicators as criteria for selection. However, few strategies clearly specify the goal of study selection and how that goal is related to the indicators that are utilized. We have previously formalized a decision model of replication study selection in which the goal of study selection is to maximize the expected utility gain of the replication effort, and we defined the concept of replication value as a proxy for this expected utility gain (Isager et al., 2023). In this article, we propose a quantitative operationalization of replication value. We first discuss how value and uncertainty, the two concepts used to determine replication value, could be estimated from information about citation count and sample size. Second, we propose an equation for combining these indicators into an overall estimate of replication value, which we denote RVCn. Third, we suggest how RVCn could be implemented as part of a broader study selection procedure. Finally, we provide preliminary data suggesting that studies that were in fact selected for replication tend to have relatively high RVCn estimates. The goal of this article is to explain how RVCn is intended to work and, in doing so, to demonstrate the many assumptions that should be made explicit in any replication study selection strategy.
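The abstract describes RVCn only conceptually: a value proxy based on citation impact combined with an uncertainty proxy based on sample size, with the precise equation given in the article itself. As a minimal, non-authoritative sketch of that idea, the snippet below assumes value is proxied by citations per year and uncertainty by 1/sqrt(n); the Study fields, the rvcn() helper, and the specific combination rule are illustrative assumptions introduced here, not necessarily the equation the article proposes.

```python
# Illustrative sketch only: the exact RVCn equation is defined in the article.
# Assumption here: value ~ citations per year, uncertainty ~ 1 / sqrt(n),
# and the two proxies are combined by multiplication.
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Study:
    label: str
    citation_count: int  # total citations to the original article
    pub_year: int        # year the original article was published
    sample_size: int     # sample size (n) of the original study


def rvcn(study: Study, current_year: int = date.today().year) -> float:
    """Toy replication-value estimate: citation rate scaled by an uncertainty proxy."""
    years = max(current_year - study.pub_year, 1)  # avoid division by zero
    citation_rate = study.citation_count / years   # value proxy
    uncertainty = 1 / sqrt(study.sample_size)      # uncertainty proxy
    return citation_rate * uncertainty


# Rank hypothetical candidate studies by the toy estimate.
candidates = [
    Study("highly cited, small n", citation_count=900, pub_year=2010, sample_size=40),
    Study("highly cited, large n", citation_count=900, pub_year=2010, sample_size=4000),
    Study("rarely cited, small n", citation_count=30, pub_year=2010, sample_size=40),
]
for s in sorted(candidates, key=rvcn, reverse=True):
    print(f"{s.label}: toy RVCn = {rvcn(s):.2f}")
```

Under these assumptions, a highly cited original study with a small sample ranks above an equally cited study with a large sample, matching the intuition in the abstract that replication effort is best spent where value is high and remaining uncertainty is large.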


Author Biographies

Anna van 't Veer, Leiden University

Assistant Professor,

Faculty of Social Sciences, Institute of Psychology, Methodology & Statistics

Daniël Lakens, Eindhoven University of Technology

Associate Professor,

Department of Industrial Engineering and Innovation Sciences

References

Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. SAGE Open, 9(1), 215824401982957. https://doi.org/10.1177/2158244019829575

Altmejd, A., Dreber, A., Forsell, E., Huber, J., Imai, T., Johannesson, M., & Camerer, C. (2019). Predicting the replicability of social science lab experiments. PLOS ONE, 14(12), e0225826. https://doi.org/10.1371/journal.pone.0225826

APA Dictionary of Psychology. (n.d.-a). Reliability [Accessed: 2025-06-20]. https://dictionary.apa.org/reliability

APA Dictionary of Psychology. (n.d.-b). Test bias [Accessed: 2025-06-20]. https://dictionary.apa.org/test-bias

Blaszczynski, A., & Gainsbury, S. M. (2019). Editor’s note: Replication crisis in the social sciences. International Gambling Studies, 19(3), 359–361. https://doi.org/10.1080/14459795.2019.1673786

Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80. https://doi.org/10.1108/00220410810844150

Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66(11), 2215–2222. https://doi.org/10.1002/asi.23329

Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061

Burgers, J. (2019). Citation counts as a measure for scientific impact. https://doi.org/10.31237/osf.io/y9r7c

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475

Chakraborty, T., Kumar, S., Goyal, P., Ganguly, N., & Mukherjee, A. (2014). Towards a stratified learning approach to predict future citation counts. IEEE/ACM Joint Conference on Digital Libraries, 351–360. https://doi.org/10.1109/JCDL.2014.6970190

Chamberlain, S., Zhu, H., Jahn, N., Boettiger, C., & Ram, K. (2020). Rcrossref.

DeBruine, L. M., & Barr, D. J. (2021). Understanding mixed-effects models through data simulation. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920965119. https://doi.org/10.1177/2515245920965119

DeDeo, S. (2018). Information theory for intelligent people [p. 15].

Field, S. M., Hoekstra, R., Bringmann, L., & van Ravenzwaaij, D. (2019). When and why to replicate: As easy as 1, 2, 3? Collabra: Psychology, 5(1), 46. https://doi.org/10.1525/collabra.218

Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., & Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873

Heirene, R. M. (2021). A call for replications of addiction research: Which studies should we replicate and what constitutes a "successful" replication? Addiction Research & Theory, 29(2), 89–97. https://doi.org/10.1080/16066359.2020.1751130

Hernán, M., & Robins, J. (2020). Causal inference: What if. Chapman & Hall/CRC.

Ioannidis, J. P. A., Boyack, K. W., Small, H., Sorensen, A. A., & Klavans, R. (2014). Bibliometrics: Is your most cited work your best? Nature News, 514(7524), 561. https://doi.org/10.1038/514561a

Isager, P. M. (2018). What to replicate? Justifications of study choice from 85 replication studies. https://doi.org/10.5281/zenodo.1286715

Isager, P. M. (2020). Test validity defined as d-connection between target and measured attribute: Expanding the causal definition of Borsboom et al. (2004) [PsyArXiv]. https://doi.org/10.31234/osf.io/btgsr

Isager, P. M., van Aert, R. C. M., Bahník, Š., Brandt, M. J., DeSoto, K. A., Giner-Sorolla, R., Krueger, J. I., Perugini, M., Ropovik, I., van ’t Veer, A. E., Vranka, M., & Lakens, D. (2023). Deciding what to replicate: A decision model for replication study selection under resource and knowledge constraints. Psychological Methods, 28(2), 438–451. https://doi.org/10.1037/met0000438

Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426–7431. https://doi.org/10.1073/pnas.1424329112

Klautzer, L., Hanney, S., Nason, E., Rubin, J., Grant, J., & Wooding, S. (2011). Assessing policy and practice impacts of social science research: The application of the payback framework to assess the future of work programme. Research Evaluation, 20(3), 201–209. https://doi.org/10.3152/095820211X13118583635675

KNAW. (2018). Replication studies – improving reproducibility in the empirical sciences.

Lakens, D., & DeBruine, L. M. (2021). Improving transparency, falsifiability, and rigor by making hypothesis tests machine-readable. Advances in Methods and Practices in Psychological Science, 4(2), 2515245920970949. https://doi.org/10.1177/2515245920970949

Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542. https://doi.org/10.1177/1745691612460688

Martín-Martín, A., Orduna-Malea, E., Thelwall, M., & Delgado López-Cózar, E. (2018). Google Scholar, Web of Science, and Scopus: A systematic comparison of citations in 252 subject categories. Journal of Informetrics, 12(4), 1160–1177. https://doi.org/10.1016/j.joi.2018.09.002

Matiasz, N. J., Wood, J., Doshi, P., Speier, W., Beckemeyer, B., Wang, W., & Silva, A. J. (2018). Researchmaps.org for integrating and planning research. PloS One, 13(5), e0195271. https://doi.org/10.1371/journal.pone.0195271

Mayo, D. G. (2018). Statistical inference as severe testing. Cambridge University Press.

Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests?

Mueller-Langer, F., Fecher, B., Harhoff, D., & Wagner, G. G. (2019). Replication studies in economics—how many and which papers are chosen for replication, and why? Research Policy, 48(1), 62–83. https://doi.org/10.1016/j.respol.2018.07.019

Murphy, J., Mesquida, C., Caldwell, A. R., Earp, B. D., & Warne, J. (2021). Selection protocol for replication in sports and exercise science [OSF Preprints]. https://doi.org/10.31219/osf.io/v3wz4

Nicholson, J. M., Mordaunt, M., Lopez, P., Uppala, A., Rosati, D., Rodrigues, N. P., & Rife, S. C. (2021). Scite: A smart citation index that displays the context of citations and classifies their intent using deep learning. bioRxiv. https://doi.org/10.1101/2021.03.15.435418

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Parolo, P. D. B., Pan, R. K., Ghosh, R., Huberman, B. A., Kaski, K., & Fortunato, S. (2015). Attention decay in science. Journal of Informetrics, 9(4), 734–745. https://doi.org/10.1016/j.joi.2015.07.006

Pittelkow, M.-M., Hoekstra, R., Karsten, J., & van Ravenzwaaij, D. (2020). Replication crisis in clinical psychology: A bayesian and qualitative re-evaluation [PsyArXiv]. https://doi.org/10.31234/osf.io/unezq

Plucker, J. A., & Makel, M. C. (2021). Replication is important for educational psychology: Recent developments and key issues. Educational Psychologist, 0(0), 1–11. https://doi.org/10.1080/00461520.2021.1895796

Radicchi, F., Weissman, A., & Bollen, J. (2017). Quantifying perceived impact of scientific publications. Journal of Informetrics, 11(3), 704–712. https://doi.org/10.1016/j.joi.2017.05.010

Ranehill, E., Dreber, A., Johannesson, M., Leiberg, S., Sul, S., & Weber, R. A. (2015). Assessing the robustness of power posing: No effect on hormones and risk tolerance in a large sample of men and women. Psychological Science, 26(5), 653–656. https://doi.org/10.1177/0956797614553946

Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s "retroactive facilitation of recall" effect. PLoS ONE, 7(3), e33423. https://doi.org/10.1371/journal.pone.0033423

Rouder, J. N., & Haaf, J. M. (2018). Power, dominance, and constraint: A note on the appeal of different design traditions. Advances in Methods and Practices in Psychological Science, 1(1), 19–26. https://doi.org/10.1177/2515245917745058

Sale, C., & Mellor, D. (2018). A call for replication studies in nutrition and health. Nutrition and Health, 24(4), 201–201. https://doi.org/10.1177/0260106018817675

Serra-Garcia, M., & Gneezy, U. (2021). Nonreplicable publications are cited more than replicable ones. Science Advances, 7(21), eabd1705. https://doi.org/10.1126/sciadv.abd1705

van Eck, N. J., Waltman, L., van Raan, A. F. J., Klautz, R. J. M., & Peul, W. C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLOS ONE, 8(4), e62395. https://doi.org/10.1371/journal.pone.0062395

Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.3102/10769986025002101

Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., & Zwaan, R. A. (2016). Registered Replication Report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458

Waltman, L., & van Eck, N. J. (2013). A systematic empirical comparison of different approaches for normalizing citation impact indicators. Journal of Informetrics, 7(4), 833–849. https://doi.org/10.1016/j.joi.2013.08.002

Waltman, L., & van Eck, N. J. (2019). Field normalization of scientometric indicators. In W. Glänzel, H. F. Moed, U. Schmoch, & M. Thelwall (Eds.), Springer handbook of science and technology indicators (pp. 281–300). Springer International Publishing. https://doi.org/10.1007/978-3-030-02511-3_11

Wang, D., Song, C., & Barabási, A.-L. (2013). Quantifying long-term scientific impact. Science, 342(6154), 127–132. https://doi.org/10.1126/science.1237825

Wang, M., Ren, J., Li, S., & Chen, G. (2019). Quantifying a paper’s academic impact by distinguishing the unequal intensities and contributions of citations. IEEE Access, 7, 96198–96214. https://doi.org/10.1109/ACCESS.2019.2927016

Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014

Yang, Y., Wu, Y., & Uzzi, B. (2020). Estimating the deep replicability of scientific findings using human and artificial intelligence. Proceedings of the National Academy of Sciences, 117(20), 10762–10768.

Yuan, S., Tang, J., Zhang, Y., Wang, Y., & Xiao, T. (2018). Modeling and predicting citation count via recurrent neural network with long short-term memory. https://arxiv.org/abs/1811.02129

Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication mainstream. Behavioral and Brain Sciences, 41. https://doi.org/10.1017/S0140525X17001972

Published

2025-10-29

Section

Special Topic