Replication value as a function of citation impact and sample size
DOI: https://doi.org/10.15626/MP.2022.3300
Keywords: replication value, replication, study selection, expected utility, RVCn
Abstract
Researchers seeking to replicate original research often need to decide which of several relevant candidates to select for replication. Several strategies for study selection have been proposed, utilizing a variety of observed indicators as criteria for selection. However, few strategies clearly specify the goal of study selection and how that goal is related to the indicators that are utilized. We have previously formalized a decision model of replication study selection in which the goal of study selection is to maximize the expected utility gain of the replication effort. We further define the concept of replication value as a proxy for expected utility gain (Isager et al., 2023). In this article, we propose a quantitative operationalization of replication value. We first discuss how value and uncertainty, the two concepts used to determine replication value, could be estimated via information about citation count and sample size. Second, we propose an equation for combining these indicators into an overall estimate of replication value, which we denote RVCn. Third, we suggest how RVCn could be implemented as part of a broader study selection procedure. Finally, we provide preliminary data suggesting that studies that were in fact selected for replication tend to have relatively high RVCn estimates. The goal of this article is to explain how RVCn is intended to work and, in doing so, demonstrate the many assumptions that should be explicit in any replication study selection strategy.
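To make the selection logic described in the abstract concrete, the sketch below ranks hypothetical candidate studies by combining a citation-based value proxy with a sample-size-based uncertainty proxy. This is a minimal illustration under stated assumptions: the Candidate class, the citations-per-year value term, the 1/sqrt(n) uncertainty term, and their multiplicative combination are all illustrative choices made here, not the RVCn equation itself, which is defined and justified in the full article.

```python
# Hypothetical sketch of replication-value-based study selection.
# The proxies and their combination are illustrative assumptions,
# not the authors' RVCn equation.

from dataclasses import dataclass
from math import sqrt


@dataclass
class Candidate:
    label: str       # study identifier
    citations: int   # total citation count of the original study
    years: float     # years since the original study was published
    n: int           # sample size of the original study


def value_proxy(c: Candidate) -> float:
    """Average citations per year, standing in for scientific impact."""
    return c.citations / max(c.years, 1.0)


def uncertainty_proxy(c: Candidate) -> float:
    """Illustrative uncertainty term that shrinks as sample size grows,
    mirroring how the standard error of an estimate scales with 1/sqrt(n)."""
    return 1.0 / sqrt(c.n)


def replication_value(c: Candidate) -> float:
    """Hypothetical combination: replication is assumed most valuable when a
    finding is both widely cited (high value) and imprecisely estimated
    (high uncertainty)."""
    return value_proxy(c) * uncertainty_proxy(c)


if __name__ == "__main__":
    candidates = [
        Candidate("Study A", citations=400, years=10, n=40),
        Candidate("Study B", citations=400, years=10, n=4000),
        Candidate("Study C", citations=20, years=10, n=40),
    ]
    for c in sorted(candidates, key=replication_value, reverse=True):
        print(f"{c.label}: estimated replication value = {replication_value(c):.2f}")
```

Under these assumptions, a heavily cited study with a small sample (Study A) outranks an equally cited study with a large sample (Study B) and a rarely cited small study (Study C), matching the intuition that replication effort yields the most expected utility gain where impact is high and evidential certainty is low.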
References
Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). Citations, citation indicators, and research quality: An overview of basic concepts and theories. SAGE Open, 9(1), 215824401982957. https://doi.org/10.1177/2158244019829575
Altmejd, A., Dreber, A., Forsell, E., Huber, J., Imai, T., Johannesson, M., & Camerer, C. (2019). Predicting the replicability of social science lab experiments. PLOS ONE, 14(12), e0225826. https://doi.org/10.1371/journal.pone.0225826
APA Dictionary of Psychology. (n.d.-a). Reliability [Accessed: 2025-06-20]. https://dictionary.apa.org/reliability
APA Dictionary of Psychology. (n.d.-b). Test bias [Accessed: 2025-06-20]. https://dictionary.apa.org/test-bias
Blaszczynski, A., & Gainsbury, S. M. (2019). Editor’s note: Replication crisis in the social sciences. International Gambling Studies, 19(3), 359–361. https://doi.org/10.1080/14459795.2019.1673786
Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80. https://doi.org/10.1108/00220410810844150
Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66(11), 2215–2222. https://doi.org/10.1002/asi.23329
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061–1071. https://doi.org/10.1037/0033-295X.111.4.1061
Burgers, J. (2019). Citation counts as a measure for scientific impact. https://doi.org/10.31237/osf.io/y9r7c
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365–376. https://doi.org/10.1038/nrn3475
Chakraborty, T., Kumar, S., Goyal, P., Ganguly, N., & Mukherjee, A. (2014). Towards a stratified learning approach to predict future citation counts. IEEE/ACM Joint Conference on Digital Libraries, 351–360. https://doi.org/10.1109/JCDL.2014.6970190
Chamberlain, S., Zhu, H., Jahn, N., Boettiger, C., & Ram, K. (2020). Rcrossref.
DeBruine, L. M., & Barr, D. J. (2021). Understanding mixed-effects models through data simulation. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920965119. https://doi.org/10.1177/2515245920965119
DeDeo, S. (2018). Information theory for intelligent people [p. 15].
Field, S. M., Hoekstra, R., Bringmann, L., & van Ravenzwaaij, D. (2019). When and why to replicate: As easy as 1, 2, 3? Collabra: Psychology, 5(1), 46. https://doi.org/10.1525/collabra.218
Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., & Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873
Heirene, R. M. (2021). A call for replications of addiction research: Which studies should we replicate and what constitutes a "successful" replication? Addiction Research & Theory, 29(2), 89–97. https://doi.org/10.1080/16066359.2020.1751130
Hernán, M., & Robins, J. (2020). Causal inference: What if. Chapman & Hall/CRC.
Ioannidis, J. P. A., Boyack, K. W., Small, H., Sorensen, A. A., & Klavans, R. (2014). Bibliometrics: Is your most cited work your best? Nature News, 514(7524), 561. https://doi.org/10.1038/514561a
Isager, P. M. (2018). What to replicate? Justifications of study choice from 85 replication studies. https://doi.org/10.5281/zenodo.1286715
Isager, P. M. (2020). Test validity defined as d-connection between target and measured attribute: Expanding the causal definition of Borsboom et al. (2004) [PsyArXiv]. https://doi.org/10.31234/osf.io/btgsr
Isager, P. M., van Aert, R. C. M., Bahník, Š., Brandt, M. J., DeSoto, K. A., Giner-Sorolla, R., Krueger, J. I., Perugini, M., Ropovik, I., van ’t Veer, A. E., Vranka, M., & Lakens, D. (2023). Deciding what to replicate: A decision model for replication study selection under resource and knowledge constraints. Psychological Methods, 28(2), 438–451. https://doi.org/10.1037/met0000438
Ke, Q., Ferrara, E., Radicchi, F., & Flammini, A. (2015). Defining and identifying sleeping beauties in science. Proceedings of the National Academy of Sciences, 112(24), 7426–7431. https://doi.org/10.1073/pnas.1424329112
Klautzer, L., Hanney, S., Nason, E., Rubin, J., Grant, J., & Wooding, S. (2011). Assessing policy and practice impacts of social science research: The application of the payback framework to assess the future of work programme. Research Evaluation, 20(3), 201–209. https://doi.org/10.3152/095820211X13118583635675
KNAW. (2018). Replication studies – improving reproducibility in the empirical sciences.
Lakens, D., & DeBruine, L. M. (2021). Improving transparency, falsifiability, and rigor by making hypothesis tests machine-readable. Advances in Methods and Practices in Psychological Science, 4(2), 2515245920970949. https://doi.org/10.1177/2515245920970949
Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542. https://doi.org/10.1177/1745691612460688
Martín-Martín, A., Orduna-Malea, E., Thelwall, M., & Delgado López-Cózar, E. (2018). Google scholar, web of science, and scopus: A systematic comparison of citations in 252 subject categories. Journal of Informetrics, 12(4), 1160–1177. https://doi.org/10.1016/j.joi.2018.09.002
Matiasz, N. J., Wood, J., Doshi, P., Speier, W., Beckemeyer, B., Wang, W., & Silva, A. J. (2018). Researchmaps.org for integrating and planning research. PloS One, 13(5), e0195271. https://doi.org/10.1371/journal.pone.0195271
Mayo, D. G. (2018). Statistical inference as severe testing. Cambridge University Press.
Meehl, P. E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests?
Mueller-Langer, F., Fecher, B., Harhoff, D., & Wagner, G. G. (2019). Replication studies in economics—how many and which papers are chosen for replication, and why? Research Policy, 48(1), 62–83. https://doi.org/10.1016/j.respol.2018.07.019
Murphy, J., Mesquida, C., Caldwell, A. R., Earp, B. D., & Warne, J. (2021). Selection protocol for replication in sports and exercise science [OSF Preprints]. https://doi.org/10.31219/osf.io/v3wz4
Nicholson, J. M., Mordaunt, M., Lopez, P., Uppala, A., Rosati, D., Rodrigues, N. P., & Rife, S. C. (2021). Scite: A smart citation index that displays the context of citations and classifies their intent using deep learning. bioRxiv. https://doi.org/10.1101/2021.03.15.435418
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Parolo, P. D. B., Pan, R. K., Ghosh, R., Huberman, B. A., Kaski, K., & Fortunato, S. (2015). Attention decay in science. Journal of Informetrics, 9(4), 734–745. https://doi.org/10.1016/j.joi.2015.07.006
Pittelkow, M.-M., Hoekstra, R., Karsten, J., & van Ravenzwaaij, D. (2020). Replication crisis in clinical psychology: A bayesian and qualitative re-evaluation [PsyArXiv]. https://doi.org/10.31234/osf.io/unezq
Plucker, J. A., & Makel, M. C. (2021). Replication is important for educational psychology: Recent developments and key issues. Educational Psychologist, 0(0), 1–11. https://doi.org/10.1080/00461520.2021.1895796
Radicchi, F., Weissman, A., & Bollen, J. (2017). Quantifying perceived impact of scientific publications. Journal of Informetrics, 11(3), 704–712. https://doi.org/10.1016/j.joi.2017.05.010
Ranehill, E., Dreber, A., Johannesson, M., Leiberg, S., Sul, S., & Weber, R. A. (2015). Assessing the robustness of power posing: No effect on hormones and risk tolerance in a large sample of men and women. Psychological Science, 26(5), 653–656. https://doi.org/10.1177/0956797614553946
Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem's "retroactive facilitation of recall" effect. PLoS ONE, 7(3), e33423. https://doi.org/10.1371/journal.pone.0033423
Rouder, J. N., & Haaf, J. M. (2018). Power, dominance, and constraint: A note on the appeal of different design traditions. Advances in Methods and Practices in Psychological Science, 1(1), 19–26. https://doi.org/10.1177/2515245917745058
Sale, C., & Mellor, D. (2018). A call for replication studies in nutrition and health. Nutrition and Health, 24(4), 201–201. https://doi.org/10.1177/0260106018817675
Serra-Garcia, M., & Gneezy, U. (2021). Nonreplicable publications are cited more than replicable ones. Science Advances, 7(21), eabd1705. https://doi.org/10.1126/sciadv.abd1705
van Eck, N. J., Waltman, L., van Raan, A. F. J., Klautz, R. J. M., & Peul, W. C. (2013). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. PLOS ONE, 8(4), e62395. https://doi.org/10.1371/journal.pone.0062395
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.3102/10769986025002101
Wagenmakers, E.-J., Beek, T., Dijkhoff, L., Gronau, Q. F., Acosta, A., Adams, R. B., & Zwaan, R. A. (2016). Registered replication report: Strack, Martin, & Stepper (1988). Perspectives on Psychological Science, 11(6), 917–928. https://doi.org/10.1177/1745691616674458
Waltman, L., & van Eck, N. J. (2013). A systematic empirical comparison of different approaches for normalizing citation impact indicators. Journal of Informetrics, 7(4), 833–849. https://doi.org/10.1016/j.joi.2013.08.002
Waltman, L., & van Eck, N. J. (2019). Field normalization of scientometric indicators. In W. Glänzel, H. F. Moed, U. Schmoch, & M. Thelwall (Eds.), Springer handbook of science and technology indicators (pp. 281–300). Springer International Publishing. https://doi.org/10.1007/978-3-030-02511-3_11
Wang, D., Song, C., & Barabási, A.-L. (2013). Quantifying long-term scientific impact. Science, 342(6154), 127–132. https://doi.org/10.1126/science.1237825
Wang, M., Ren, J., Li, S., & Chen, G. (2019). Quantifying a paper’s academic impact by distinguishing the unequal intensities and contributions of citations. IEEE Access, 7, 96198–96214. https://doi.org/10.1109/ACCESS.2019.2927016
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143(5), 2020–2045. https://doi.org/10.1037/xge0000014
Yang, Y., Wu, Y., & Uzzi, B. (2020). Estimating the deep replicability of scientific findings using human and artificial intelligence. Proceedings of the National Academy of Sciences, 117(20), 10762–10768.
Yuan, S., Tang, J., Zhang, Y., Wang, Y., & Xiao, T. (2018). Modeling and predicting citation count via recurrent neural network with long short-term memory. https://arxiv.org/abs/1811.02129
Zwaan, R. A., Etz, A., Lucas, R. E., & Donnellan, M. B. (2018). Making replication mainstream. Behavioral and Brain Sciences, 41. https://doi.org/10.1017/S0140525X17001972
License
Copyright (c) 2025 Peder Isager, Anna van 't Veer, Daniël Lakens

This work is licensed under a Creative Commons Attribution 4.0 International License.