
Abstract

Null-hypothesis significance testing and p-values are frequently criticized for their focus on detecting non-zero differences and their inability to provide evidence for the null hypothesis. In this article, we highlight how effect sizes, when meaningfully interpreted, can address these issues. Specifically, we argue that researchers should consider the smallest effect size of interest (SESOI), that is, the smallest effect size that is of practical or theoretical relevance. We propose several methods for estimating the SESOI and present a consensus study among Indonesian professionals that can be used to estimate the SESOI for child eyewitness testimony research. Results suggest that most Indonesian professionals consider one to two memory errors sufficient to take action, such as deeming testimony unreliable. We then show how the SESOI, combined with confidence intervals, can be used in data analyses and power analyses (e.g., minimum-effect testing, equivalence testing). Finally, we emphasize that the practical relevance of an effect size should be carefully evaluated before making policy recommendations.
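The general logic of combining a SESOI with a confidence interval can be illustrated with a minimal Python sketch. It assumes a two-group comparison of mean memory errors, a hypothetical SESOI of one error (an illustrative value loosely inspired by the consensus result), and a large-sample normal-approximation 90% interval rather than the t-based interval a real small-sample analysis would use. A 90% interval corresponds to one-sided tests at α = .05: if the whole interval lies inside (-SESOI, SESOI), the difference is treated as practically equivalent to zero; if it lies entirely beyond ±SESOI, the minimum-effect test is passed; otherwise the result is inconclusive. The function names (`ci_mean_difference`, `classify`, `simulated_power`) are hypothetical, and this is not the authors' code or the procedure from the cited tutorials.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ci_mean_difference(x, y, level=0.90):
    """Normal-approximation CI for mean(x) - mean(y) with unpooled variances.

    Assumption: samples are large enough for the z approximation; small
    samples would call for a Welch t interval instead.
    """
    diff = mean(x) - mean(y)
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for a 90% CI
    return diff - z * se, diff + z * se


def classify(lower, upper, sesoi):
    """Compare a CI for a difference with the region (-sesoi, sesoi)."""
    if -sesoi < lower and upper < sesoi:
        return "equivalent"    # whole CI inside the bounds (equivalence test)
    if lower > sesoi or upper < -sesoi:
        return "meaningful"    # whole CI beyond the SESOI (minimum-effect test)
    return "inconclusive"      # CI overlaps at least one bound


def simulated_power(true_diff, sd, n, sesoi, goal, reps=2000, seed=1):
    """Share of simulated studies whose CI yields the `goal` conclusion.

    Assumption: both groups are normal with a common SD and n per group.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(true_diff, sd) for _ in range(n)]
        y = [rng.gauss(0.0, sd) for _ in range(n)]
        lower, upper = ci_mean_difference(x, y)
        if classify(lower, upper, sesoi) == goal:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical planning question: with a SESOI of 1 memory error, an SD of
    # 2 errors and no true difference, how often would n = 100 per group
    # allow a conclusion of equivalence?
    print(simulated_power(true_diff=0.0, sd=2.0, n=100, sesoi=1.0, goal="equivalent"))
```

Because each simulated dataset is analysed exactly as the real data would be, the proportion of simulations reaching the desired conclusion estimates power for that decision at a given sample size; planning would repeat this over a range of `n` values and true effects.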

References

Alter, U., & Counsell, A. (2023). Determining negligible associations in regression. The Quantitative Methods for Psychology, 19(1), 59-83. https://doi.org/10.20982/tqmp.19.1.p059

Amrhein, V., Greenland, S., & McShane, B. (2019). Scientists rise up against statistical significance. Nature, 567(7748), 305-307. https://doi.org/10.1038/d41586-019-00857-9

Anvari, F., Kievit, R., Lakens, D., Pennington, C. R., Przybylski, A. K., Tiokhin, L., ... & Orben, A. (2023). Not all effects are indispensable: Psychological science requires verifiable lines of reasoning for whether an effect matters. Perspectives on Psychological Science, 18(2), 503-507. https://doi.org/10.1177/17456916221091565

Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of Psychology, 100(3), 603-617. https://doi.org/10.1348/000712608X377117

Bakker, M., Van Dijk, A., & Wicherts, J. M. (2012). The rules of the game called psychological science. Perspectives on Psychological Science, 7(6), 543-554. https://doi.org/10.1177/1745691612459060

Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48. https://doi.org/10.18637/jss.v067.i01

Beribisky, N., Davidson, H., & Cribbie, R. A. (2019). Exploring perceptions of meaningfulness in visual representations of bivariate relationships. PeerJ, 7, e6853. https://doi.org/10.7717/peerj.6853

Bonini, M., Di Paolo, M., Bagnasco, D., Baiardini, I., Braido, F., Caminati, M., ... & Canonica, G. W. (2020). Minimal clinically important difference for asthma endpoints: an expert consensus report. European Respiratory Review, 29. https://doi.org/10.1183/16000617.0137-2019

Burriss, R. P., Troscianko, J., Lovell, P. G., Fulford, A. J., Stevens, M., Quigley, R., ... & Rowland, H. M. (2015). Changes in women’s facial skin color over the ovulatory cycle are not detectable by the human visual system. PLoS One, 10(7), e0130093. https://doi.org/10.1371/journal.pone.0130093

Byrne, M. (2019). Increasing the impact of behavior change intervention research: Is there a role for stakeholder engagement? Health Psychology, 38(4), 290-296. https://doi.org/10.1037/hea0000723

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997

Correll, J., Mellinger, C., McClelland, G. H., & Judd, C. M. (2020). Avoid Cohen’s ‘small’, ‘medium’, and ‘large’ for power analysis. Trends in Cognitive Sciences, 24, 200-207. https://doi.org/10.1016/j.tics.2019.12.009

Cribbie, R. A., Gruman, J. A., & Arpin‐Cribbie, C. A. (2004). Recommendations for applying tests of equivalence. Journal of Clinical Psychology, 60(1), 1-10. https://doi.org/10.1002/jclp.10217

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7-29. https://doi.org/10.1177/0956797613504966

Dowle, M., & Srinivasan, A. (2023). data.table: Extension of `data.frame`. R package version 1.14.8, https://CRAN.R-project.org/package=data.table

Farmus, L., Beribisky, N., Martinez Gutierrez, N., Alter, U., Panzarella, E., & Cribbie, R. A. (2023). Effect size reporting and interpretation in social personality research. Current Psychology, 42(18), 15752-15762. https://doi.org/10.1007/s12144-021-02621-7

Fife, D. (2023). flexplot: Graphically Based Data Analysis Using 'flexplot'. R package version 0.19.1.

Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156-168. https://doi.org/10.1177/2515245919847202

Garretson, M. E., & Peck, R. C. (1979). The effects of "no action" negligent operator hearings as an alternative to hearings resulting in probation (Report No. CAL-DMV-RSS-79-69). California Department of Motor Vehicles. https://www.ojp.gov/pdffiles1/Digitization/74470NCJRS.pdf

Greenland, S., Maclure, M., Schlesselman, J. J., Poole, C., & Morgenstern, H. (1991). Standardized regression coefficients: a further critique and review of some alternatives. Epidemiology, 2, 387-392. https://www.jstor.org/stable/20065707

Greenland, S., Schlesselman, J. J., & Criqui, M. H. (1986). The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203–208. https://doi.org/10.1093/oxfordjournals.aje.a114229

Gruijters, S. L., & Peters, G. J. Y. (2022). Meaningful change definitions: Sample size planning for experimental intervention research. Psychology & Health, 37(1), 1-16. https://doi.org/10.1080/08870446.2020.1841762

Howe, M. L., & Knott, L. M. (2015). The fallibility of memory in judicial processes: Lessons from the past and their modern consequences. Memory, 23(5), 633-656. https://doi.org/10.1080/09658211.2015.1010709

Isager, P. M., & Fitzgerald, J. (2024). Three-sided testing to establish practical significance: A tutorial. PsyArXiv. https://doi.org/10.31234/osf.io/8y925

Jané, M. B., Xiao, Q., Yeung, S. K., Ben-Shachar, M. S., Caldwell, A. R., Cousineau, D., … Feldman, G. (2024b). Guide to effect sizes and confidence intervals. https://doi.org/10.17605/OSF.IO/D8C4G

Lakens, D., Adolfi, F. G., Albers, C. J., Anvari, F., Apps, M. A., Argamon, S. E., ... & Zwaan, R. A. (2018). Justify your alpha. Nature Human Behaviour, 2(3), 168-171. https://doi.org/10.1038/s41562-018-0311-x

Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science, 1, 259-269. https://doi.org/10.1177/2515245918770963

Lemay, K. R., Tulloch, H. E., Pipe, A. L., & Reed, J. L. (2019). Establishing the minimal clinically important difference for the hospital anxiety and depression scale in patients with cardiovascular disease. Journal of Cardiopulmonary Rehabilitation and Prevention, 39, E6-E11. https://doi.org/10.1097/HCR.0000000000000379

Lenth, R. (2023). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.8.9, https://CRAN.R-project.org/package=emmeans

Loftus, E. F. (2005). Planting misinformation in the human mind: A 30-year investigation of the malleability of memory. Learning & Memory, 12, 361-366. https://doi.org/10.1101/lm.94705

Lüdecke, D., Ben-Shachar, M., Patil, I., Waggoner, P., & Makowski, D. (2021). Performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. https://doi.org/10.21105/joss.03139

Lydick, E., & Epstein, R. S. (1993). Interpretation of quality of life changes. Quality of Life Research, 2(3), 221-226. https://doi.org/10.1007/BF00435226

Makowski, D., Ben-Shachar, M., & Lüdecke, D. (2019). bayestestR: Describing Effects and their Uncertainty, Existence and Significance within the Bayesian Framework. Journal of Open Source Software, 4(40), 1541. http://doi.org/10.21105/joss.01541

McGlothlin, A. E., & Lewis, R. J. (2014). Minimal clinically important difference: defining what really matters to patients. JAMA, 312(13), 1342-1343. https://doi.org/10.1001/jama.2014.13128

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103-115. https://doi.org/10.1086/288135

Metzler, C. M. (1974). Bioavailability - a problem in equivalence. Biometrics, 30(2), 309–317. https://doi.org/10.2307/2529651

Morris, P. E., & Fritz, C. O. (2013). Effect sizes in memory research. Memory, 21(7), 832-842. https://doi.org/10.1080/09658211.2013.763984

Murphy, S. L., Merz, R., Reimann, L., & Fernández, A. (2024, November 13). Nonsignificance misinterpreted as an effect’s absence in psychology: Prevalence and temporal analyses. PsyArXiv. https://doi.org/10.31234/osf.io/hm2tu

Murphy, K. R., & Myors, B. (1999). Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology, 84, 234-248. https://doi.org/10.1037/0021-9010.84.2.234

National Academies of Sciences, Engineering, and Medicine (2019). Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303.

Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719-748. https://doi.org/10.1146/annurev-psych-020821-114157

Otgaar, H., Riesthuis, P., Neal, T. M. S., Chin, J., Boskovic, I., & Rassin, E. (2023). If generalization is the grail, practical relevance is the nirvana: Considerations from the contribution of psychological science of memory to law. Journal of Applied Research in Memory and Cognition, 12, 176–179. https://doi.org/10.1037/mac0000116

Otgaar, H., Howe, M. L., & Dodier, O. (2022a). What can expert witnesses reliably say about memory in the courtroom? Forensic Science International: Mind and Law, 3, 100106. https://doi.org/10.1016/j.fsiml.2022.100106

Otgaar, H., Riesthuis, P., Ramaekers, J. G., Garry, M., & Kloft, L. (2022b). The importance of the smallest effect size of interest in expert witness testimony on alcohol and memory. Frontiers in Psychology, 13, 980533. https://doi.org/10.3389/fpsyg.2022.980533

Panzarella, E., Beribisky, N., & Cribbie, R. A. (2021). Denouncing the use of field-specific effect size distributions to inform magnitude. PeerJ, 9, e11383. https://doi.org/10.7717/peerj.11383

Pek, J., & Flora, D. B. (2018). Reporting effect sizes in original psychological research: A discussion and tutorial. Psychological Methods, 23(2), 208-225. https://doi.org/10.1037/met0000126

Perugini, A., Gambarota, F., Toffalini, E., Lakens, D., Pastore, M., Finos, L., ... & Altoè, G. (2025). The benefits of reporting critical-effect-size values. Advances in Methods and Practices in Psychological Science, 8(2), 25152459251335298. https://doi.org/10.1177/25152459251335298

R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Revelle, W. (2023). Psych: Procedures for Psychological, Psychometric, and Personality Research. Northwestern University, Evanston, Illinois. R package version 2.3.12, https://CRAN.R-project.org/package=psych

Riesthuis, P., & Cribbie, R. (2025). When are scientific findings deemed practically or theoretically relevant? A literature review on minimum-effect testing. The Quantitative Methods for Psychology, 21(2), 82-94. https://doi.org/10.20982/tqmp.21.2.p082

Riesthuis, P., Howe, M. L., & Otgaar, H. (2025c). Meaningful approaches to assessing the size of effects in memory research: applications and recommendations for study design, interpretation, and analysis. Memory, 33(5), 485–494. https://doi.org/10.1080/09658211.2025.2492496

Riesthuis, P., Mangiulli, I., Broers, N., & Otgaar, H. (2022). Expert opinions on the smallest effect size of interest in false memory research. Applied Cognitive Psychology, 36, 203-215. https://doi.org/10.1002/acp.3911

Riesthuis, P., & Otgaar, H. (2024a). An overview of the replicability, generalizability and practical relevance of eyewitness testimony research in the Journal of Criminal Psychology. Journal of Criminal Psychology, 15(2), 176-194. https://doi.org/10.1108/JCP-04-2024-0031

Riesthuis, P., & Otgaar, H. (2024). On the use of receiver operating characteristic area under the curve in eyewitness memory research. Legal and Criminological Psychology, 00, 1–19. https://doi.org/10.1111/lcrp.12300

Riesthuis, P., Otgaar, H., & Bücken, C. (2025a). Simulation-based power analyses for the smallest effect size of interest: A confidence-interval approach for minimum-effect and equivalence testing. Behavior Research Methods, 57(4), 1-20. https://doi.org/10.31234/osf.io/sq7m3

Riesthuis, P., Rassin, E., Bücken, C., Booker, A., Chin, J., Goldfarb, D., Deferme, D., & Otgaar, H. (2025b). Through the lens of legal professionals: Examining the smallest effect size of interest for eyewitness memory research. The International Journal of Evidence & Proof, 13657127251357630. https://doi.org/10.1177/13657127251357630

Riesthuis, P. (2024). Simulation-based power analyses for the smallest effect size of interest: A confidence-interval approach for minimum-effect and equivalence testing. Advances in Methods and Practices in Psychological Science, 7(2). https://doi.org/10.1177/25152459241240722

Riesthuis, P. (2025). Assessing power analysis practices in eyewitness memory research: A review and simulation study. Journal of Applied Research in Memory and Cognition. Advance online publication. https://doi.org/10.1037/mac0000237

Seaman, M. A., & Serlin, R. C. (1998). Equivalence confidence intervals for two-group comparisons of means. Psychological Methods, 3(4), 403–411. https://doi.org/10.1037/1082-989X.3.4.403

Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.

Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123-1128. https://doi.org/10.1177/1745691617708630

Sjoberg, D. D., Whiting, K., Curry, M., Lavery, J. A., & Larmarange, J. (2021). Reproducible summary tables with the gtsummary package. The R Journal, 13, 570–580. https://doi.org/10.32614/RJ-2021-053

Smiley, A. H., Glazier, J. J., & Shoda, Y. (2023). Null regions: a unified conceptual framework for statistical inference. Royal Society Open Science, 10(11), 221328. https://doi.org/10.1098/rsos.221328

Staniszewska, S., Haywood, K. L., Brett, J., & Tutton, L. (2012). Patient and public involvement in patient-reported outcome measures: evolution not revolution. The Patient-Patient-Centered Outcomes Research, 5, 79-87. https://doi.org/10.2165/11597150-000000000-00000

Thériault, R. (2023). rempsyc: Convenience functions for psychology. Journal of Open Source Software, 8(87), 5466. https://doi.org/10.21105/joss

Thompson-Cannino, J., Cotton, R., & Torneo, E. (2009). Picking cotton: Our memoir of injustice and redemption. Macmillan.

Tullett, A. M. (2022). The limitations of social science as the arbiter of blame: An argument for abandoning retribution. Perspectives on Psychological Science, 17(4), 995-1007. https://doi.org/10.1177/17456916211033284

van der Heijde, D., Lassere, M., Edmonds, J., Kirwan, J., Strand, V., & Boers, M. (2001). Minimal clinically important difference in plain films in RA: group discussions, conclusions, and recommendations. OMERACT Imaging Task Force. The Journal of Rheumatology, 28, 914-917.

Westlake, W. J. (1972). Use of confidence intervals in analysis of comparative bioavailability trials. Journal of Pharmaceutical Sciences, 61(8), 1340-1341. https://doi.org/10.1002/jps.2600610845

Wei, T., & Simko, V. (2021). R package 'corrplot': Visualization of a Correlation Matrix (Version 0.92). Available from https://github.com/taiyun/corrplot

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D. A., François, R., ... & Yutani, H. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Wickham, H., & Bryan, J. (2023). readxl: Read Excel Files. R package version 1.4.3, https://CRAN.R-project.org/package=readxl

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Wiedemann, M., Thew, G. R., Košir, U., & Ehlers, A. (2022). lcsm: An R package and tutorial on latent change score modelling. Wellcome Open Research, 7:149. https://doi.org/10.12688/wellcomeopenres.17536.1

Wilke, C. (2022). ggridges: Ridgeline Plots in 'ggplot2'. R package version 0.5.4. https://CRAN.R-project.org/package=ggridges

Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1. https://doi.org/10.1017/S0140525X20001685
