Abstract
Null-hypothesis significance testing and p-values are frequently criticized for their focus on detecting non-zero differences and their inability to provide evidence for the null hypothesis. In this article, we highlight how effect sizes, when meaningfully interpreted, can address these issues. Specifically, we argue that researchers should consider the smallest effect size of interest (SESOI), that is, the smallest effect size that is practically or theoretically relevant. We propose several methods for estimating the SESOI and present a consensus study among Indonesian professionals that can be used to estimate the SESOI for child eyewitness testimony research. Results suggest that most Indonesian professionals consider one to two memory errors sufficient to take action, such as deeming testimony unreliable. We then show how the SESOI, combined with confidence intervals, can be used in data analyses and power analyses (e.g., minimum-effect testing, equivalence testing). Finally, we emphasize that the practical relevance of an effect size should be carefully evaluated before making policy recommendations.
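To make the confidence-interval logic concrete, the following is a minimal sketch in Python of how an interval estimate might be compared against a SESOI. It is an illustration under stated assumptions, not the authors' implementation: the function name `classify_interval`, the three labels, and all numbers are hypothetical, and the caller is assumed to supply a confidence interval at a level suited to the test (for example, a 90% interval when two one-sided tests are run at alpha = .05).

```python
def classify_interval(ci_lower: float, ci_upper: float, sesoi: float) -> str:
    """Compare a confidence interval for an effect with the bounds -sesoi and +sesoi.

    Hypothetical helper for illustration only.
    Returns:
      "meaningful"   - the whole interval lies beyond the SESOI (minimum-effect logic)
      "negligible"   - the whole interval lies inside the bounds (equivalence logic)
      "inconclusive" - the interval overlaps a bound, so neither claim is supported
    """
    if sesoi <= 0:
        raise ValueError("sesoi must be a positive number")
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")

    # Minimum-effect test: the interval excludes every value within the bounds.
    if ci_lower > sesoi or ci_upper < -sesoi:
        return "meaningful"
    # Equivalence test: the interval falls entirely within the bounds.
    if ci_lower > -sesoi and ci_upper < sesoi:
        return "negligible"
    return "inconclusive"


# Made-up intervals for a difference in memory errors, with an assumed SESOI of 1 error.
print(classify_interval(1.4, 2.9, sesoi=1.0))   # interval entirely above the SESOI
print(classify_interval(-0.3, 0.6, sesoi=1.0))  # interval entirely within the bounds
print(classify_interval(0.2, 1.8, sesoi=1.0))   # interval straddles the upper bound
```

The sketch treats the SESOI as a symmetric bound around zero; asymmetric bounds would need separate lower and upper thresholds.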
Recommended Citation
Riesthuis, Paul; Otgaar, Henry; Setiawan, Tery; Sumampouw, Nathanael; and Bücken, Charlotte (2025) "From p-Values to Practical Relevance: An Introduction to Effect Sizes Through a Legal Psychological Example," Psychological Research on Urban Society: Vol. 8: No. 2, Article 1.
DOI: 10.7454/proust.v8i2.1204
Available at: https://scholarhub.ui.ac.id/proust/vol8/iss2/1
Included in
Cognitive Psychology Commons, Quantitative Psychology Commons, Statistics and Probability Commons




