Summary

The sprtt package provides a toolbox for Sequential Probability Ratio Tests (SPRTs), implementing modern variants including sequential t-tests and sequential ANOVA for applied and methodological research. While traditional fixed-sample designs require researchers to commit to a predetermined sample size, SPRTs enable continuous evidence evaluation with predefined stopping rules – terminating data collection as soon as the evidence crosses a threshold for rejecting or accepting the null hypothesis (Wald 1945). Crucially, this flexibility comes without inflating long-run Type I and Type II error rates beyond the levels specified in advance. Wald’s original SPRT for simple hypotheses requires, on average, about 50% fewer observations than a Neyman–Pearson fixed-sample design with the same error rates (Wald 1945). Newer variants extend these methods to composite hypotheses through sequential t-tests and sequential ANOVA – tests that are standard in fields like psychology and medicine. For the composite-hypothesis extensions implemented in sprtt, efficiency cannot be derived analytically (Wald 1945; Cox 1952; Köllerström and Wetherill 1979; Schnuerch and Erdfelder 2020), but simulation studies have shown that error rates are well-controlled and that efficiency gains remain of similar magnitude (Schnuerch and Erdfelder 2020; Steinhilber, Schnuerch, and Schubert 2024; Stefan et al. 2022). Despite the long history of SPRTs, the sprtt package is the first to provide accessible software implementations of both sequential t-tests (Rushton 1950; Hajnal 1961; Schnuerch and Erdfelder 2020) and sequential ANOVA (Wetherill and Glazebrook 1986; Steinhilber, Schnuerch, and Schubert 2024). The package implements these validated procedures and additionally provides example datasets, data-generating functions, sample size planning, and visualization tools to facilitate the adoption of SPRTs in applied research.

Statement of need

Due to the replication crisis (Open Science Collaboration 2015; Ioannidis 2005; Bogdan 2025) in empirical fields like psychology and medicine, statistical procedures have been scrutinized, and new alternatives have gained attention (Cumming 2014; Daniël Lakens, Scheel, and Isager 2018; Wagenmakers et al. 2018). Sequential testing methods have become increasingly popular in recent years as they directly address pressing demands in empirical research: the need to minimize resource expenditure and participant burden without sacrificing statistical rigor (Schnuerch and Erdfelder 2020; Steinhilber, Schnuerch, and Schubert 2024; Ly et al. 2025; Daniel Lakens, Pahlke, and Wassmer 2021; Erdfelder and Schnuerch 2021). This is relevant across all empirical research, and particularly vital in clinical settings where continued data collection can carry real ethical costs.

Although SPRTs are well-established in the statistical literature (Wald 1947; Siegmund 1985; Bartroff, Lai, and Shih 2012; Tartakovsky, Nikiforov, and Basseville 2014), their original formulation relies on simple hypotheses, which are of limited use in applied research: they require researchers to specify nuisance parameters – such as the variance – that are rarely known in advance. As a first step toward practical applicability, variants based on composite hypotheses were developed, namely the sequential t-test (Rushton 1950; Hajnal 1961) and sequential ANOVA (Wetherill and Glazebrook 1986). As a second step, these variants were recently validated in simulation studies, establishing their statistical properties under realistic conditions (Schnuerch and Erdfelder 2020; Steinhilber, Schnuerch, and Schubert 2024). As a third step, this methodological progress needed to be matched by accessible software: prior to sprtt, the only available implementation was a bare R script provided alongside validation work (Schnuerch and Erdfelder 2020). Translating these promising statistical methods into accessible, user-friendly, and open-source software is therefore essential for closing the gap between statistical theory and adoption in practice.

State of the field

The landscape of sequential testing software is sparse. Beyond R, very few software packages appear to exist, though several major technology companies including Netflix, Uber, and Spotify have either published on sequential testing and SPRT variants or stated their use, suggesting that proprietary implementations may exist in industry (Bibaut, Kallus, and Lindon 2024; Schultzberg and Ankargren 2023; Deb et al. 2018). The only Python implementation, the sprt package on PyPI (Yu 2017), covers Wald’s SPRT for Normal, Binomial, and Poisson distributions but has not been updated since its initial release in 2017 and lacks documentation. No SPRT implementations seem to exist in Julia. A JavaScript library for sequential generalized likelihood ratio tests, SeGLiR (Øygard 2014), targets browser-based A/B testing and has not been maintained since 2017. JASP (Love et al. 2019) is a free and open-source application that implements sequential Bayesian hypothesis testing (Schönbrodt et al. 2017), using a Bayes factor rather than a likelihood ratio as the monitoring statistic, which requires the specification of prior distributions. These Bayesian tools address an important but different use case. The present package is intended for researchers who prefer a frequentist sequential framework, want to control long-run Type I and Type II error rates in familiar Neyman–Pearson terms, or wish to avoid specifying prior distributions. In R, the SPRT package (Budihal 2025) implements Wald’s original sequential tests for simple hypotheses, the gsDesign package (Anderson 2026) provides a function for truncated binomial SPRTs, and the MSPRT (Pramanik, Johnson, and Bhattacharya 2020) and Sequential (Silva and Kulldorff 2025) packages cover a variety of truncated SPRT variants. Beyond the SPRT, anytime-valid inference has emerged as an alternative sequential testing framework, using e-values to guarantee validity at any sample size (Ramdas et al. 2023; Grünwald, Heide, and Koolen 2023) – current software implementations include the R package safestats (Ly et al. 2024; Ly et al. 2025) and the Python package savvi (Assunção 2024). To our knowledge, no publicly available software implements sequential t-tests or sequential one-way ANOVA as described and validated by Schnuerch and Erdfelder (2020) and Steinhilber, Schnuerch, and Schubert (2024). The sprtt package fills this gap directly.

Research impact

The sprtt package was first published on CRAN in 2021 and has since accumulated close to 13,000 downloads, averaging approximately 200 downloads per month in the 12 months preceding March 2026 (Steinhilber, Schnuerch, and Schubert 2023). The package has been used in experimental research (Quevedo Pütter and Erdfelder 2022), simulation studies (Steinhilber, Schnuerch, and Schubert 2024, 2025), and has been referenced in methodological work (Schubert et al. 2025; Fischer and Ramdas 2025). The target audience includes applied researchers using SPRT variants in their empirical work, as well as methodologists conducting simulation studies to gain further insights into the properties of SPRTs.

Monthly CRAN downloads of the sprtt package since its first release in August 2021. Dashed vertical lines indicate CRAN release versions. The LOESS trend line with 95% confidence band reflects the overall download trajectory across complete months.

Software design

The sprtt package is built around two main user-facing functions: seq_ttest() and seq_anova(). The seq_ttest() function implements the sequential t-test and deliberately mirrors the interface of the t.test() function from the stats package to ensure familiarity for R users. The seq_anova() function follows a similar design philosophy, maintaining consistency across the package’s interface.
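Both user-facing functions monitor a likelihood ratio against two fixed decision thresholds. As a language-agnostic illustration – the package itself is written in R, and the sketch below shows Wald’s original rule for two simple hypotheses about a normal mean with known variance, not sprtt’s composite-hypothesis tests – the sequential decision scheme can be written as:

```python
import math

def wald_boundaries(alpha, beta):
    """Wald's (approximate) thresholds on the log-likelihood ratio:
    upper = log((1-beta)/alpha), lower = log(beta/(1-alpha))."""
    upper = math.log((1 - beta) / alpha)   # crossing => reject H0
    lower = math.log(beta / (1 - alpha))   # crossing => accept H0
    return lower, upper

def sprt_normal_mean(xs, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Run Wald's SPRT for H0: mu = mu0 vs. H1: mu = mu1 with known sigma.
    Returns a decision ('reject H0' / 'accept H0' / 'continue') and the
    number of observations used when the decision was reached."""
    lower, upper = wald_boundaries(alpha, beta)
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        # Increment: log of the ratio of N(mu1, sigma^2) to N(mu0, sigma^2) densities.
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "continue", len(xs)
```

sprtt’s composite-hypothesis variants follow the same continue-or-stop logic but replace this simple-hypothesis likelihood ratio with one based on the t statistic (Schnuerch and Erdfelder 2020) or the ANOVA F statistic (Steinhilber, Schnuerch, and Schubert 2024).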

Design principle

The core design principle is modularity: each internal function should perform one task well. This approach emphasizes simplicity, testability, clear structure, and minimal code repetition. The internal architecture of the core functions is documented in more detail in the developer vignette of the sprtt package. The package is designed to return interpretable results not only when a stopping boundary is crossed, but also when monitoring remains inconclusive at the current stage of data collection. More generally, functions perform input validation to catch common issues such as invalid argument types, missing values, or out-of-range parameters.
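The fail-fast validation principle described above can be illustrated with a minimal, hypothetical check. This is a generic Python sketch – the function name and messages are invented for illustration, and sprtt’s actual R-based validation is more extensive:

```python
def validate_error_rates(alpha, power):
    """Reject invalid error-rate inputs immediately with an informative
    message, instead of letting them produce meaningless output downstream."""
    if not isinstance(alpha, (int, float)) or not 0 < alpha < 1:
        raise ValueError(f"alpha must be a number in (0, 1), got {alpha!r}")
    if not isinstance(power, (int, float)) or not 0 < power < 1:
        raise ValueError(f"power must be a number in (0, 1), got {power!r}")
```

Validating each argument in one small, dedicated function keeps the check itself trivially testable – the same modularity principle applied throughout the package.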

While the primary focus remains on implementing well-tested SPRT variants with proven efficiency and error rate control, the package continuously expands its functionality to improve user experience. Supporting features include example datasets, data simulation functions, visualization tools for sequential ANOVA results, and sample size planning for sequential ANOVA. The lifecycle package is used throughout to clearly communicate the maturity status of each function – an important consideration for research software where interface stability directly affects reproducibility. The core functions seq_ttest(), seq_anova(), and the data simulation utilities are stable: we commit to not introducing silent breaking changes to these functions. Where changes are unavoidable, users will be informed through deprecation warnings and messaging well in advance. Newer additions, including the visualization tools and the sample size planning function, are marked as experimental, reflecting that their interfaces may still undergo substantial revisions as they mature.

A concrete illustration of why this distinction matters is the plot function for seq_ttest(). Mirroring the t.test() interface was a deliberate choice to lower the barrier to adoption, but as the package grew, a complication emerged: the wide variety of input formats accepted by t.test() has so far prevented the implementation of a consistent plot function for seq_ttest() – a feature that already exists for seq_anova() and is planned for a future release. Resolving this may require interface adjustments to seq_ttest(), which will be handled through the deprecation-with-messaging approach rather than silent breaking changes.

External data

Expected sample sizes for the implemented tests cannot be derived analytically; sample size planning therefore requires extensive Monte Carlo simulations to characterize sampling behavior across a wide range of parameter combinations. The plan_sample_size() function addresses this by generating an HTML report based on a pre-computed simulation dataset covering multiple effect sizes, group sizes, and Type II error rates – each estimated from 10,000 replications per condition, run on a high-performance computing cluster. Pre-computing this dataset offers several advantages over on-demand simulation: recommendations are returned instantly; all users access identical results, which ensures reproducibility; and redundant computation across research groups is avoided. The trade-off is that the lookup covers only a predefined set of parameter combinations; users with custom scenarios are therefore directed to the simulation functions to generate tailored estimates.
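The lookup idea behind this design can be sketched in a few lines. Everything below is a hypothetical Python illustration: the function name, the table keys, and especially the numeric values are placeholders, not figures from the actual sprtt simulation dataset:

```python
# Placeholder table: (effect size, Type II error rate) -> simulated
# expected sample size per group. Values are invented for illustration.
PLANNING_TABLE = {
    (0.25, 0.05): 120,
    (0.25, 0.10): 95,
    (0.40, 0.05): 50,
}

def plan_from_lookup(effect_size, beta):
    """Return a pre-computed recommendation instantly; fail with a pointer
    to custom simulation when the requested condition was never simulated."""
    try:
        return PLANNING_TABLE[(effect_size, beta)]
    except KeyError:
        raise KeyError(
            f"No pre-computed result for effect size {effect_size} and "
            f"beta {beta}; run a tailored simulation instead."
        ) from None
```

The sketch makes the trade-off concrete: covered conditions resolve in constant time with identical answers for every user, while uncovered conditions fail loudly rather than extrapolating.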

However, the comprehensive nature of these simulations produces a dataset too large to bundle directly with the package under CRAN size constraints. To resolve this tension, the simulation dataset is maintained in a separate GitHub repository (https://github.com/MeikeSteinhilber/sprtt_plan_sample_size) and downloaded on demand, after which it is cached locally to avoid repeated downloads. This separation also serves a transparency purpose: the full simulation pipeline including the hierarchical SLURM scripts used for cluster execution is publicly available for inspection and verification. To give users direct control over this external dependency, the sprtt package includes dedicated helper functions (download_sample_size_data(), cache_info(), cache_clear()) for manually downloading, inspecting, and clearing the locally cached dataset. The generated HTML report records the package version and the exact version of the downloaded simulation dataset, allowing users to reproduce recommendations even if the external repository is updated later.
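The download-on-demand-then-cache pattern is itself simple. The following is a generic Python sketch of the idea under stated assumptions (a URL, a filename, and a caller-supplied cache directory) – it is not the implementation behind sprtt’s R helpers download_sample_size_data(), cache_info(), and cache_clear():

```python
from pathlib import Path
from urllib.request import urlretrieve

def fetch_dataset(url, filename, cache_dir):
    """Download `url` into `cache_dir` on first use; every later call
    returns the locally cached copy without touching the network."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if not target.exists():      # network access only on a cache miss
        urlretrieve(url, target)
    return target
```

A consequence of this design, noted above for sprtt as well, is that the cached copy is pinned: once downloaded, later changes to the remote dataset do not silently alter local results until the user explicitly clears the cache.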

Software documentation

The sprtt package is documented through a dedicated website (https://meikesteinhilber.github.io/sprtt/) and READMEs on both the main GitHub repository and the supplementary repository hosting simulation code and results for the plan_sample_size() function. The package further includes a comprehensive set of vignettes. Introductory vignettes cover general package usage, a recommended workflow, and an introduction to SPRTs, complemented by a simple t-test use case. More advanced vignettes provide dedicated guidance on the sequential t-test and sequential one-way ANOVA. Finally, further topics are addressed in vignettes on sample size planning and a developer guide for users who want to contribute to or extend the package.

AI usage disclosure

The core sprtt implementation, all architectural decisions, and the research contributions are original human intellectual work. Development began in February 2021 and predates the widespread availability of modern AI-assisted programming tools, with the majority of the codebase written without AI assistance (CRAN releases: August 2021 and July 2023). For the latest release, generative AI (Claude, Anthropic) was used to assist with debugging new code, writing unit tests, and reviewing the package documentation for improvements. For this manuscript, AI was additionally used to support writing tasks such as improving grammar and spelling, formatting of references, and suggesting manuscript structure. In all cases, AI served an assistive role only, and all output was thoroughly reviewed and verified by the authors.

Acknowledgements

We thank the Carl Zeiss Foundation for the generous 5-year funding of SMART-AGE (P2019-01-003; 2021-2026). Parts of this research were supported by a grant from the German Research Foundation (Deutsche Forschungsgemeinschaft, GRK 2277) to the Research Training Group “Statistical Modeling in Psychology”. Parts of this research were conducted using the supercomputer Mogon II and services offered by Johannes Gutenberg University Mainz (hpc.uni-mainz.de).

References

Anderson, Keaven. 2026. “gsDesign: Group Sequential Design (Version 3.9.0) [R Package].” https://CRAN.R-project.org/package=gsDesign.
Assunção, Luís. 2024. “Savvi: Safe Anytime Valid Inference (Version 0.3.1) [Python Package].” https://pypi.org/project/savvi/.
Bartroff, Jay, Tze Leung Lai, and Mei-Chiung Shih. 2012. Sequential Experimentation in Clinical Trials: Design and Analysis. Springer Science & Business Media.
Bibaut, Aurelien, Nathan Kallus, and Michael Lindon. 2024. “Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations.” arXiv. https://doi.org/10.48550/arXiv.2212.14411.
Bogdan, Paul C. 2025. “One Decade Into the Replication Crisis, How Have Psychological Results Changed?” Advances in Methods and Practices in Psychological Science 8 (2): 25152459251323480. https://doi.org/10.1177/25152459251323480.
Budihal, Huchesh. 2025. “SPRT: Sequential Probability Ratio Test Method (Version 1.1.0) [R Package].” https://CRAN.R-project.org/package=SPRT.
Cox, D R. 1952. “Sequential Tests for Composite Hypotheses.” Mathematical Proceedings of the Cambridge Philosophical Society 48 (2): 290–99. https://doi.org/10.1017/S030500410002764X.
Cumming, Geoff. 2014. “The New Statistics: Why and How.” Psychological Science 25 (1): 7–29. https://doi.org/10.1177/0956797613504966.
Deb, Anirban, Suman Bhattacharya, Jeremy Gu, Tianxia Zhou, Eva Feng, and Mandie Liu. 2018. “Under the Hood of Uber’s Experimentation Platform.” Uber Engineering Blog. https://www.uber.com/en-EG/blog/xp/.
Erdfelder, Edgar, and Martin Schnuerch. 2021. “On the Efficiency of the Independent Segments Procedure: A Direct Comparison with Sequential Probability Ratio Tests.” Psychological Methods 26 (4): 501–6. https://doi.org/10.1037/met0000404.
Fischer, Lasse, and Aaditya Ramdas. 2025. “Improving Wald’s (Approximate) Sequential Probability Ratio Test by Avoiding Overshoot.” arXiv. https://doi.org/10.48550/arXiv.2410.16076.
Grünwald, Peter, Rianne de Heide, and Wouter Koolen. 2023. “Safe Testing.” arXiv. https://arxiv.org/abs/1906.07801.
Hajnal, J. 1961. “A Two-Sample Sequential t-Test.” Biometrika 48 (1/2): 65–75. https://doi.org/10.2307/2333131.
Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124. https://doi.org/10.1371/journal.pmed.0020124.
Köllerström, Julian, and G. Barrie Wetherill. 1979. “SPRT’s for the Normal Correlation Coefficient.” Journal of the American Statistical Association 74 (368): 815–21. https://doi.org/10.2307/2286405.
Lakens, Daniel, Friedrich Pahlke, and Gernot Wassmer. 2021. “Group Sequential Designs: A Tutorial.” Preprint. PsyArXiv. https://doi.org/10.31234/osf.io/x4azm.
Lakens, Daniël, Anne M. Scheel, and Peder M. Isager. 2018. “Equivalence Testing for Psychological Research: A Tutorial.” Advances in Methods and Practices in Psychological Science 1 (2): 259–69. https://doi.org/10.1177/2515245918770963.
Love, Jonathon, Ravi Selker, Maarten Marsman, Tahira Jamil, Damian Dropmann, Josine Verhagen, Alexander Ly, et al. 2019. “JASP: Graphical Statistical Software for Common Statistical Designs.” Journal of Statistical Software 88 (2): 1–17. https://doi.org/10.18637/jss.v088.i02.
Ly, Alexander, Udo Boehm, Peter Grunwald, Aaditya Ramdas, and Don van Ravenzwaaij. 2025. “A Tutorial on Safe Anytime-Valid Inference: Practical Maximally Flexible Sampling Designs for Experiments Based on e-Values.” PsyArXiv. https://doi.org/10.31234/osf.io/h5vae.
Ly, Alexander, Rosanne Turner, Judith ter Schure, Muriel Pérez-Ortiz, and Peter Grünwald. 2024. “Safestats: Safe Anytime-Valid Inference (Version 0.8.7) [R Package].” https://CRAN.R-project.org/package=safestats.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. https://doi.org/10.1126/science.aac4716.
Øygard, Audun Mathias. 2014. “SeGLiR: Javascript Library for Rapid A/B-Testing with Sequential Generalized Likelihood Ratio Tests (Last Commit 2016) [JavaScript Package].” https://github.com/auduno/SeGLiR.
Pramanik, Sandipan, Valen E. Johnson, and Anirban Bhattacharya. 2020. “MSPRT: Modified Sequential Probability Ratio Test (Version 3.0) [R Package].” https://CRAN.R-project.org/package=MSPRT.
Quevedo Pütter, J., and E. Erdfelder. 2022. “Alcohol-Induced Retrograde Facilitation? Mixed Evidence in a Preregistered Replication and Encoding-Maintenance-Retrieval Analysis.” Experimental Psychology 69 (6): 335–50. https://doi.org/10.1027/1618-3169/a000569.
Ramdas, Aaditya, Peter Grünwald, Vladimir Vovk, and Glenn Shafer. 2023. “Game-Theoretic Statistics and Safe Anytime-Valid Inference.” Statistical Science 38 (4): 576–601. https://doi.org/10.1214/23-STS894.
Rushton, S. 1950. “On a Sequential t-Test.” Biometrika 37: 326–33. https://doi.org/10.2307/2332385.
Schnuerch, Martin, and Edgar Erdfelder. 2020. “Controlling Decision Errors with Minimal Costs: The Sequential Probability Ratio t Test.” Psychological Methods 25 (2): 206–26. https://doi.org/10.1037/met0000234.
Schönbrodt, Felix D., Eric-Jan Wagenmakers, Michael Zehetleitner, and Marco Perugini. 2017. “Sequential Hypothesis Testing with Bayes Factors: Efficiently Testing Mean Differences.” Psychological Methods 22 (2): 322–39. https://doi.org/10.1037/met0000061.
Schubert, Anna-Lena, Meike Steinhilber, Heemin Kang, and Daniel S. Quintana. 2025. “Improving Statistical Reporting in Psychology.” Communications Psychology 3 (1): 156.
Schultzberg, Mårten, and Sebastian Ankargren. 2023. “Choosing a Sequential Testing Framework – Comparisons and Discussions.” Spotify Engineering Blog. https://engineering.atspotify.com/2023/03/choosing-sequential-testing-framework-comparisons-and-discussions.
Siegmund, David. 1985. Sequential Analysis. Springer Series in Statistics. New York, NY: Springer. https://doi.org/10.1007/978-1-4757-1862-1.
Silva, Ivair Ramos, and Martin Kulldorff. 2025. “Sequential: Exact Sequential Analysis for Poisson and Binomial Data (Version 4.5.2) [R Package].” https://CRAN.R-project.org/package=Sequential.
Stefan, Angelika M., Felix D. Schönbrodt, Nathan J. Evans, and Eric-Jan Wagenmakers. 2022. “Efficiency in Sequential Testing: Comparing the Sequential Probability Ratio Test and the Sequential Bayes Factor Test.” Behavior Research Methods 54 (6): 3100–3117. https://doi.org/10.3758/s13428-021-01754-8.
Steinhilber, Meike, Martin Schnuerch, and Anna-Lena Schubert. 2023. “Sprtt: Sequential Probability Ratio Test Toolbox (Version 0.2.0) [R Package].” https://CRAN.R-project.org/package=sprtt.
———. 2024. “Sequential Analysis of Variance: Increasing Efficiency of Hypothesis Testing.” Psychological Methods, September. https://doi.org/10.1037/met0000677.
———. 2025. “The Dark Side of Sequential Testing: A Simulation Study on Questionable Research Practices.” Preprint. PsyArXiv. https://doi.org/10.31234/osf.io/vkbu3_v1.
Tartakovsky, Alexander, Igor Nikiforov, and Michele Basseville. 2014. Sequential Analysis: Hypothesis Testing and Changepoint Detection. CRC Press.
Wagenmakers, Eric-Jan, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Jonathon Love, Ravi Selker, et al. 2018. “Bayesian Inference for Psychology. Part I: Theoretical Advantages and Practical Ramifications.” Psychonomic Bulletin & Review 25 (1): 35–57. https://doi.org/10.3758/s13423-017-1343-3.
Wald, Abraham. 1945. “Sequential Tests of Statistical Hypotheses.” The Annals of Mathematical Statistics 16 (2): 117–86.
———. 1947. Sequential Analysis. New York: Wiley.
Wetherill, G. Barrie, and Kevin D. Glazebrook. 1986. Sequential Methods in Statistics. 3rd ed. Monographs on Statistics and Applied Probability. London; New York: Chapman and Hall.
Yu, Zhenning. 2017. “Sprt: Sequential Probability Ratio Test (Version 0.0.1) [Python Package].” https://pypi.org/project/sprt/.