You finish an analysis. The code runs. The numbers look right. But are they stable?
Package updates change function behaviour silently. Stochastic code without a fixed seed produces different results on every run. Results certified last month may drift this month — with no error and no warning.
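reproducr makes these risks visible and trackable via a three-tier workflow:
- Tier 1: audit_script() scans your scripts for package calls, and risk_score() flags known reproducibility risks.
- Tier 2: certify() records key outputs, and check_drift() compares later runs against them.
- Tier 3: repro_report() and repro_badge() summarise the audit for a methods section or README.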
It works with your existing setup. If you use renv,
reproducr reads your lockfile automatically. No
configuration required.
These are not hypothetical. Each scenario describes a class of problem that occurs routinely in research and regulated workflows, produces no error, and is invisible without explicit tooling.
You write an analysis in January using dplyr 1.0.4 and share it with a colleague who has dplyr 1.1.2.
results <- mtcars |>
  dplyr::group_by(cyl) |>
  dplyr::summarise(mean_mpg = mean(mpg))
# You then chain a further operation:
results |> dplyr::mutate(rank = dplyr::row_number())

In dplyr 1.0.x, summarise() retained grouping by default. In dplyr 1.1.x it drops the last grouping level. Your colleague’s mutate() now operates on ungrouped data, so the rank column is computed differently. No error. No warning. Different numbers.
reproducr flags this immediately:
[HIGH] dplyr::summarise
In dplyr 1.1.0, summarise() changed its default grouping behaviour...
You develop a model locally on R 3.5.3 and deploy to a production server running R 3.6.2.
R 3.6.0 changed the default sampling method behind sample() (sample.kind went from "Rounding" to "Rejection"), so the same seed now produces a different train/test split. Your model is trained on different data than you validated locally, and accuracy metrics differ silently across environments.
reproducr flags this:
[HIGH] stats::sample
In R 3.6.0, the default RNG algorithm changed...
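The effect can be reproduced inside a single session of R 3.6.0 or later, because set.seed() accepts a sample.kind argument that selects the old or the new sampling method. The sketch below is illustrative only and is not part of reproducr; the seed and sample sizes are arbitrary choices.

```r
# Draw the same "random" split under the current and the pre-3.6.0 sampler.
# Requires R >= 3.6.0, where set.seed() gained the sample.kind argument.

set.seed(42, sample.kind = "Rejection")   # default since R 3.6.0
split_new <- sample(100, 10)

# "Rounding" is the pre-3.6.0 behaviour; R warns that it is non-uniform.
suppressWarnings(set.seed(42, sample.kind = "Rounding"))
split_old <- sample(100, 10)

# Restore the modern default so later code is unaffected.
RNGkind(sample.kind = "Rejection")

print(split_new)
print(split_old)
print(identical(split_new, split_old))    # same seed, different draws
```

Any train/test split built on sample() inherits this difference, which is why the seed alone does not pin the result across R versions.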
You use renv to lock your environment and restore it six
months later on a new machine. Everything installs correctly but results
differ.
renv locked readr 2.0.1. Your original
analysis was written with readr 1.4.0. The lockfile
captured the version you were already on when you ran
renv::init() — past the breaking change. You never compared
against pre-2.0 output.
data <- readr::read_csv("clinical_data.csv")
# Column "patient_id" now parses as character instead of double.
# Downstream merge silently drops rows.

renv cannot detect this because it only sees versions, not behaviour. reproducr sees the function call and flags it:
[HIGH] readr::read_csv
In readr 2.0.0, read_csv() switched to the vroom backend.
Column type guessing changed...
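One way to remove the dependence on type guessing is to declare column types at read time. The sketch below uses base R's read.csv() as a stand-in so that it runs without readr installed; with readr, the corresponding mechanism is the col_types argument of read_csv(). The CSV content is made-up illustrative data, not taken from any real file.

```r
# Illustrative data only: IDs with leading zeros are a classic guessing trap.
csv_text <- "patient_id,visit,score
001,1,4.2
002,1,3.9"

# Types left to the parser's guess: patient_id is read as integer.
guessed <- read.csv(text = csv_text)

# Types declared explicitly: the result no longer depends on guessing rules.
declared <- read.csv(
  text = csv_text,
  colClasses = c(patient_id = "character", visit = "integer", score = "numeric")
)

str(guessed$patient_id)
str(declared$patient_id)

# Fail loudly if a key column ever arrives with an unexpected type.
stopifnot(is.character(declared$patient_id))
```

The final stopifnot() turns a silent type change into an immediate error before any merge runs.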
The entry point is audit_script(). It reads your R
source files, extracts every qualified pkg::fn call, and
resolves which version of each package is in use.
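To give a feel for what extracting qualified calls involves, here is a minimal base R sketch built on utils::getParseData(). It is not reproducr's implementation; the helper name find_qualified_calls() is hypothetical, and version resolution is left out.

```r
# Hypothetical helper (not part of reproducr): list pkg::fn references
# found in a character vector of R source lines.
find_qualified_calls <- function(lines) {
  parsed <- parse(text = lines, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  tokens <- tokens[tokens$terminal, ]        # leaf tokens, in source order
  pkg_idx <- which(tokens$token == "SYMBOL_PACKAGE")
  data.frame(
    line = tokens$line1[pkg_idx],
    pkg  = tokens$text[pkg_idx],
    fn   = tokens$text[pkg_idx + 2L],        # skip the "::" token
    stringsAsFactors = FALSE
  )
}

src <- c(
  "x <- dplyr::filter(mtcars, cyl == 4)",
  "fit <- lm(mpg ~ wt, data = x)",           # unqualified call: not reported
  "z <- stats::rnorm(nrow(x))"
)
calls <- find_qualified_calls(src)
print(calls)
```

Working from the parse tree rather than a regular expression means that text inside strings and comments is not mistaken for a call.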
# Create a small example script
script <- tempfile(fileext = ".R")
writeLines(c(
"# Example analysis",
"set.seed(237)",
"x <- dplyr::filter(mtcars, cyl == 4)",
"y <- dplyr::summarise(x, mean_mpg = mean(mpg), n = dplyr::n())",
"fit <- lm(mpg ~ wt, data = x)",
"z <- stats::rnorm(nrow(y))",
"out <- base::sort(unique(x$gear))"
), script)
report <- audit_script(script, renv = FALSE, verbose = FALSE)
print(report)
#>
#> -- reproducr audit report [2026-06-15 17:03] --
#>
#> Files scanned: 1
#> Packages found: 3
#> Calls detected: 5
#> R version: 4.4.2
#> Platform: Darwin 25.5.0
#> Versions from: installed library
#>
#> Next step: risks <- risk_score(report)

report$calls
#> file
#> 1 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#> 2 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#> 3 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#> 4 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#> 5 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#>   line   pkg        fn pkg_version
#> 1    3 dplyr    filter       1.2.0
#> 2    4 dplyr summarise       1.2.0
#> 3    4 dplyr         n       1.2.0
#> 4    6 stats     rnorm       4.4.2
#> 5    7  base      sort       4.4.2

Pass the report to risk_score() to run three independent checks:
risks <- risk_score(report)
print(risks)
#>
#> -- reproducr risk score --
#>
#> HIGH: 1
#> MEDIUM: 0
#> LOW: 1
#>
#> [HIGH] dplyr::summarise (line 4 in file7f358a6ae08.R)
#> Check : changelog
#> Details : In dplyr 1.1.0, summarise() changed its default
#> grouping behaviour: it now drops the last grouping level and
#> returns an ungrouped data frame by default (.groups =
#> 'drop_last'). Code that relied on the result being grouped (e.g.
#> chaining further group operations without re-grouping) will
#> produce silently different results.
#> Reference: https://dplyr.tidyverse.org/news/index.html#dplyr-110
#>
#> [LOW] base::sort (line 7 in file7f358a6ae08.R)
#> Check : locale_check
#> Details : sort() output is locale-sensitive. Current locale: C. Results may
#> differ on machines with different LC_COLLATE or LC_TIME settings.
#> Reference: https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html

- "changelog": checks calls against a curated database of known silent breaking changes
- "seed_check": flags stochastic functions without a nearby set.seed()
- "locale_check": flags functions whose output varies by system locale

# High-severity only
high_risks <- risk_score(report, min_risk = "high")
# Just the seed check
seed_issues <- risk_score(report, methods = "seed_check")

# As a plain data frame for downstream use
as.data.frame(risks)
#> file
#> 1 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#> 2 /private/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T/RtmpXSQF66/file7f358a6ae08.R
#>   line             call pkg_version risk        check
#> 1    4 dplyr::summarise       1.2.0 high    changelog
#> 2    7       base::sort       4.4.2  low locale_check
#> description
#> 1 In dplyr 1.1.0, summarise() changed its default grouping behaviour: it now drops the last grouping level and returns an ungrouped data frame by default (.groups = 'drop_last'). Code that relied on the result being grouped (e.g. chaining further group operations without re-grouping) will produce silently different results.
#> 2 sort() output is locale-sensitive. Current locale: C. Results may differ on machines with different LC_COLLATE or LC_TIME settings.
#> reference
#> 1 https://dplyr.tidyverse.org/news/index.html#dplyr-110
#> 2 https://stat.ethz.ch/R-manual/R-devel/library/base/html/locales.html

After running an analysis, certify the key outputs using certify().
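The idea behind certification (record trusted outputs once, compare later runs against them) can be illustrated in base R with saveRDS() and all.equal(). This is only a conceptual sketch, not how reproducr stores or compares certificates. The model formula mpg ~ wt, the helper names key_outputs() and compare_to_baseline(), and the temporary file are all assumptions made for the example.

```r
# Assumed model for illustration; the original fit is not shown in this document.
model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical helper: the outputs we want to keep stable.
key_outputs <- function(fit) {
  list(
    coefs     = coef(fit),
    r_squared = summary(fit)$r.squared,
    n_obs     = nobs(fit)
  )
}

# "Certify": persist the outputs of the trusted run.
baseline_file <- tempfile(fileext = ".rds")
saveRDS(key_outputs(model), baseline_file)

# Hypothetical helper: names of outputs that no longer match the baseline.
compare_to_baseline <- function(current, file) {
  baseline <- readRDS(file)
  shared <- intersect(names(current), names(baseline))
  same <- vapply(
    shared,
    function(nm) isTRUE(all.equal(current[[nm]], baseline[[nm]])),
    logical(1)
  )
  shared[!same]
}

# Same model: nothing drifts.
print(compare_to_baseline(key_outputs(model), baseline_file))

# Different model: coefficients and R-squared drift, n_obs does not.
model2 <- lm(mpg ~ hp, data = mtcars)
print(compare_to_baseline(key_outputs(model2), baseline_file))
```

all.equal() compares numbers with a small tolerance, so harmless floating-point noise across machines is not reported as drift.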
After any environment change, re-run check_drift():
result <- check_drift(
outputs = list(
coefs = coef(model),
r_squared = summary(model)$r.squared,
n_obs = nrow(mtcars)
),
against = "baseline-v1",
file = cert_file
)
#> -- reproducr drift check vs 'baseline-v1' --
#> Verdict : ALL OUTPUTS MATCH
#> OK : 3
#> Drifted : 0
#> Missing : 0
#> New : 0

# Different model: shows drift
model2 <- lm(mpg ~ hp, data = mtcars)
check_drift(
outputs = list(coefs = coef(model2)),
against = "baseline-v1",
file = cert_file
)
#> -- reproducr drift check vs 'baseline-v1' --
#> Verdict : DRIFT DETECTED
#> OK : 0
#> Drifted : 1
#> Missing : 2
#> New : 0
#> Drifted outputs:
#> - coefs

cat(repro_report(report, risks, format = "text", style = "academic"))
#> Methods paragraph (reproducr)
#>
#> All analyses were conducted in R (version 4.4.2) on Darwin 25.5.0. The following packages were used: dplyr (v1.2.0), stats (v4.4.2), base (v4.4.2). Reproducibility auditing (reproducr) identified 2 potential concern(s) (1 high, 0 medium severity) relating to known behavioural changes in package APIs across versions. The full audit report and certification records are available in the supplementary materials.
#> # Methods paragraph (reproducr)
#>
#> All analyses were conducted in R (version 4.4.2) on Darwin 25.5.0. The following packages were used: dplyr (v1.2.0), stats (v4.4.2), base (v4.4.2). Reproducibility auditing (reproducr) identified 2 potential concern(s) (1 high, 0 medium severity) relating to known behavioural changes in package APIs across versions. The full audit report and certification records are available in the supplementary materials.

badge <- repro_badge(report, risks, output = "markdown")
#> [](https://repro-stats.github.io/reproducr/)
cat(badge)
#> [](https://repro-stats.github.io/reproducr/)

library(reproducr)
# Tier 1
report <- audit_script("analysis.R")
risks <- risk_score(report)
# Tier 2
certify(
outputs = list(coefs = coef(my_model)),
tag = "submission-v1"
)
check_drift(
outputs = list(coefs = coef(my_model)),
against = "submission-v1"
)
# Tier 3
repro_report(report, risks, format = "html", style = "pharma")
repro_badge(report, risks, output = "README")