Certifying outputs and detecting drift

This vignette covers Tier 2 of the reproducr workflow in depth: certify(), check_drift(), and list_certs(). These three functions together form the baseline and drift detection system.

The problem they solve — a realistic scenario

Scenario — The revision drift problem

You submit a paper in March. Before submission you run the analysis and note the key results: hazard ratio 0.582 (95% CI: 0.446–0.760, p < 0.001).

In May a reviewer asks for a revision. While working on the response you upgrade your packages. Suppose one of them is lme4, and that its new release adjusted a default optimizer tolerance. You re-run the analysis: hazard ratio 0.591 (95% CI: 0.452–0.768).

The numbers are slightly different, no error was thrown, and you have not knowingly changed the analysis code. Without a record of what the March run produced, you cannot tell whether the difference comes from an edit made during the revision or from the package upgrade.

[DRIFTED] hr:       0.582 → 0.591
[DRIFTED] ci_lower: 0.446 → 0.452
[DRIFTED] ci_upper: 0.760 → 0.768

With certify() and check_drift(), the change is flagged by output name the first time you re-run the analysis, so you can investigate before sending your response to the reviewer.

More broadly, much can change outside your script: packages change maintainers, maintainers ship silent fixes, system administrators update platform libraries such as BLAS and LAPACK, and R itself occasionally changes its random number generation defaults (R 3.6.0, for example, changed the default algorithm behind sample()). Any of these can alter your numerical results without producing an error.

certify() and check_drift() detect this. The idea is simple:

  1. After a successful analysis run, hash the key outputs and store the hashes.
  2. Later — after any change to the environment — re-run the analysis and compare the new hashes against the stored ones.
  3. Any mismatch is reported explicitly, by output name.
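The three steps can be sketched in base R. This is a minimal illustration of the idea, not reproducr's implementation: the helper name hash_output() is hypothetical, and the sketch assumes an MD5 digest of each object's serialized bytes is an adequate fingerprint.

```r
# Hypothetical helper: fingerprint an R object by hashing its serialized bytes.
# The first 14 bytes of a version-2 XDR serialization hold the format marker and
# the version of R that wrote it, so they are dropped to keep hashes comparable
# across R releases.
hash_output <- function(x) {
  bytes <- serialize(x, connection = NULL, version = 2)
  bytes <- bytes[-seq_len(14)]
  path <- tempfile()
  on.exit(unlink(path))
  writeBin(bytes, path)
  unname(tools::md5sum(path))
}

# Step 1: after a good run, hash every named output and keep the hashes.
model <- lm(mpg ~ wt + cyl, data = mtcars)
baseline <- vapply(
  list(coefs = coef(model), n_obs = nrow(mtcars)),
  hash_output, character(1)
)

# Step 2: later, re-run the analysis and hash the same outputs again.
model2 <- lm(mpg ~ wt + cyl, data = mtcars)
current <- vapply(
  list(coefs = coef(model2), n_obs = nrow(mtcars)),
  hash_output, character(1)
)

# Step 3: report mismatches by output name (empty when nothing changed).
drifted <- names(baseline)[baseline != current[names(baseline)]]
drifted
```

Any change to a value, its names, or its type changes the hash, which is why this kind of comparison catches even last-digit differences.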

certify() — creating a baseline

What gets hashed

Pass a fully named list of any R objects you want to protect. Common choices:

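library(reproducr)

# A scratch certification store for this vignette; in a real project, keep the
# store in the project directory (see "Version control" below).
cert_file <- tempfile()

model <- lm(mpg ~ wt + cyl, data = mtcars)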

certify(
  outputs = list(
    coefs       = coef(model),
    r_squared   = summary(model)$r.squared,
    sigma       = sigma(model),
    n_obs       = nrow(mtcars),
    n_complete  = sum(complete.cases(mtcars)),
    group_means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
  ),
  tag = "baseline-v1",
  script = "analysis.R",
  file = cert_file
)
#> reproducr: certified 6 output(s) [2026-06-15] under tag 'baseline-v1'

Choosing what to certify

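Certify outputs that are deterministic (the same data, code, and environment should always reproduce them exactly) and that your conclusions rest on: estimates, fit statistics, sample sizes, and summary tables like those above.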

Avoid certifying objects that are expected to differ across runs by design, such as proc.time() outputs or Sys.time() values.
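Because a hash comparison is exact, it behaves like identical() on the stored object: a difference in the last binary digit of a double counts as drift. The base R illustration below shows this, plus one possible mitigation that is not a reproducr feature: rounding numeric outputs to the precision you actually report before certifying them. The choice of 6 decimal places is an arbitrary assumption, and rounding cannot help when a value sits right at a rounding boundary.

```r
# Exact comparison: 0.1 + 0.2 and 0.3 differ in their last binary digits.
exact_match <- identical(0.1 + 0.2, 0.3)

# Rounding to the reported precision first removes that last-bit noise.
rounded_match <- identical(round(0.1 + 0.2, 6), round(0.3, 6))

# Time stamps differ on every run by design, so they never belong in a certification.
t1 <- Sys.time()
Sys.sleep(0.01)
t2 <- Sys.time()
timestamps_match <- identical(t1, t2)

# An output list built from rounded values (6 digits is an assumption).
model <- lm(mpg ~ wt + cyl, data = mtcars)
outputs <- list(
  coefs     = round(coef(model), 6),
  r_squared = round(summary(model)$r.squared, 6)
)

c(exact = exact_match, rounded = rounded_match, timestamps = timestamps_match)
```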

Tags and the certification store

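Every certification requires a tag, a human-readable label that identifies it in the store. Separate tags let you keep several baselines side by side, for example one per project milestone: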

certify(
  outputs = list(coefs = coef(model)),
  tag     = "pre-peer-review",
  file    = cert_file
)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'pre-peer-review'

certify(
  outputs = list(coefs = coef(model)),
  tag     = "post-revision",
  file    = cert_file
)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'post-revision'

Passing a duplicate tag overwrites the existing record with a warning:

certify(
  outputs = list(coefs = coef(model)),
  tag     = "baseline-v1",
  file    = cert_file
)
#> Warning: Tag 'baseline-v1' already exists in
#> '/var/folders/1c/cc14n8555ll9mmd98xp8jr_w0000gn/T//RtmpXSQF66/file7f3fdf32d'.
#> Overwriting.
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'baseline-v1'
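Conceptually, the store behaves like a named list keyed by tag. The hypothetical add_cert() below is a base R sketch of that behaviour, not reproducr's code; the hash strings are placeholders. It shows why re-using a tag replaces the whole record rather than merging into it.

```r
# Hypothetical sketch of a tag-keyed certification store.
# Each entry holds a timestamp and a named vector of output hashes.
add_cert <- function(store, tag, hashes) {
  if (tag %in% names(store)) {
    warning(sprintf("Tag '%s' already exists. Overwriting.", tag), call. = FALSE)
  }
  store[[tag]] <- list(
    timestamp = format(Sys.time(), "%Y-%m-%dT%H:%M:%S%z"),
    hashes    = hashes
  )
  store
}

# Placeholder hashes for illustration only.
store <- list()
store <- add_cert(store, "baseline-v1", c(coefs = "a1", r_squared = "b2"))
store <- add_cert(store, "post-revision", c(coefs = "a1"))

# Re-using a tag replaces the record, dropping outputs that were not re-supplied
# (this call emits the overwrite warning).
store <- add_cert(store, "baseline-v1", c(coefs = "a1"))

# Persist between sessions; readRDS() restores the same list.
path <- tempfile(fileext = ".rds")
saveRDS(store, path)
restored <- readRDS(path)
```

After the overwrite, 'baseline-v1' holds only coefs, matching the single output reported in the warning example above.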

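list_certs() — inspecting the store

list_certs() returns one row per tag, showing when it was certified, the R version and operating system in use, how many outputs it holds, and the script recorded with it. 'baseline-v1' shows one output and no script because the duplicate-tag call above replaced the original record: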

list_certs(file = cert_file)
#>               tag                timestamp r_version            os n_outputs
#> 1     baseline-v1 2026-06-15T17:03:19+0200     4.4.2 Darwin 25.5.0         1
#> 2 pre-peer-review 2026-06-15T17:03:19+0200     4.4.2 Darwin 25.5.0         1
#> 3   post-revision 2026-06-15T17:03:19+0200     4.4.2 Darwin 25.5.0         1
#>   script
#> 1   <NA>
#> 2   <NA>
#> 3   <NA>

check_drift() — comparing against a baseline

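Basic usage

Re-run the analysis, pass the outputs to check_drift() under the same names used when certifying, and name the tag to compare against: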

model2 <- lm(mpg ~ wt + cyl, data = mtcars)

result <- check_drift(
  outputs = list(
    coefs       = coef(model2),
    r_squared   = summary(model2)$r.squared,
    sigma       = sigma(model2),
    n_obs       = nrow(mtcars),
    n_complete  = sum(complete.cases(mtcars)),
    group_means = aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
  ),
  against = "baseline-v1",
  file = cert_file
)
#> -- reproducr drift check vs 'baseline-v1' --
#>   Verdict  : ALL OUTPUTS MATCH
#>   OK       : 1
#>   Drifted  : 0
#>   Missing  : 0
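#>   New      : 5

Only coefs is compared here. The duplicate-tag example above replaced 'baseline-v1' with a record holding that single output, so the other five outputs are reported as new. New outputs do not change the verdict; to track them, certify the full list under a tag.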

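The four statuses

The example below certifies three outputs, then checks a later run in which one output is unchanged, one has changed (the model now uses hp instead of wt), one is no longer supplied, and one is new: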

certify(
  outputs = list(
    stays_same  = 42L,
    will_change = coef(lm(mpg ~ wt, data = mtcars)),
    will_vanish = "this output disappears next run"
  ),
  tag = "four-statuses",
  file = cert_file
)
#> reproducr: certified 3 output(s) [2026-06-15] under tag 'four-statuses'

demo_result <- check_drift(
  outputs = list(
    stays_same  = 42L,
    will_change = coef(lm(mpg ~ hp, data = mtcars)),
    brand_new   = "this output is new"
  ),
  against = "four-statuses",
  file = cert_file
)
#> -- reproducr drift check vs 'four-statuses' --
#>   Verdict  : DRIFT DETECTED
#>   OK       : 1
#>   Drifted  : 1
#>   Missing  : 1
#>   New      : 1
#>   Drifted outputs:
#>     - will_change

print(demo_result)
#> 
#> -- reproducr drift report --
#> 
#> [OK]      stays_same
#> [DRIFT]   will_change
#>             Hash mismatch (numeric tolerance check requires stored values).
#> [NEW]     brand_new
#>             Not present in the baseline certification.
#> [MISSING] will_vanish
#>             Present in baseline but not supplied to check_drift().

Status    Meaning
ok        Hash matches the baseline exactly
drifted   Hash differs: the output has changed
missing   Present in the baseline, not supplied to check_drift()
new       Supplied to check_drift(), not in the baseline
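The table amounts to a small decision rule over output names. The hypothetical classify_drift() below sketches it in base R, comparing two named vectors of hashes; it is an illustration, not reproducr's code, and the hash strings are placeholders mirroring the four-statuses example.

```r
# Hypothetical sketch: assign each output name one of the four statuses.
classify_drift <- function(baseline, current) {
  all_names <- union(names(baseline), names(current))
  status <- vapply(all_names, function(nm) {
    in_base <- nm %in% names(baseline)
    in_curr <- nm %in% names(current)
    if (in_base && in_curr) {
      # Present in both: compare hashes exactly.
      if (identical(baseline[[nm]], current[[nm]])) "ok" else "drifted"
    } else if (in_base) {
      "missing"  # certified earlier, not supplied now
    } else {
      "new"      # supplied now, never certified
    }
  }, character(1))
  data.frame(output = all_names, status = unname(status))
}

# Placeholder hashes mirroring the four-statuses example.
baseline <- c(stays_same = "h1", will_change = "h2", will_vanish = "h3")
current  <- c(stays_same = "h1", will_change = "h9", brand_new = "h4")
classify_drift(baseline, current)
```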

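Using "latest"

Passing against = "latest" compares against the most recently certified tag, so you do not need to remember its name: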

certify(outputs = list(x = 1L), tag = "run-1", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'run-1'
certify(outputs = list(x = 1L), tag = "run-2", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'run-2'
certify(outputs = list(x = 1L), tag = "run-3", file = cert_file)
#> reproducr: certified 1 output(s) [2026-06-15] under tag 'run-3'

check_drift(outputs = list(x = 1L), against = "latest", file = cert_file)
#> reproducr: comparing against latest tag: 'run-3'
#> -- reproducr drift check vs 'run-3' --
#>   Verdict  : ALL OUTPUTS MATCH
#>   OK       : 1
#>   Drifted  : 0
#>   Missing  : 0
#>   New      : 0
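How "latest" is resolved is not spelled out here. The hypothetical resolve_tag() below sketches one plausible rule, and it is an assumption rather than reproducr's documented behaviour: pick the newest timestamp, and break ties (tags certified within the same second, as in the example above) by storage order.

```r
# Hypothetical sketch of resolving a tag name, including "latest".
resolve_tag <- function(store, against) {
  if (!identical(against, "latest")) {
    if (!against %in% names(store)) {
      stop(sprintf("Unknown tag '%s'.", against), call. = FALSE)
    }
    return(against)
  }
  stamps <- vapply(store, `[[`, character(1), "timestamp")
  # Same-format ISO 8601 strings sort chronologically; radix ordering uses the
  # C locale and keeps ties in storage order, so the last stored tag wins a tie.
  names(store)[order(stamps, method = "radix")][length(stamps)]
}

# Three tags certified within the same second, as in the example above.
store <- list(
  "run-1" = list(timestamp = "2026-06-15T17:03:19+0200"),
  "run-2" = list(timestamp = "2026-06-15T17:03:19+0200"),
  "run-3" = list(timestamp = "2026-06-15T17:03:19+0200")
)
resolve_tag(store, "latest")
```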

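Using drift results programmatically

The object returned by check_drift() can also be inspected in code, for example to stop a pipeline as soon as anything has drifted:

# Outputs from the current run, named exactly as when they were certified.
model <- lm(mpg ~ wt + cyl, data = mtcars)
current_outputs <- list(
  coefs     = coef(model),
  r_squared = summary(model)$r.squared
)

# No `file` argument: compare against the default certification store.
result <- check_drift(outputs = current_outputs, against = "latest")

# Uses the `output` and `status` columns of the result.
n_drifted <- sum(result$status == "drifted")
if (n_drifted > 0L) {
  drifted_names <- result$output[result$status == "drifted"]
  stop(sprintf(
    "%d output(s) have drifted since last certification: %s",
    n_drifted,
    paste(drifted_names, collapse = ", ")
  ))
}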
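The check above ignores missing outputs, yet a certified result that silently disappears can matter as much as one that changes. The hypothetical assert_no_drift() below is a stricter gate sketched in base R; it assumes, as the check above does, that the result has `output` and `status` columns. The stand-in data frame mirrors the four-statuses example; in practice you would pass check_drift()'s return value.

```r
# Hypothetical stricter gate: fail on drifted *and* missing outputs by default.
assert_no_drift <- function(result, fail_on = c("drifted", "missing")) {
  bad <- result$status %in% fail_on
  if (any(bad)) {
    stop(sprintf(
      "%d output(s) failed the drift check: %s",
      sum(bad),
      paste(sprintf("%s (%s)", result$output[bad], result$status[bad]),
            collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Stand-in result shaped like the four-statuses example.
example_result <- data.frame(
  output = c("stays_same", "will_change", "brand_new", "will_vanish"),
  status = c("ok", "drifted", "new", "missing")
)
msg <- tryCatch(assert_no_drift(example_result), error = conditionMessage)
msg
```

When a script like this runs under Rscript in a CI job, an uncaught error from stop() ends the script with a non-zero exit status, which fails the job.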

Version control

Commit .reproducr.rds to your Git repository. This gives you an auditable history of what every certified run produced and lets you compare against any past milestone by its tag.

Marking the file as binary in .gitattributes stops Git from attempting line-based diffs, merges, or line-ending conversion on it:

.reproducr.rds binary