Introduction to HOIF: Higher-Order Influence Function Estimators for the ATE

Xingyu Chen

2026-06-19

1 Introduction

The HOIF package implements Higher-Order Influence Function (HOIF) estimators for Average Treatment Effect (ATE) estimation in causal inference. This methodology extends the doubly robust (DR) estimator—also known as the Augmented Inverse Propensity Weighted (AIPW) or Double Machine Learning (DML) estimator—by systematically accounting for higher-order bias terms.

1.1 Background

The HOIF methodology was developed through a series of seminal works by James M. Robins and collaborators:

The key idea is to represent the bias of the standard DR estimator as a sum of higher-order influence functions, which can be estimated and used to debias the original estimator.

The computation of the higher-order U-statistics that arise in HOIF estimators is delegated to the ustats package; the algorithm and its computational complexity are analyzed in:

1.2 Key Features

2 Mathematical Background

2.1 The HOIF Framework

Let \(\widehat{\psi}_a^{\mathrm{AIPW}}\) denote the standard first-order doubly robust (AIPW/DML) estimator of \(\psi_a = E[Y(a)]\), and \(\widehat{\mathrm{ATE}}^{\mathrm{AIPW}} = \widehat{\psi}_1^{\mathrm{AIPW}} - \widehat{\psi}_0^{\mathrm{AIPW}}\). Conditional on the estimated nuisance functions, this estimator is biased; HOIF estimates the estimable part of that bias.

For each treatment arm \(a \in \{0,1\}\), the \(m\)-th order HOIF bias-correction term is

\[ \mathbb{HOIF}^a_m(\hat{\Omega}^a) = \sum_{j=2}^m \mathbb{IF}^a_j(\hat{\Omega}^a) = \sum_{i=2}^m \sum_{j=2}^{i} \binom{i-2}{i-j} \mathbb{U}^a_j(\hat{\Omega}^a), \]

and the corresponding correction for the ATE is their difference:

\[ \mathbb{BC}^{\mathrm{ATE}}_m = \mathbb{HOIF}^1_m(\hat{\Omega}^1) - \mathbb{HOIF}^0_m(\hat{\Omega}^0). \]

\(\mathbb{BC}^{\mathrm{ATE}}_m\) is the \(m\)-th order HOIF estimator of the estimable bias of the AIPW estimator of the ATE — it is not itself an estimator of the ATE. The sign convention is such that adding it to the AIPW estimate removes that bias:

\[ \widehat{\mathrm{ATE}}_m = \widehat{\mathrm{ATE}}^{\mathrm{AIPW}} + \mathbb{BC}^{\mathrm{ATE}}_m . \]

In the output of hoif_ate(), the components HOIF1, HOIF0 and ATE correspond to \(\mathbb{HOIF}^1_m\), \(\mathbb{HOIF}^0_m\) and \(\mathbb{BC}^{\mathrm{ATE}}_m\) for \(m = 2, \ldots\) (the component name ATE is kept for backward compatibility — it holds the ATE bias correction, not the ATE itself).

2.2 U-Statistics Formulation

The core computational building block is the U-statistic:

\[ \mathbb{U}^a_m(\hat{\Omega}^a) = (-1)^m \frac{(n-m)!}{n!} \sum_{\substack{(i_1, \ldots, i_m) : \\ i_1 \neq \cdots \neq i_m}} r^a_{i_1} \prod_{s=1}^{m-1} \{Z_{i_s}^\top \hat{\Omega}^a s^a_{i_{s+1}} Z_{i_{s+1}}\} R^a_{i_m} \]

where:

The full algorithmic workflow, mathematical formulas, and all parameters are documented in the PDF shipped with the package (system.file("extdoc", "HOIF.pdf", package = "HOIF"), also available on GitHub).

2.3 Sample Splitting

Conceptually, HOIF estimation involves three separate estimation tasks, and ideally each would use its own, independent part of the data:

  1. The nuisance functions: estimating \(\hat{\mu}(1, X)\), \(\hat{\mu}(0, X)\) and \(\hat{\pi}(X)\);
  2. The inverse weighted Gram matrix \(\hat{\Omega}^a\);
  3. The higher-order U-statistics built from \(\hat{\Omega}^a\).

This package deliberately does not implement task 1: hoif_ate() only takes the predicted values mu1, mu0, pi as inputs, and how (and on which sample) the nuisance functions are estimated is entirely up to the user. Accordingly, the full three-way cross-fitting is not performed by the package; the sample_split argument only controls the split between tasks 2 and 3:

The Quick Start example below follows this structure: the nuisance functions are fitted on one half of the data, and hoif_ate() is run on the other half.

3 Installation

# From CRAN (once accepted):
# install.packages("HOIF")

# Development version from GitHub (also installs the ustats R package):
# install.packages("devtools")
devtools::install_github("cxy0714/HOIF")

3.1 Setting up the Python backend

The default backend computes the higher-order U-statistics in Python via the ustats package. The Python dependencies (u-stats, numpy, torch) are provisioned automatically: with reticulate (>= 1.41) installed, the first call that needs Python (e.g. the first hoif_ate() call) downloads a private Python together with the required packages into a cached environment and reuses it in later sessions. No manual setup is needed in most cases:

library(HOIF)
# First call provisions the Python environment automatically:
# results <- hoif_ate(...)

If you prefer a persistent, dedicated environment, or want to control how PyTorch is installed (CPU-only build vs. CUDA), use the helpers from ustats:

ustats::setup_ustats()             # CPU-only PyTorch (small download)
ustats::setup_ustats(gpu = TRUE)   # default PyPI PyTorch (CUDA on Linux)
ustats::check_ustats_setup()       # verify the environment

You can also point reticulate to your own Python/conda environment (with u-stats installed via pip install u-stats) before Python initializes:

reticulate::use_condaenv("your_env_name", required = TRUE)
# or set the RETICULATE_PYTHON environment variable

See vignette("ustats", package = "ustats") for a complete installation guide and troubleshooting tips.

3.2 No Python at all?

Set pure_R_code = TRUE in hoif_ate(): the U-statistics are then computed with a pure R implementation (orders up to 6) and no Python runtime is required.

4 Quick Start Example

Let’s demonstrate the recommended workflow with a simulated example, following the three-task structure described in the Sample Splitting section above: the nuisance functions are fitted on one half of the data (the nuisance sample), hoif_ate() is run on the other half (the estimation sample), and within the estimation sample we compute both the eHOIF and the sHOIF estimators.

To make the role of the HOIF correction visible, the nuisance models are deliberately misspecified (they use only 3 of the 10 confounders), so the first-order AIPW estimator carries a clear bias — which the HOIF terms then estimate and remove.

4.1 Generate Simulated Data

library(HOIF)

set.seed(123)
n <- 2000
p <- 10

# Covariates (all of them are confounders)
X <- matrix(rnorm(n * p), ncol = p)

# True propensity score and outcome regressions: linear, loading on
# ALL covariates
beta_pi <- c(0.3, -0.2, 0.2, rep(0.25, 7))
beta1   <- c(0.5,  0.4, 0.3, rep(0.4, 7))
beta0   <- c(0.3,  0.2, 0.1, rep(0.3, 7))

true_pi <- plogis(as.vector(X %*% beta_pi))
A <- rbinom(n, 1, true_pi)

mu1_true <- as.vector(1 + X %*% beta1)
mu0_true <- as.vector(X %*% beta0)
Y <- A * mu1_true + (1 - A) * mu0_true + rnorm(n, 0, 0.2)

# True targets
psi1_true <- mean(mu1_true)
psi0_true <- mean(mu0_true)
true_ate  <- psi1_true - psi0_true
cat("True E[Y(1)]:", round(psi1_true, 4), "\n")
#> True E[Y(1)]: 0.9799
cat("True E[Y(0)]:", round(psi0_true, 4), "\n")
#> True E[Y(0)]: -0.016
cat("True ATE:    ", round(true_ate, 4), "\n")
#> True ATE:     0.9958

4.2 Split the Sample

The nuisance sample is used only to fit the nuisance functions; the estimation sample is used only by hoif_ate(). This keeps the nuisance predictions independent of the sample on which the HOIF estimator is computed.

idx_nuis <- sample(n, n / 2)
idx_est  <- setdiff(seq_len(n), idx_nuis)

X_nuis <- X[idx_nuis, , drop = FALSE]
A_nuis <- A[idx_nuis]
Y_nuis <- Y[idx_nuis]

X_est <- X[idx_est, , drop = FALSE]
A_est <- A[idx_est]
Y_est <- Y[idx_est]

4.3 Estimate (Misspecified) Nuisance Functions on the Nuisance Sample

The package only consumes the predicted values of the nuisance functions; in practice, any flexible machine learning method can be used to produce them. Here, to mimic misspecification, the working models use only the first 3 covariates and ignore the remaining confounders:

S <- 1:3  # the working models ignore covariates 4..10

# Outcome regressions, fitted on the nuisance sample only
fit_mu1 <- lm(Y_nuis ~ X_nuis[, S], subset = A_nuis == 1)
fit_mu0 <- lm(Y_nuis ~ X_nuis[, S], subset = A_nuis == 0)

# Propensity score, fitted on the nuisance sample only
fit_pi <- glm(A_nuis ~ X_nuis[, S], family = binomial)

# Predict all nuisance functions on the estimation sample
mu1_hat <- as.vector(cbind(1, X_est[, S]) %*% coef(fit_mu1))
mu0_hat <- as.vector(cbind(1, X_est[, S]) %*% coef(fit_mu0))
pi_hat  <- as.vector(plogis(cbind(1, X_est[, S]) %*% coef(fit_pi)))

# Ensure propensity scores are bounded away from 0 and 1
pi_hat <- pmax(pmin(pi_hat, 0.95), 0.05)

4.4 The First-Order AIPW Estimator and its Bias

The HOIF terms estimate the estimable bias of the standard first-order doubly robust (AIPW/DML) estimator, so let us compute that estimator first, together with its actual error:

psi1_aipw <- mean(mu1_hat + A_est / pi_hat * (Y_est - mu1_hat))
psi0_aipw <- mean(mu0_hat + (1 - A_est) / (1 - pi_hat) * (Y_est - mu0_hat))
ate_aipw  <- psi1_aipw - psi0_aipw

cat("AIPW estimates: E[Y(1)] =", round(psi1_aipw, 4),
    "  E[Y(0)] =", round(psi0_aipw, 4),
    "  ATE =", round(ate_aipw, 4), "\n")
#> AIPW estimates: E[Y(1)] = 1.1749   E[Y(0)] = -0.2676   ATE = 1.4425
cat("AIPW errors:    E[Y(1)] =", round(psi1_aipw - psi1_true, 4),
    "  E[Y(0)] =", round(psi0_aipw - psi0_true, 4),
    "  ATE =", round(ate_aipw - true_ate, 4), "\n")
#> AIPW errors:    E[Y(1)] = 0.195   E[Y(0)] = -0.2516   ATE = 0.4466

Because of the misspecified nuisance models, the AIPW estimator misses the true ATE by about 0.45 — a bias that no amount of data will remove. We now estimate this bias with HOIF.

4.5 Compute the eHOIF Estimator (with sample splitting)

Within the estimation sample, hoif_ate() cross-fits the inverse weighted Gram matrix against the U-statistics over n_folds folds:

results_ehoif <- hoif_ate(
  X = X_est,
  A = A_est,
  Y = Y_est,
  mu1 = mu1_hat,
  mu0 = mu0_hat,
  pi = pi_hat,
  transform_method = "none",  # Use raw covariates
  inverse_method = "direct",
  m = 7,                      # Compute up to 7th order
  sample_split = TRUE,
  n_folds = 2,
  seed = 42,
  backend = "torch"           # Use Python backend
)

print(results_ehoif)
#> HOIF Estimators for Average Treatment Effect
#> =============================================
#>
#> Higher-order correction terms by order:
#>   Order     ATE   HOIF1  HOIF0
#> 1     2 -0.4621 -0.2615 0.2006
#> 2     3 -0.4050 -0.2341 0.1709
#> 3     4 -0.4335 -0.2480 0.1855
#> 4     5 -0.4272 -0.2443 0.1829
#> 5     6 -0.4289 -0.2456 0.1833
#> 6     7 -0.4293 -0.2455 0.1838
#>
#> Estimated AIPW bias correction for the ATE (highest order): -0.4293
#> (add this value to the first-order AIPW/DR estimate of the ATE to debias it)

4.6 Compute the sHOIF Estimator (without sample splitting)

Still using the independent nuisance predictions from above; only the split between the Gram matrix and the U-statistics is dropped:

results_shoif <- hoif_ate(
  X = X_est,
  A = A_est,
  Y = Y_est,
  mu1 = mu1_hat,
  mu0 = mu0_hat,
  pi = pi_hat,
  transform_method = "none",
  inverse_method = "direct",
  m = 7,
  sample_split = FALSE,
  backend = "torch"
)

print(results_shoif)
#> HOIF Estimators for Average Treatment Effect
#> =============================================
#>
#> Higher-order correction terms by order:
#>   Order     ATE   HOIF1  HOIF0
#> 1     2 -0.4318 -0.2496 0.1821
#> 2     3 -0.4458 -0.2568 0.1890
#> 3     4 -0.4373 -0.2520 0.1853
#> 4     5 -0.4362 -0.2515 0.1848
#> 5     6 -0.4366 -0.2517 0.1849
#> 6     7 -0.4367 -0.2517 0.1849
#>
#> Estimated AIPW bias correction for the ATE (highest order): -0.4367
#> (add this value to the first-order AIPW/DR estimate of the ATE to debias it)

4.7 Debias the AIPW Estimator

The reported HOIF1/HOIF0/ATE values are the higher-order influence function terms of orders 2 to \(m\) — they are not by themselves point estimates of \(E[Y(1)]\), \(E[Y(0)]\) or the ATE. They estimate the estimable bias of the AIPW estimator, with the sign convention that adding them to the AIPW estimates removes it:

psi1_ehoif <- psi1_aipw + tail(results_ehoif$HOIF1, 1)
psi0_ehoif <- psi0_aipw + tail(results_ehoif$HOIF0, 1)
psi1_shoif <- psi1_aipw + tail(results_shoif$HOIF1, 1)
psi0_shoif <- psi0_aipw + tail(results_shoif$HOIF0, 1)

comparison <- data.frame(
  row.names = c("E[Y(1)]", "E[Y(0)]", "ATE"),
  Truth = c(psi1_true, psi0_true, true_ate),
  AIPW  = c(psi1_aipw, psi0_aipw, ate_aipw),
  eHOIF = c(psi1_ehoif, psi0_ehoif, psi1_ehoif - psi0_ehoif),
  sHOIF = c(psi1_shoif, psi0_shoif, psi1_shoif - psi0_shoif)
)
round(comparison, 4)
#>           Truth    AIPW   eHOIF   sHOIF
#> E[Y(1)]  0.9799  1.1749  0.9294  0.9232
#> E[Y(0)] -0.0160 -0.2676 -0.0838 -0.0826
#> ATE      0.9958  1.4425  1.0132  1.0058

# Errors relative to the truth
round(sweep(comparison[, -1], 1, comparison$Truth, "-"), 4)
#>            AIPW   eHOIF   sHOIF
#> E[Y(1)]  0.1950 -0.0505 -0.0567
#> E[Y(0)] -0.2516 -0.0678 -0.0666
#> ATE      0.4466  0.0174  0.0100

The HOIF correction removes essentially all of the AIPW bias for the ATE (error \(0.4466 \rightarrow\) about \(0.01\)) and the bulk of it for each treatment arm.

4.8 Visualize Convergence

plot(results_ehoif)

5 Main Function: hoif_ate()

The primary interface for computing HOIF estimators.

5.1 Arguments

5.2 Return Value

A list of class "hoif_ate" containing:

6 Advanced Usage

6.1 Using Basis Expansion

For more flexible modeling of covariate relationships:

# B-splines basis
results_splines <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  transform_method = "splines",
  basis_dim = 5,        # 5 basis functions per covariate
  degree = 3,           # Cubic splines
  m = 5,
  sample_split = FALSE
)

# Fourier basis
results_fourier <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  transform_method = "fourier",
  basis_dim = 4,        # 4 Fourier components per covariate
  period = 1,
  m = 5,
  sample_split = FALSE
)

6.2 Sample Splitting (Cross-Fitting)

To cross-fit the estimation of the inverse weighted Gram matrix against the computation of the U-statistics — the eHOIF estimator (see the Sample Splitting section under Mathematical Background for what exactly is split):

results_cf <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  transform_method = "none",
  m = 5,
  sample_split = TRUE,
  n_folds = 5,          # 5-fold cross-fitting
  seed = 42
)

6.3 Regularized Gram Matrix Inversion

For high-dimensional settings or when facing numerical instability:

# Nonlinear shrinkage
results_nlshrink <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  inverse_method = "nlshrink",
  m = 5
)

# corpcor shrinkage
results_corpcor <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  inverse_method = "corpcor",
  m = 5
)

6.4 Pure R Backend

If the Python environment is not available or for smaller problems:

results_pure_r <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  pure_R_code = TRUE,
  m = 6  # Maximum 6 for pure R
)

7 Computational Details

7.1 Python Backend (ustats)

The default backend uses the ustats R package, an interface to the Python package u-stats, for efficient U-statistic computation:

For HOIF estimators of the ATE, a key takeaway from Chen, Zhang & Liu (2025) is:

where \(n\) is the sample size and \(k\) is the dimension of the transformed covariates.

7.2 Pure R Implementation

The pure R fallback (pure_R_code = TRUE):

7.3 Performance Considerations

8 Practical Recommendations

8.1 Choosing the Transformation Method

8.2 Choosing the Order

For orders \(m \le 7\) the computational complexity does not exceed that of ordinary matrix computations, \(O(n^3 + nk^2 + k^3 + n^2 k)\) (Chen, Zhang & Liu, 2025), so going up to order 7 is cheap. A practical default is therefore to compute all orders up to \(m = 7\) (the package default) and inspect the convergence plot. Beyond order 7 the complexity exceeds \(O(n^4 + nk^2 + k^3 + n^2 k)\), so expect substantially longer run times.

8.3 Use a GPU if Available

With backend = "torch" and a CUDA-enabled PyTorch (e.g. installed via ustats::setup_ustats(gpu = TRUE)), the U-statistics are computed on the GPU automatically — by far the most effective speedup for large \(n\) or \(k\). On a GPU the default precision is float32; pass dtype = "float64" if you need maximum numerical precision.

8.4 Sample Splitting

See the Sample Splitting section under Mathematical Background: the package only cross-fits the inverse weighted Gram matrix against the U-statistics (sample_split = TRUE, eHOIF). The nuisance functions are inputs — estimating them, ideally on a separate, independent sample, is the user’s responsibility (as demonstrated in the Quick Start example).

9 Troubleshooting

9.1 Python Backend Issues

Verify the Python environment used by ustats:

ustats::check_ustats_setup()

If the environment cannot be repaired easily, switch to the pure R implementation:

results <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  pure_R_code = TRUE,
  m = 6  # Limited to order 6
)

9.2 Numerical Instability

If you see warnings about matrix inversion:

# Try shrinkage methods
results <- hoif_ate(
  X = X_est, A = A_est, Y = Y_est,
  mu1 = mu1_hat, mu0 = mu0_hat, pi = pi_hat,
  inverse_method = "nlshrink"  # or "corpcor"
)

9.3 Extreme Propensity Scores

Always trim propensity scores to avoid numerical issues:

# Trim to [0.05, 0.95]
pi_hat <- pmax(pmin(pi_hat, 0.95), 0.05)

10 References

  1. Robins, J. M., Li, L., Tchetgen Tchetgen, E., & van der Vaart, A. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. Probability and Statistics: Essays in Honor of David A. Freedman, 335-421. doi:10.1214/193940307000000527

  2. Robins, J. M., Li, L., Mukherjee, R., Tchetgen Tchetgen, E., & van der Vaart, A. (2017). Minimax estimation of a functional on a structured high-dimensional model. Annals of Statistics, 45(5), 1951-1987.

  3. Liu, L., Mukherjee, R., Newey, W. K., & Robins, J. M. (2017). Semiparametric efficient empirical higher order influence function estimators. arXiv preprint doi:10.48550/arXiv.1705.07577

  4. Liu, L., & Li, C. (2023). New √n-consistent, numerically stable higher-order influence function estimators. arXiv preprint doi:10.48550/arXiv.2302.08097

  5. Chen, X., Zhang, R., & Liu, L. (2025). On computing and the complexity of computing higher-order U-statistics, exactly. arXiv preprint doi:10.48550/arXiv.2508.12627