Type: Package
Title: Adaptive Influence-Based Borrowing for Hybrid Control Trials
Version: 0.1.0
Description: Implements the adaptive influence-based borrowing framework proposed by Qinwei Yang, Jingyi Li, Peng Wu, and Shu Yang (2026+) in the paper “Improving Treatment Effect Estimation in Trials through Adaptive Borrowing of External Controls” <doi:10.48550/arXiv.2604.13973> for augmenting Randomized Controlled Trials (RCTs) with External Control (EC) data. This package provides a comprehensive workflow to: (1) quantify the comparability of external control samples using influence scores approximated via the influence function of the M-estimator; (2) construct candidate borrowing subsets and select the optimal subset that minimizes the Mean Squared Error (MSE); and (3) calibrate systematic differences in external outcomes using R-learner methods implemented via Ordinary Least Squares or Kernel Ridge Regression.
License: GPL-3
Encoding: UTF-8
Imports: KRLS, stats
RoxygenNote: 7.3.3
NeedsCompilation: no
Packaged: 2026-04-22 15:07:59 UTC; 14425
Author: Jile Chaoge [aut, cre], Peng Wu [aut], Shu Yang [aut]
Maintainer: Jile Chaoge <chogjill@126.com>
Repository: CRAN
Date/Publication: 2026-04-23 20:20:13 UTC

Calculate Influence Scores for External Controls

Description

This function quantifies the comparability of external control samples by assessing how much each individual external sample perturbs the outcome model fitted on the RCT control data. A smaller influence score indicates that the external sample is more compatible with the RCT controls.

Usage

compute_influences(model, testdata = NULL, type = "observed")

Arguments

model

A fitted glm object representing the outcome model on RCT controls.

testdata

A data.frame containing the external control samples. Must include the outcome variable Y and covariates X matching the model.

type

Character string specifying the type of Hessian matrix: "observed" (default) or "expected".

Details

The influence score is approximated using the influence function of the M-estimator. It measures the standardized change in model parameters if a specific external sample were added to the training set, without the computational cost of refitting.

Value

Vector of influence scores corresponding to each row in testdata.
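
The approximation described in Details can be sketched in base R for a Gaussian GLM. This is an illustrative sketch only, not the package's implementation: taking the score to be the Euclidean norm of the approximate parameter shift (inverse Hessian times per-observation gradient) is an assumption, and all simulated values are made up for the illustration.

```r
set.seed(1)

# RCT controls: Y = 2*X + noise
n <- 100
rct <- data.frame(X = runif(n, 0, 2))
rct$Y <- 2 * rct$X + rnorm(n, sd = 0.2)
fit <- glm(Y ~ X, data = rct, family = gaussian())

# External controls: rows 1-5 follow the RCT mechanism, rows 6-10 do not
ec <- data.frame(X = runif(10, 0, 2))
ec$Y <- c(2 * ec$X[1:5], 3 * ec$X[6:10] + 1) + rnorm(10, sd = 0.2)

# Per-observation score of a Gaussian GLM with identity link:
# (y - mu) * x / sigma^2, evaluated at the fitted coefficients
sigma2 <- summary(fit)$dispersion
Z_ec <- model.matrix(~ X, data = ec)
resid_ec <- as.numeric(ec$Y - predict(fit, newdata = ec, type = "response"))
grad_ec <- Z_ec * (resid_ec / sigma2)

# Hessian of the negative log-likelihood on the training data: Z'Z / sigma^2
Z_rct <- model.matrix(fit)
H <- crossprod(Z_rct) / sigma2

# Approximate parameter shift per external sample, without refitting
shift <- t(solve(H, t(grad_ec)))

# Assumed summary: Euclidean norm of the shift (smaller = more compatible)
score <- sqrt(rowSums(shift^2))
round(score, 4)
```

In this sketch the external samples generated from the shifted mechanism receive larger scores on average than those generated from the RCT control mechanism.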


Estimate ATE using RCT Data Only

Description

This function estimates the average treatment effect (ATE) using only the provided RCT data. It computes two estimators: the direct estimator and the augmented inverse probability weighting (AIPW) estimator.

Usage

estimate_rct(
  X,
  A,
  Y,
  trim = 0.01,
  outcome_family = gaussian(),
  ps_hat = NULL,
  mu0_hat = NULL,
  mu1_hat = NULL
)

Arguments

X

Covariate matrix.

A

Treatment assignment vector (binary: 0 or 1).

Y

Outcome vector.

trim

Numeric value for trimming propensity scores (default 0.01).

outcome_family

GLM family for outcome models (default gaussian()).

ps_hat

Optional pre-calculated propensity scores.

mu0_hat

Optional pre-calculated outcome predictions E[Y|X,A=0].

mu1_hat

Optional pre-calculated outcome predictions E[Y|X,A=1].

Value

A list containing the estimation results, including the elements estimate and se used in the examples below.

Examples

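set.seed(123)                          # for reproducibility
n <- 200
X <- runif(n, 0, 2)
A <- rbinom(n, size = 1, prob = 1/2)   # 1:1 randomization
# Potential outcomes under treatment (Y1) and control (Y0)
Y1 <- 3 - 2*X + rnorm(n, sd = 0.2)
Y0 <- 2*X + rnorm(n, sd = 0.2)
Y <- (1 - A)*Y0 + A*Y1                 # observed outcome
result <- estimate_rct(X, A, Y)
print(result$estimate)
print(result$se)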
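
For readers who want to see what an AIPW estimate involves, the following base-R sketch computes the textbook AIPW estimator next to a difference in means. It is not the code of estimate_rct: reading the "direct estimator" as a difference in means and reading trim as clipping the propensity scores to [trim, 1 - trim] are both assumptions made for the illustration.

```r
set.seed(123)
n <- 2000
X <- runif(n, 0, 2)
A <- rbinom(n, 1, 0.5)
Y <- ifelse(A == 1, 3 - 2 * X, 2 * X) + rnorm(n, sd = 0.2)
dat <- data.frame(X, A, Y)

# Propensity scores from logistic regression, clipped away from 0 and 1
trim <- 0.01
ps <- as.numeric(fitted(glm(A ~ X, family = binomial(), data = dat)))
ps <- pmin(pmax(ps, trim), 1 - trim)

# Arm-specific outcome models, each predicted for every unit
mu1 <- as.numeric(predict(glm(Y ~ X, data = dat[dat$A == 1, ]), newdata = dat))
mu0 <- as.numeric(predict(glm(Y ~ X, data = dat[dat$A == 0, ]), newdata = dat))

# AIPW: outcome-model contrast plus inverse-probability-weighted residuals
phi <- mu1 - mu0 + A * (Y - mu1) / ps - (1 - A) * (Y - mu0) / (1 - ps)
aipw_est <- mean(phi)
aipw_se <- sd(phi) / sqrt(n)

# Difference in means (assumed meaning of "direct" in this sketch)
dm_est <- mean(Y[A == 1]) - mean(Y[A == 0])

c(aipw = aipw_est, aipw_se = aipw_se, diff_means = dm_est)
```

With this data-generating process the true ATE is 3 - 4 * E[X] = -1, so both estimates should land close to -1.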


Estimate ATE for a Selected Data Subset (with GLM support)

Description

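This function estimates the ATE on a selected data subset, for example the RCT data pooled with a chosen subset of external controls. If a reference_value is supplied, the bias and MSE of the estimate relative to that value are also calculated.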

Usage

estimate_selected(
  X,
  A,
  Y,
  reference_value = NULL,
  trim = 0.01,
  outcome_family = gaussian(),
  ps_hat = NULL,
  mu0_hat = NULL,
  mu1_hat = NULL
)

Arguments

X

Covariate matrix.

A

Treatment assignment vector (binary: 0 or 1).

Y

Outcome vector.

reference_value

A value representing the "true" treatment effect or a reference estimate used to calculate bias and MSE. If NULL, MSE is returned as NULL.

trim

Value for trimming propensity scores (default 0.01).

outcome_family

GLM family for outcome models (default gaussian()).

ps_hat

Optional vector of estimated propensity scores P(A=1|X).

mu0_hat

Optional vector of estimated conditional means E[Y|X,A=0].

mu1_hat

Optional vector of estimated conditional means E[Y|X,A=1].

Value

A list containing the estimation results, including the elements estimate and se used in the examples below.

Examples

# Generate RCT data
n <- 100
X_rct <- runif(n, 0, 2)
A_rct <- rbinom(n, size = 1, prob = 1/2)
Y1_rct <- 3 - 2*X_rct + rnorm(n, sd = 0.2)
Y0_rct <- 2*X_rct + rnorm(n, sd = 0.2)
Y_rct <- (1 - A_rct)*Y0_rct + A_rct*Y1_rct

# Generate EC data
n <- 500
X_ec <- runif(n, 0, 2)
A_ec <- rep(0, n)
Y_ec <- rep(NA, n)
Y_ec[1:200] <- 2*X_ec[1:200] + rnorm(200, sd = 0.2)
Y_ec[201:n] <- 3*X_ec[201:n] + rnorm(n-200, sd = 0.2)

# Selected EC data
X_selected <- X_ec[1:200]
A_selected <- A_ec[1:200]
Y_selected <- Y_ec[1:200]

result <- estimate_selected(X = c(X_rct, X_ec),
                            A = c(A_rct, A_ec),
                            Y = c(Y_rct, Y_ec))
print(result$estimate)
print(result$se)

result <- estimate_selected(X = c(X_rct, X_selected),
                            A = c(A_rct, A_selected),
                            Y = c(Y_rct, Y_selected))
print(result$estimate)
print(result$se)


Select Optimal Subset of External Controls based on MSE

Description

This function iterates through a sequence of candidate subset sizes (k), selecting the top-k external controls with the smallest influence scores. For each k, it estimates the treatment effect and calculates the MSE relative to a provided reference value. It returns the results for all k and identifies the optimal k that minimizes MSE.

Usage

find_optimal_k(
  dat_rct,
  dat_ec,
  influences,
  reference_value,
  trim = 0.01,
  k_vector = NULL,
  outcome_family = gaussian()
)

Arguments

dat_rct

A data.frame containing the RCT data.

dat_ec

A data.frame containing the External Control data.

influences

Vector of influence scores for the external controls. The length must match the number of rows in dat_ec.

reference_value

A value representing the "true" treatment effect or a high-quality reference estimate used to calculate bias and MSE.

trim

Value for trimming propensity scores (default 0.01).

k_vector

Optional integer vector specifying the candidate numbers of external controls to borrow. If NULL, a default sequence is generated (from 0 to N_ec, step 50).

outcome_family

GLM family for outcome models (default gaussian()).

Details

Data Structure Requirements: The input data frames (dat_rct and dat_ec) must place the covariates first, followed by A and Y as the last two columns (e.g., column order: X1, X2, ..., Xp, A, Y).

The code automatically identifies the covariates as the first ncol - 2 columns, so any other column order will be misread.

Value

A list containing the results for all candidate values of k and the optimal k that minimizes the MSE.
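
The selection loop described above can be sketched in base R as follows. This is a hedged illustration rather than the body of find_optimal_k: the absolute residual used as an influence score, the coefficient of A in a linear model used as the effect estimate, and the MSE computed as squared bias plus squared standard error are all stand-ins chosen for the sketch.

```r
set.seed(2)

# RCT with a constant treatment effect of 1
n_rct <- 200
rct <- data.frame(X = runif(n_rct, 0, 2), A = rbinom(n_rct, 1, 0.5))
rct$Y <- 2 * rct$X + 1 * rct$A + rnorm(n_rct, sd = 0.2)

# External controls: rows 201-300 are shifted upward by 2 (incompatible)
n_ec <- 300
ec <- data.frame(X = runif(n_ec, 0, 2), A = 0)
ec$Y <- 2 * ec$X + rnorm(n_ec, sd = 0.2)
ec$Y[201:300] <- ec$Y[201:300] + 2

# Stand-in influence score: absolute residual under the RCT control model
fit0 <- lm(Y ~ X, data = rct[rct$A == 0, ])
infl <- abs(ec$Y - as.numeric(predict(fit0, newdata = ec)))

reference_value <- 1                  # known here because the data are simulated
k_vector <- seq(0, n_ec, by = 50)     # candidate numbers of borrowed controls
ord <- order(infl)                    # smallest influence first

res <- t(sapply(k_vector, function(k) {
  pooled <- rbind(rct, ec[ord[seq_len(k)], ])
  cf <- summary(lm(Y ~ A + X, data = pooled))$coefficients
  est <- cf["A", "Estimate"]
  se <- cf["A", "Std. Error"]
  c(k = k, estimate = est, se = se, mse = (est - reference_value)^2 + se^2)
}))

k_opt <- res[which.min(res[, "mse"]), "k"]
res
k_opt
```

Borrowing beyond the compatible external controls pulls the estimate away from the reference value, so the MSE rises sharply once the shifted samples enter the pooled data.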


Generate Simulation Data for RCT and External Controls

Description

This function generates synthetic data for an RCT and an EC arm. It is designed to demonstrate the adaptive borrowing framework, creating a scenario where the external controls have a different outcome mechanism (bias) compared to the RCT controls, along with some outliers.

Usage

gen_demo_data(n_rct = 100, n_ec = 200, seed = 123)

Arguments

n_rct

Integer. Sample size of the randomized controlled trial (default 100).

n_ec

Integer. Sample size of the external controls (default 200).

seed

Integer. Random seed for reproducibility (default 123).

Details

Output Format: The returned data frames are formatted to be directly compatible with find_optimal_k, which expects the covariates first and A and Y as the last two columns.

Value

A list containing the simulated data frames, including the elements data_rct and data_ec used in the examples below.

Examples

sim_data <- gen_demo_data(n_rct = 100, n_ec = 200)
head(sim_data$data_rct)
head(sim_data$data_ec)


Calculate Gradient Per Observation for GLM

Description

Internal function to calculate the gradient of the log-likelihood for each observation.

Usage

glm_gradient(model, newdata = NULL)

Arguments

model

A fitted glm object.

newdata

Optional data.frame for new observations.

Value

A list containing the gradient matrix and other model details.


Calculate Hessian Matrix for GLM

Description

Internal function to calculate the Hessian (Information Matrix) for a fitted GLM using training data.

Usage

glm_hessian(model, type = "observed")

Arguments

model

A fitted glm object.

type

Character string, either "observed" or "expected".

Value

A list containing the Hessian matrix.
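
To make the per-observation gradient and the information matrix concrete, here is a base-R sketch for a logistic regression. It does not call glm_gradient or glm_hessian and makes no claim about their return structure. For the logit link, which is canonical, the observed and expected information coincide, so a single matrix is computed.

```r
set.seed(7)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(y ~ x, family = binomial())

Z <- model.matrix(fit)            # design matrix including the intercept
mu <- as.numeric(fitted(fit))     # fitted probabilities

# Per-observation gradient of the log-likelihood: (y_i - mu_i) * z_i
score <- Z * (y - mu)

# Information matrix: Z' W Z with W = diag(mu * (1 - mu))
info <- crossprod(Z * (mu * (1 - mu)), Z)

colSums(score)   # close to zero at the maximum likelihood estimate
solve(info)      # close to vcov(fit)
```

The two printed quantities give quick sanity checks: the scores sum to approximately zero at the fitted coefficients, and the inverse information approximately reproduces the covariance matrix reported by vcov.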


Predictions for rlearner_krls objects

Description

Predict estimated treatment effects (tau) for new data using a trained rlearner_krls model.

Usage

## S3 method for class 'rlearner_krls'
predict(object, newx = NULL, ...)

Arguments

object

An object of class rlearner_krls.

newx

Covariate matrix to make predictions on. If NULL, returns predictions on the training data.

...

Additional arguments (currently ignored).

Value

A vector of predicted treatment effects.


Predictions for rlearner_lm

Description

Get estimated tau(x) using the trained rlearner_lm model.

Usage

## S3 method for class 'rlearner_lm'
predict(object, newx = NULL, ...)

Arguments

object

An object of class rlearner_lm.

newx

Covariate matrix to make predictions on. If NULL, returns predictions on the training data.

...

Additional arguments (currently ignored).

Value

Vector of predicted treatment effects.

Examples

n <- 200; p <- 5
set.seed(123)
x <- matrix(rnorm(n*p), n, p)
r <- rbinom(n, 1, 0.5)
y <- 0.5*x[,1] + 0.8*x[,2] + 1.2*r*x[,1] + rnorm(n, sd=0.5)
rl_fit <- rlearner_lm(x, r, y)
new_data <- matrix(rnorm(10*p), 10, p)
predictions <- predict(rl_fit, new_data)


R-learner implemented via kernel ridge regression with a Gaussian kernel

Description

Implements the R-learner (Nie and Wager, 2017) using kernel ridge regression (via the KRLS package) for nuisance parameter estimation and the final treatment effect model.

Usage

rlearner_krls(x, r, y, whichkernel = "gaussian", pi_hat = NULL, m_hat = NULL)

Arguments

x

Covariate matrix.

r

Treatment assignment vector (binary: 0 or 1).

y

Outcome vector.

whichkernel

Character string specifying the kernel type (default "gaussian"). Passed to KRLS::krls.

pi_hat

Optional vector of estimated propensity scores E[R|X]. If NULL, estimated using KRLS.

m_hat

Optional vector of estimated conditional means E[Y|X]. If NULL, estimated using KRLS.

Value

An object of class rlearner_krls containing the fitted models and estimates.
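
As background on the kind of fit involved, the sketch below implements plain kernel ridge regression with a Gaussian kernel in base R. It does not use the KRLS package and does not reproduce how rlearner_krls chooses its tuning parameters: the helper gauss_kernel, the bandwidth, and the penalty lambda are hypothetical choices fixed by hand for the illustration.

```r
set.seed(42)
n <- 150
x <- matrix(runif(n, -2, 2), n, 1)
y <- sin(2 * x[, 1]) + rnorm(n, sd = 0.1)

# Gaussian kernel between the rows of a and the rows of b (hypothetical helper)
gauss_kernel <- function(a, b, bandwidth) {
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  exp(-pmax(d2, 0) / bandwidth)   # pmax guards against tiny negative distances
}

bandwidth <- 1    # assumed value, not tuned
lambda <- 0.1     # assumed ridge penalty, not tuned

# Ridge solution in the kernel space: alpha = (K + lambda * I)^{-1} y
K <- gauss_kernel(x, x, bandwidth)
alpha <- solve(K + lambda * diag(n), y)

# Predictions at new points are kernel-weighted sums of alpha
x_new <- matrix(seq(-2, 2, length.out = 50), ncol = 1)
y_new <- drop(gauss_kernel(x_new, x, bandwidth) %*% alpha)

cor(y_new, sin(2 * x_new[, 1]))
```

In an R-learner, fits of this kind would play the role of the nuisance estimates and of the final treatment effect model.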


R-learner implemented via Ordinary Least Squares (Linear Model)

Description

R-learner, as proposed by Nie and Wager (2017), implemented via standard linear regression (lm). It uses linear models (or logistic regression for propensity scores) to estimate nuisance parameters and the final treatment effect model.

Usage

rlearner_lm(x, r, y, pi_hat = NULL, m_hat = NULL)

Arguments

x

Covariate matrix.

r

Treatment assignment vector (binary: 0 or 1).

y

Outcome vector.

pi_hat

Optional vector of estimated propensity scores E[R|X]. If NULL, estimated using logistic regression.

m_hat

Optional vector of estimated conditional means E[Y|X]. If NULL, estimated using linear regression.

Value

An object of class rlearner_lm containing the fitted models and estimates.

Examples

n <- 200; p <- 5
set.seed(123)
x <- matrix(rnorm(n*p), n, p)
r <- rbinom(n, 1, 0.5)
y <- 0.5*x[,1] + 0.8*x[,2] + 1.2*r*x[,1] + rnorm(n, sd=0.5)
rl_fit <- rlearner_lm(x, r, y)
rl_est <- predict(rl_fit, x)
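
The residual-on-residual idea behind a linear R-learner can be sketched with lm and glm alone. This is a simplified illustration, not the internals of rlearner_lm: the nuisance models are fitted in-sample without cross-fitting, and all object names are hypothetical.

```r
set.seed(123)
n <- 1000; p <- 3
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", 1:p)
r <- rbinom(n, 1, 0.5)
y <- 0.5 * x[, 1] + 0.8 * x[, 2] + 1.2 * r * x[, 1] + rnorm(n, sd = 0.5)
df <- data.frame(x, r = r, y = y)

# Nuisance estimates: m(x) = E[Y|X] by OLS, pi(x) = E[R|X] by logistic regression
m_hat <- as.numeric(fitted(lm(y ~ x1 + x2 + x3, data = df)))
pi_hat <- as.numeric(fitted(glm(r ~ x1 + x2 + x3, family = binomial(), data = df)))

# Residualize the outcome and the treatment
y_res <- y - m_hat
r_res <- r - pi_hat

# With linear tau(x) = b0 + x'b, minimizing the R-loss is OLS of y_res on
# r_res * (1, x) without an additional intercept
design <- r_res * cbind(intercept = 1, x)
tau_fit <- lm(y_res ~ design - 1)
tau_coef <- coef(tau_fit)

# Estimated treatment effect for each unit
tau_hat <- drop(cbind(1, x) %*% tau_coef)
round(tau_coef, 2)
```

The simulated treatment effect is tau(x) = 1.2 * x1, so the coefficient on x1 should be near 1.2 and the remaining coefficients near zero.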


Input Sanitization Helper

Description

Internal function to check input data types and dimensions for consistency. It ensures that treatment indicators and outcomes are numeric and match the dimensions of the covariate matrix.

Usage

sanitize_input(x, r, y)

Arguments

x

Input covariate matrix or data.frame.

r

Treatment assignment vector (binary: 0 or 1).

y

Outcome vector.

Value

A list containing the cleaned x, r, and y.
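
A check of this kind might look like the sketch below. The function check_inputs is hypothetical and is not the package's sanitize_input; in particular, the test that r contains only 0 and 1 is an assumption based on the argument description.

```r
# Hypothetical helper illustrating type and dimension checks
check_inputs <- function(x, r, y) {
  x <- as.matrix(x)
  if (!is.numeric(x)) stop("x must be numeric")
  if (!is.numeric(r) || !is.numeric(y)) stop("r and y must be numeric")
  if (!all(r %in% c(0, 1))) stop("r must be binary (0 or 1)")   # assumed check
  if (length(r) != nrow(x) || length(y) != nrow(x)) {
    stop("r and y must have one entry per row of x")
  }
  list(x = x, r = as.numeric(r), y = as.numeric(y))
}

# A data.frame of covariates is converted to a matrix
clean <- check_inputs(data.frame(x1 = rnorm(5), x2 = rnorm(5)),
                      c(0, 1, 0, 1, 1), rnorm(5))
str(clean)

# Mismatched lengths raise an error, captured here as a message
bad <- tryCatch(check_inputs(matrix(1:6, 3, 2), c(0, 1), c(1, 2, 3)),
                error = function(e) conditionMessage(e))
bad
```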