Resource | GitHub Actions | Code Coverage |
---|---|---|
Platforms | Windows, macOS, Linux | codecov |
R CMD check |
In health and social sciences, it is critically important to identify subgroups of the study population where a treatment has notable heterogeneity in the causal effects with respect to the average treatment effect (ATE). The bulk of heterogeneous treatment effect (HTE) literature focuses on two major tasks: (i) estimating HTEs by examining the conditional average treatment effect (CATE); (ii) discovering subgroups of a population characterized by HTE.
Several methodologies have been proposed for both tasks, but providing interpretability in the results is still an open challenge. Bargagli-Stoffi et al. (2023) proposed Causal Rule Ensemble, a new method for HTE characterization in terms of decision rules, via an extensive exploration of heterogeneity patterns by an ensemble-of-trees approach, enforcing stability in the discovery. CRE is an R Package providing a flexible implementation of the Causal Rule Ensemble algorithm.
Installing from CRAN.
install.packages("CRE")
Installing the latest developing version.
library(devtools)
install_github("NSAPH-Software/CRE", ref="develop")
Import.
library("CRE")
Data (required)
y
The observed response/outcome vector
(binary or continuous). z
The
treatment/exposure/policy vector (binary).
X
The covariate matrix (binary or
continuous).
Parameters (not required)
method_parameters
The list of parameters
to define the models used, including: -
ratio_dis
The ratio of data delegated to
the discovery sub-sample (default: 0.5). -
ite_method_dis
The method to estimate the
individual treatment effect (ITE) on the discovery sub-sample (default:
‘aipw’) [1].
- ps_method_dis
The estimation model for
the propensity score on the discovery sub-sample (default:
‘SL.xgboost’).
- or_method_dis
The estimation model for
the outcome regressions estimate_ite_aipw on the discovery sub-sample
(default: ‘SL.xgboost’).
- ite_method_inf
The method to estimate
the individual treatment effect (ITE) on the inference sub-sample
(default: ‘aipw’) [1].
- ps_method_inf
The estimation model for
the propensity score on the inference subsample (default:
‘SL.xgboost’).
- or_method_inf
The estimation model for
the outcome regressions in estimate_ite_aipw on the inference subsample
(default: ‘SL.xgboost’).
hyper_params
The list of hyper
parameters to finetune the method, including: -
intervention_vars
Intervention-able
variables used for Rules Generation (default: NULL).
- offset
Name of the covariate to use as
offset (i.e. ‘x1’) for T-Poisson ITE Estimation. NULL if not used
(default: NULL). - ntrees_rf
A number of
decision trees for random forest (default: 20).
- ntrees_gbm
A number of decision trees
for the generalized boosted regression modeling algorithm. (default:
20).
- node_size
Minimum size of the trees’
terminal nodes (default: 20). - max_nodes
Maximum number of terminal nodes per tree (default: 5).
- max_depth
Maximum rules length (default:
3).
- replace
Boolean variable for replacement
in bootstrapping (default: TRUE).
- t_decay
The decay threshold for rules
pruning (default: 0.025).
- t_ext
The threshold to define too
generic or too specific (extreme) rules (default: 0.01).
- t_corr
The threshold to define
correlated rules (default: 1). - t_pvalue
The threshold to define statistically significant rules (default: 0.05).
- stability_selection
Whether or not using
stability selection for selecting the causal rules (default: TRUE). -
cutoff
Threshold defining the minimum
cutoff value for the stability scores (default: 0.9). -
pfer
Upper bound for the per-family error
rate (tolerated amount of falsely selected rules) (default: 1). -
penalty_rl
Order of penalty for rules
length during LASSO for Causal Rules Discovery (i.e. 0: no penalty, 1:
rules_length, 2: rules_length^2) (default: 1).
Additional Estimates (not required)
ite
The estimated ITE vector. If given,
both the ITE estimation steps in Discovery and Inference are skipped
(default: NULL).
[1] Options for the ITE estimation are as follows: -
S-Learner (slearner
) - T-Learner (tlearner
) -
T-Poisson (tpoisson
) - X-Learner (xlearner
) -
Augmented Inverse Probability Weighting (aipw
) - Causal
Forests (cf
) - Bayesian Causal Forests (bcf
) -
Causal Bayesian Additive Regression Trees (bart
)
if other estimates of the ITE are provided in ite
additional argument, both the ITE estimations in discovery and inference
are skipped and those values estimates are used instead.
Example 1 (default parameters)
set.seed(9687)
<- generate_cre_dataset(n = 1000,
dataset rho = 0,
n_rules = 2,
p = 10,
effect_size = 2,
binary_covariates = TRUE,
binary_outcome = FALSE,
confounding = "no")
<- dataset[["y"]]
y <- dataset[["z"]]
z <- dataset[["X"]]
X
<- cre(y, z, X)
cre_results summary(cre_results)
plot(cre_results)
Example 2 (personalized ite estimation)
set.seed(9687)
<- generate_cre_dataset(n = 1000,
dataset rho = 0,
n_rules = 2,
p = 10,
effect_size = 2,
binary_covariates = TRUE,
binary_outcome = FALSE,
confounding = "no")
<- dataset[["y"]]
y <- dataset[["z"]]
z <- dataset[["X"]]
X
<- ... # personalized ite estimation
ite_pred <- cre(y, z, X, ite = ite_pred)
cre_results summary(cre_results)
plot(cre_results)
Example 3 (setting parameters)
set.seed(9687)
<- generate_cre_dataset(n = 1000,
dataset rho = 0,
n_rules = 2,
p = 10,
effect_size = 2,
binary_covariates = TRUE,
binary_outcome = FALSE,
confounding = "no")
<- dataset[["y"]]
y <- dataset[["z"]]
z <- dataset[["X"]]
X
= list(ratio_dis = 0.25,
method_params ite_method_dis="aipw",
ps_method_dis = "SL.xgboost",
oreg_method_dis = "SL.xgboost",
ite_method_inf = "aipw",
ps_method_inf = "SL.xgboost",
oreg_method_inf = "SL.xgboost")
= list(intervention_vars = c("x1","x2","x3","x4"),
hyper_params offset = NULL,
ntrees_rf = 20,
ntrees_gbm = 20,
node_size = 20,
max_nodes = 5,
max_depth = 3,
t_decay = 0.025
t_ext = 0.025,
t_corr = 1,
t_pvalue = 0.05,
replace = FALSE,
stability_selection = TRUE,
cutoff = 0.8,
pfer = 0.1,
penalty_rl = 1)
<- cre(y, z, X, method_params, hyper_params)
cre_results summary(cre_results)
plot(cre_results)
More synthetic data sets can be generated using
generate_cre_dataset()
.
Discovery.
/functional_tests/experiments/discovery.R CRE
Estimation.
/functional_tests/experiments/estimation.R CRE
More exhaustive simulation studies and real world experiment of CRE package can be found at https://github.com/NSAPH-Projects/cre_applications.
Please note that the CRE project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
Causal Rule Ensemble (methodological paper)
@article{bargagli2023causal,
title={{Causal rule ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects}},
author={Bargagli-Stoffi, Falco J and Cadei, Riccardo and Lee, Kwonsang and Dominici, Francesca},
journal={arXiv preprint arXiv:2009.09036},
year={2023}
}
CRE (software paper)
@article{cadei2023CRE,
title = {CRE: an R package for Interpretable Discovery and Estimation of Heterogeneous Treatment Effect},
author = {Cadei, Riccardo and Khoshnevis, Naeem and Bargagli-Stoffi, Falco J and Lee, Kwonsang and Garcia, Daniela Maria},
year = {2023},
journal={Working paper},
url = {},
}
CRE (CRAN package)
@Manual{khoshnevis2023CRE,
title = {CRE: Interpretable Subgroups Identification Through Ensemble Learning of Causal Rules},
author = {Khoshnevis, Naeem and Garcia, Daniela Maria and Cadei, Riccardo and Lee, Kwonsang and Bargagli-Stoffi, Falco J},
year = {2023},
note = {R package version 0.2.0.9000},
url = {https://github.com/NSAPH-Software/CRE},
}