fastdid is orders of magnitude faster than did, and 15x faster than the fastest alternative, DiDforBigData (dfbd for short), on large datasets. fastdid also uses less memory. Here is a comparison of run time for fastdid, did, and dfbd using a panel of 10 periods and varying sample sizes.
Unfortunately, the author's computer cannot run did with 1 million units. For a rough idea, DiDforBigData is about 100x faster than did in Bradley Setzler's benchmark. Other staggered DiD implementations are even slower than did.
For memory:
For the benchmark, a baseline group-time ATT is estimated with no covariate controls, no bootstrap, and no explicit parallelization. Computing time is measured by microbenchmark and peak RAM by peakRAM.
Before each release, we conduct tests to ensure the validity of estimates from fastdid.
Comparison with did
For features included in CS, fastdid maintains a maximum difference of 1% from the results of the did package. This margin of error mostly applies to bootstrapped results, due to their inherent randomness. For point estimates, the difference is smaller than 1e-12 and is most likely the result of floating-point error. The relevant test file is compare_est.R.
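The two tolerances above (floating-point agreement for point estimates, at most 1% relative difference for standard errors) can be expressed as a small check. This is a minimal sketch with a hypothetical helper, check_against_did(), and toy numbers; it is not the code in compare_est.R.

```r
# Hypothetical helper: mirrors the tolerances described above, but it is
# NOT the code in compare_est.R.
check_against_did <- function(est_fast, est_did, se_fast, se_did,
                              point_tol = 1e-12, se_rel_tol = 0.01) {
  # Point estimates should agree up to floating-point error.
  point_ok <- all(abs(est_fast - est_did) < point_tol)
  # Standard errors may differ by at most 1% relative to did's value,
  # which leaves room for bootstrap randomness.
  se_ok <- all(abs(se_fast - se_did) / abs(se_did) <= se_rel_tol)
  c(point = point_ok, se = se_ok)
}

# Toy numbers for illustration only, not real package output.
res <- check_against_did(
  est_fast = c(0.5, 1.2), est_did = c(0.5, 1.2 + 1e-14),
  se_fast  = c(0.1005, 0.199), se_did = c(0.1, 0.2)
)
print(res)
```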
For features not included in CS, fastdid ensures that the 95% confidence intervals have a coverage rate between 94% and 96%.
The coverage rate is calculated over 200 iterations. In each iteration, we test whether the estimated confidence intervals cover the ground-truth values, and we then average the rate across iterations. Due to the randomness of coverage, the realized coverage falls outside the thresholds about 1% of the time. The relevant test file is coverage.R.
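The coverage procedure can be illustrated with a toy estimator whose ground truth is known. This sketch replaces the DiD estimator with a simple sample mean and uses three hypothetical true parameters per iteration; it only shows the "share covered per iteration, then average over 200 iterations" logic and is not coverage.R.

```r
set.seed(42)
n_iter <- 200
n_obs <- 500
true_means <- c(0, 1, 2)   # hypothetical ground-truth parameters

# One iteration: estimate every parameter, build a 95% CI for each,
# and return the share of intervals that contain the truth.
coverage_one_iter <- function() {
  covered <- vapply(true_means, function(mu) {
    y <- rnorm(n_obs, mean = mu, sd = 1)
    est <- mean(y)
    se <- sd(y) / sqrt(n_obs)
    lower <- est - qnorm(0.975) * se
    upper <- est + qnorm(0.975) * se
    lower <= mu && mu <= upper
  }, logical(1))
  mean(covered)
}

rates <- replicate(n_iter, coverage_one_iter())
coverage <- mean(rates)   # average coverage across iterations
print(coverage)
```

With only three parameters per iteration, this toy estimate is noisier than a real test with many group-time cells, so it lands outside the 94% to 96% band more often than 1% of the time.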
As an attempt to balance the validity and flexibility of fastdid, “experimental features” were introduced in version 0.9.4. These features are less tested and documented, and it is generally advised not to use them unless the user knows what they and the package are doing. These experimental features can be accessed via the exper argument. For example, to use the filtervar feature, call
fastdid(..., exper = list(filtervar = "FF")).
The current experimental features are:
- max_control_cohort_diff: limit the maximum cohort difference between the treated and control groups.
- filtervar, filtervar_post: limit the units used as treated and control groups with a potentially time-varying variable in the base (post) period.
- only_balance_2by2: only require observations to have non-NA values within each 2 by 2 DiD, instead of throughout all time periods. This can be an alternative way of dealing with unbalanced panels, by filling the missing periods with NAs. It is not recommended, as CS only has allow_unbalance_panel, which uses a repeated cross-section 2 by 2 DiD estimator.
- custom_scheme: aggregate to user-defined parameters.

Comparison with did
As the name suggests, fastdid's goal is to be a fast did. Besides performance, here are some comparisons between the two packages.
fastdid's estimators are identical to did's. As the performance gains mostly come from efficient data manipulation, the key estimation implementations are analogous. For example: 2x2 DiD (estimate_did.R and DRDID::std_ipw_did_panel), influence function from weights (aggregate_gt.R/get_weight_influence and compute.aggte.R/wif), and multiplier bootstrap (get_se.R and mboot.R).
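To make these building blocks concrete, here is a minimal sketch of an unconditional (no covariates) 2x2 DiD on a balanced panel, its influence function, and a simplified multiplier bootstrap. Without covariates, the panel 2x2 DiD reduces to a difference in mean outcome changes between treated and control units. The functions did_2x2() and multiplier_bootstrap_se() are hypothetical; the bootstrap uses Rademacher weights and a plain standard deviation, whereas did's mboot.R makes its own choices of weights and scale estimator.

```r
# Hypothetical sketch: unconditional 2x2 DiD on a balanced panel.
did_2x2 <- function(y_pre, y_post, treated) {
  dy  <- y_post - y_pre          # within-unit change in the outcome
  p   <- mean(treated)           # share of treated units
  mu1 <- mean(dy[treated])       # mean change among treated units
  mu0 <- mean(dy[!treated])      # mean change among control units
  # Influence function of the difference in means (one value per unit)
  inf <- treated * (dy - mu1) / p - (!treated) * (dy - mu0) / (1 - p)
  list(att = mu1 - mu0,
       se  = sqrt(mean(inf^2) / length(dy)),
       inf = inf)
}

# Simplified multiplier bootstrap: perturb the influence function with
# Rademacher weights and take the standard deviation of the draws.
multiplier_bootstrap_se <- function(inf, B = 999) {
  n <- length(inf)
  draws <- replicate(B, {
    v <- sample(c(-1, 1), n, replace = TRUE)
    mean(v * inf)
  })
  sd(draws)
}

# Simulated data with a true ATT of 2
set.seed(1)
n <- 2000
treated <- runif(n) < 0.3
y_pre  <- rnorm(n)
y_post <- y_pre + 0.5 + 2 * treated + rnorm(n)
fit <- did_2x2(y_pre, y_post, treated)
boot_se <- multiplier_bootstrap_se(fit$inf)
print(c(att = fit$att, se = fit$se, boot_se = boot_se))
```

Because the multipliers have mean 0 and variance 1, the bootstrap standard error should be close to the analytic one computed from the same influence function.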
fastdid should feel similar to att_gt, but there are a few differences:
Control group option:
fastdid | did | control group used |
---|---|---|
both | notyettreated | never-treated + not-yet-but-eventually-treated |
never | nevertreated | never-treated |
notyet | (none) | not-yet-but-eventually-treated |
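The three control-group options can be sketched as a filter on each unit's cohort (first treated period) for a given ATT(g, t) with base period base. This is a hypothetical helper, control_units(); it codes never-treated units as Inf (did itself uses 0 in gname), ignores anticipation, and is not fastdid's internal code.

```r
# Hypothetical sketch of control-group selection for ATT(g, t).
# G: cohort of each unit (Inf = never treated, an assumption of this sketch)
control_units <- function(G, g, t, base,
                          type = c("both", "never", "notyet")) {
  type <- match.arg(type)
  never  <- is.infinite(G)
  # Eventually treated, but only after both the current and base periods,
  # and not the treated cohort itself
  notyet <- is.finite(G) & G > max(t, base) & G != g
  switch(type,
         both   = never | notyet,
         never  = never,
         notyet = notyet)
}

G <- c(Inf, Inf, 3, 5, 7)
control_units(G, g = 3, t = 4, base = 2, type = "both")
# returns TRUE TRUE FALSE TRUE TRUE
```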
Aggregated parameters: fastdid aggregates in the same function call, whereas did uses a separate aggte() call.
fastdid | did |
---|---|
group_time | no aggregation |
dynamic | dynamic |
time | calendar |
group | group |
simple | simple |
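As a rough illustration of what two of these aggregations compute, the sketch below averages group-time ATTs with weights proportional to cohort size: simple averages all post-treatment cells (t >= g), and dynamic averages cells by event time e = t - g. The data frame gt and the functions aggregate_simple() and aggregate_dynamic() are hypothetical, the numbers are made up, and standard errors (which come from the influence functions) are omitted.

```r
# Hypothetical group-time ATTs; size is the number of units in cohort g.
gt <- data.frame(
  g    = c(2, 2, 2, 3, 3),
  t    = c(1, 2, 3, 2, 3),
  att  = c(0.1, 1.0, 1.5, 0.0, 2.0),
  size = c(100, 100, 100, 300, 300)
)

# simple: size-weighted mean over post-treatment cells only
aggregate_simple <- function(gt) {
  post <- gt[gt$t >= gt$g, ]
  weighted.mean(post$att, post$size)
}

# dynamic: size-weighted mean within each event time e = t - g
aggregate_dynamic <- function(gt) {
  gt$e <- gt$t - gt$g
  es <- sort(unique(gt$e))
  res <- vapply(es, function(k) {
    cell <- gt[gt$e == k, ]
    weighted.mean(cell$att, cell$size)
  }, numeric(1))
  names(res) <- es
  res
}

aggregate_simple(gt)    # returns 1.7
aggregate_dynamic(gt)   # returns -1: 0.025, 0: 1.75, 1: 1.5
```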