Matching and Weighting Multiply Imputed Datasets


A major issue in matching procedures is missing data in the covariates or the outcome indicator. Matching either compares covariate values between units in the control and treated subgroups or relies on predictions from a logistic regression model, and neither the comparison nor the prediction can be made for a unit with missing covariate values. Several solutions exist (including complete-case analysis), but because of their flaws and limitations, multiply imputing the missing data is becoming a popular alternative.

The mice and Amelia packages are widely accepted tools for imputing ignorable missing data in R. The MatchThem package simplifies matching the datasets imputed by these packages and makes it practical to apply the two matching approaches (within and across) and several matching methods.


The MatchThem package can be installed from the Comprehensive R Archive Network (CRAN) repository as follows:

install.packages("MatchThem")


The latest development version of the package (which may be unstable) can be installed from GitHub as follows:

devtools::install_github(repo = "FarhadPishgar/MatchThem")

Suggested Workflow

Multiply imputing the missing data and then matching the imputed datasets may seem complicated. This suggested workflow breaks the process into five steps (please see the package cheat sheet for more details):

  1. Imputing the Missing Data in the Dataset: the mice or Amelia package should be used to multiply impute the missing data in the dataset.
  2. Matching the Imputed Datasets: matchthem() from the MatchThem package should be used to select matched units from the control and treated subgroups of each imputed dataset.
  3. Assessing Balance on the Matched Datasets: the cobalt package should be used to assess the balance of all covariates in the imputed datasets after matching.
  4. Analyzing the Matched Datasets: with() from the MatchThem package should be used to estimate the causal effect in each matched dataset.
  5. Pooling the Causal Effect Estimates: pool() from the MatchThem package should be used to pool the causal effect estimates obtained from each dataset.
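
The five steps above can be sketched in R as follows. This is a minimal illustration rather than a recommended analysis: it assumes the osteoarthritis example dataset shipped with MatchThem (with OSP as the treatment indicator and KOA as the outcome), and the number of imputations, the seed, the matching method, the caliper, and the outcome model are arbitrary choices made for the example.

```r
library(MatchThem)
library(cobalt)
library(survey)

# Example data assumed to ship with MatchThem (contains missing values)
data("osteoarthritis")

# Step 1: multiply impute the missing data (m and seed are illustrative)
imputed.datasets <- mice::mice(osteoarthritis, m = 5, seed = 123,
                               printFlag = FALSE)

# Step 2: match within each imputed dataset
# (nearest-neighbor matching and the caliper are illustrative choices)
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              datasets = imputed.datasets,
                              approach = "within",
                              method = "nearest",
                              caliper = 0.05)

# Step 3: assess covariate balance, reporting the worst case across imputations
bal.tab(matched.datasets, imp.fun = "max")

# Step 4: estimate the treatment effect in each matched dataset
matched.models <- with(matched.datasets,
                       svyglm(KOA ~ OSP, family = quasibinomial()))

# Step 5: pool the estimates across the matched datasets
matched.results <- pool(matched.models)
summary(matched.results, conf.int = TRUE)
```

Setting approach = "across" in matchthem() would select the across approach instead of the within approach.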


The logo for this package, a trip to the Arctic, was designed and kindly provided by Max Josino (please see his website and Dribbble to see more of his work).

We would like to thank the CRAN team members for their technical support and comments on the package performance. This package relies on the MatchIt, mice, and WeightIt packages. Please cite their reference manuals and vignettes in your work, in addition to the reference manual and vignette of this package.


Farhad Pishgar

Noah Greifer

Clémence Leyrat

Elizabeth Stuart