`mzipmed`

R
package- Brief introduction to counterfactual approach to mediation
- Brief overview of MZIP
`mzipmed_data`

overview`mzip()`

function for analysis with zero-inflated count variables- Mediation with Zero-Inflated Count Outcomes
- Mediation with Zero-Inflated Count Mediators
- Extensions of functions to cases with exposure-mediator interactions
- Conclusion
- References

In this document we explain the main features of the
`mzipmed`

package through examples. This package performs
mediation analysis when either the outcome or mediator are a count
variable with many zeroes, called a zero-inflated count. Specifically,
this package merges the concepts of the counterfactual approach to
mediation and the marginalized zero-inflated Poisson (MZIP) model to
obtain mediation effects when there are excess zeroes. Additional
information on statistical methodology is provided elsewhere (Long et
al. 2014, Vanderweele 2016)

Mediation analysis is used to explore how a variable called a
mediator helps explain the relationship between some exposure and an
outcome. Many popular mediation method such as the difference and
product methods collapse when there are interaction effects or the
outcome is not a continuous variable. The counterfactual approach to
mediation involves finding mediation effects through closed-form
expressions that is extendable to many different classes of models and
can account for interaction effects while minimizing computation speed.
If we let \(Y\) be our outcome, \(M\) be our mediator, \(X\) be our exposure, and **\(C\)** be a vector of covariates. The
counterfactual approach involves fitting two regression models, one for
the outcome and one for the mediator.

\[g(E[Y|X=x,M=m,\textbf{C=c}])=g(E[Y|x,m,c])=\tau_0+\tau_1x+\tau_2m+\mathbf{ \tau_4'c}\] \[h(E[M|X=x,\textbf{C=c}])=h(E[M|x,c])=\theta_0+\theta_1x++\mathbf{ \theta_4'c}\] Where \(g\) and \(h\) are link functions associated with the outcome and mediator model respectively and \(\tau\) and \(\theta\) are the parameter estimates associated with the outcome and mediator model respectively. Using parameter estimates from these equations mediation effects can be estimated including the total effect (TE) or the overall effect of the exposure on the outcome, natural direct effect (NDE) or the effect of the exposure on the outcome not operating through the mediator, and natural indirect effect (NIE) or the effect operating through the mediator. Basic formulas for these equations are as follows \[NDE=\sum_m(E[Y|x,m,c]-E[Y|x^*,m,c])P(m|x^*,c)\] \[NIE=\sum_m(E[Y|x,m,c])(P(m|x,c)-P(m|x^*,c))\] Where \(x^*\) is a fixed value of the exposure to be compared to \(x\). The total effect can then be computed as the sum of NDE and NIE on a difference scale and the product of the NDE and NIE if the outcome is on a ratio scale.

Standard count regression models, such as Poisson, give biased parameter estimates and popular count models like the zero-inflated Poisson model do not directly provide population average parameter estimates. In response the marginalized zero-inflated Poisson (MZIP) model was proposed. MZIP is a mixture distribution that models the probability of being an excess zero from a Bernoulli distribution with probability \(\psi_i\) and the overall mean of the distribution from a Poisson distribution with mean \(\nu_i\), where \(\nu_i\) is the product of the mean of the non-excess zeroes, \(\mu_i\), and \(1-\psi_i\). MZIP is fit by \[logit(\psi_i)=\mathbf{Z_i'\gamma}\] \[log(\nu_i)=\mathbf{Z_i'\alpha}\] Where \(\alpha\) and \(\gamma\) are \((p\times 1)\) column vectors of parameters associated with the overall population mean and probability of being an excess zero models. \(\mathbf{Z_i}\) is a \((p\times 1)\) vector of covariates for the i-th individual for the excess zero and overall population mean components of MZIP. Exponentiating \(\alpha\) parameters yield incidence density or incidence rate ratio interpretations equivalent to Poisson model interpretations.

`mzipmed_data`

overviewIncluded in this package is a simulated dataset for examples called
`mzipmed_data`

. This dataset includes 500 observations of 10
variables. These variables are

`X`

is a binary exposure simulated from a Bernoulli distribution with p=0.5`C1`

is a covariate that follows a standard normal distribution`C2`

is a second covariate from a ~Beta(2,2)`ziM`

is a zero-inflated count mediator simulated from the predictors`X`

,`C1`

, and`C2`

`lmM`

is a continuous mediator based on the predictors with error term ~N(0,4)`binM`

is a binary mediator based on the predictors simulated with a Bernoulli distribution`lmY`

is a continuous outcome simulated from`ziM`

and the predictors with error term ~N(0,4)`binY`

is a binary outcome simulated from`ziM`

and the predictors with a Bernoulli distribution`ziY1`

is a zero-inflated count outcome simulated from`lmM`

and the predictors`ziY2`

is a another zero-inflated count outcome, but instead uses`binM`

and the predictors

`mzip()`

function for analysis with zero-inflated count
variablesMZIP can be performed with the `mzip()`

function. For
example, assume we would like to observe how our `X`

is
associated with `ziY1`

from `mzipmed_data`

. First
we need to load the `mzipmed`

package

`library(mzipmed)`

Next we can use the `mzip()`

function to conduct this
analysis. First we need to specify the `y`

argument which in
our case `y=mzipmed_data$ziY1`

. For the covariates we specify
the `pred`

argument. For a single predictor we could use
`pred=mzipmed_data$X`

. Suppose we adjust for `C1`

and `C2`

then we bind all the predictors into a vector using
the `cbind()`

function in base R,
e.g. `pred=cbind(mzipmed_data$X,mzipmed_data$C1,mzipmed_data$C2)`

.
If we want our covariates named in the outcome we need to specify within
the `cbind()`

function,
e.g. `pred=cbind(X=mzipmed_data$X,C1=mzipmed_data$C1,C2=mzipmed_data$C2)`

otherwise they will be named X1,X2,… in order they are included. The
final argument is `print=FALSE`

, if `print=TRUE`

parameter estimates, standard errors, p-values, and risk ratios will be
printed into the R console. The default is TRUE. In additional an offset
variable can be specified using the `offset`

argument,
e.g. `offset=variable`

. No offset is included in this
dataset. Using this example

```
=mzip(y=mzipmed_data$ziY1, pred=cbind(X=mzipmed_data$X,C1=mzipmed_data$C1,C2=mzipmed_data$C2), offset=NULL, print=TRUE)
test#> [1] "Overall Mean Estimates from MZIP"
#> Variable Alpha_Estimate SE P_Value Robust_SE Robust_P_Value
#> 1 intercept -0.520 0.200 0.0092 0.237 0.0282
#> 2 X 0.249 0.150 0.0974 0.209 0.2340
#> 3 C1 0.064 0.069 0.3605 0.073 0.3845
#> 4 C2 0.886 0.322 0.0059 0.366 0.0153
#> [1] "Excess Zero Estimates from MZIP"
#> Variable Gamma_Estimate SE P_Value Robust_SE Robust_P_Value
#> 1 intercept 0.515 0.259 0.0469 0.264 0.0508
#> 2 X 0.288 0.199 0.1495 0.205 0.1614
#> 3 C1 0.040 0.085 0.6436 0.081 0.6273
#> 4 C2 -0.095 0.410 0.8172 0.399 0.8124
#> [1] "Overall Mean Incidence Rate Ratio Estimates"
#> Variable incrate_ratio Lower_CI Upper_CI RobLower_CI RobUpper_CI
#> 1 X 1.283 0.956 1.722 0.851 1.934
#> 2 C1 1.066 0.930 1.221 0.923 1.230
#> 3 C2 2.426 1.291 4.558 1.185 4.967
```

We find no evidence that `X`

is significantly associated
with the outcome and we obtain a relative risk estimate of 1.28. The
robust standard errors are often used if over-dispersion is a concern as
they typically provide wider confidence intervals. In the
`test`

output one can also find covariance matrices,
parameter estimates, confidence intervals, etc.

Assume we want to examine how a continuous mediator explains the
relationship between an exposure and a zero-inflated count mediator. We
would fit the following models \[logit(\psi_i|x,m,c)=\gamma_0+\gamma_1x+\gamma_2m+\mathbf{\gamma_4'c}\]
\[log(\nu_i|x,m,c)=\alpha_0+\alpha_1x+\alpha_2m+\mathbf{\alpha_4'c}\]
\[E(M|x,c)=\theta_0+\theta_1x+\mathbf{\theta_4'c}\]
The overall mean outcome model is on a incidence rate ratio (IRR) scale
so NDE and NIE estimates will be as well and can be computed by \[IRR^{NDE}=e^{\alpha_1(x-x^*)}\] \[IRR^{NIE}=e^{\alpha_2\theta_1(x-x^*)}\] To
conduct this mediation analysis in the `mzipmed`

package we
would use the `zioutlmmed()`

function. This functions only
allows for a single outcome, mediator, and exposure at a time. Using the
`mzipmed`

data let \(X=X\)
and \(M=lmM\) and our outcome is
`ziY1`

. We specify these variables with the following
arguments
`outcome=mzipmed_data$ziY1`

,`mediator=mzipmed_data$lmM`

,
`exposure=mzipmed_data$X`

, and for the covariates/confounders
`confounder=cbind(mzipmed_data$C1,mzipmed_data$C2`

). Standard
errors can be computed using bootstrapping which is robust, but
computationally intensive or via delta method for instantaneous variance
estimation. For delta method use `error="Delta"`

which is the
default and for bootstrap use `error="Boot"`

. Robust standard
errors can be requested within the delta method to be used for concerns
of over-dispersion by specifying `robust=TRUE`

; FALSE is the
default. We also need to input values of the exposure to compare to
using the `X`

and `Xstar`

arguments, for a
dummy-coded binary exposure and by default we let `X=1`

and
`Xstar=0`

. These can be changed as needed. For bootstrap
standard error you can specify the number of iterations, by default
there are 1000 repetitions, `n=1000`

. An offset variable for
the ZI count outcome can be specified with the `zioff`

argument using the same syntax as the
`outcome`

,`exposure`

arguments. As an example

```
=zioutlmmed(outcome=mzipmed_data$ziY1,mediator=mzipmed_data$lmM,exposure=mzipmed_data$X, confounder=cbind(mzipmed_data$C1,mzipmed_data$C2), error="Delta", robust=FALSE,X=1,Xstar=0,n=1000, zioff=NULL)
ziout#> Estimate log SE Lower CI Upper CI
#> Natural Direct Effect (IRR) 1.021 0.147 0.766 1.361
#> Natural Indirect Effect (IRR) 1.163 0.068 1.018 1.328
#> Total Effect (IRR) 1.187 0.162 0.865 1.630
#> Proportion Mediated 0.889
```

Printed into the R console are the incidence rate ratios of the NDE,
NIE, and TE along with their standard errors and confidence intervals
and the proportion mediated. From the ziout data we see no evidence of a
significant direct effect of the exposure on the outcome as the
incidence rate ratio includes 1 in the confidence interval, but do see
that a significant effect operating through the mediator since 1 is not
in the confidence interval. Additional information in the ziout in the R
global environment include all information included in the outcome model
using `mzip()`

and the mediator model using `lm()`

from the `stats`

package in base R.

The `mzip`

package also includes a function for mediation
with zero-inflated count outcomes and binary mediators. For this
procedure a MZIP model is fit for the outcome and a logistic regression
is fit for the mediator model using the `glm()`

function in
base R. The aforementioned function is called
`zioutbinmed()`

. The NDE is the same as with continuous
mediators, but the NIE is now conditional on covariates

\[IRR^{NIE}=\frac{([1+e^{\theta_0+\theta_1x^*+\mathbf{\theta_2'c}}][1+e^{\alpha_2
\theta_0+\theta_1x+\mathbf{\theta_2'c}}])}{([1+e^{\theta_0+\theta_1x+\mathbf{\theta_2'c}}][1+e^{\alpha_2\theta_0+\theta_1x^*+\mathbf{\theta_2'c}}])}\]
Therefore fixed values of the covariates are required. The input for the
`zioutbinmed()`

function is the same as in the previous
section, but also includes an argument to specify these covariates
called `C`

. By default, `C=NULL`

each covariate
will be fixed at its mean value to approximate marginal estimates, but
can be changed using for example `C=c(0,2)`

. The remainder of
the input and output for this function is the same as with the a
continuous mediator and we now use
`outcome=mzipmed_data$ziY2`

for the outcome variable.

```
=zioutbinmed(outcome=mzipmed_data$ziY2,mediator=mzipmed_data$binM, exposure=mzipmed_data$X, confounder=cbind(mzipmed_data$C1,mzipmed_data$C2), error="Delta", robust=FALSE,X=1, Xstar=0, n=1000, C=NULL, zioff=NULL)
zioutb#> Estimate log SE Lower CI Upper CI
#> Natural Direct Effect (IRR) 1.267 0.138 0.966 1.661
#> Natural Indirect Effect (IRR) 1.008 0.011 0.987 1.029
#> Total Effect (IRR) 1.277 0.138 0.974 1.674
#> Proportion Mediated 0.037
```

Assume we have a continuous outcome, \(Y\), and a zero-inflated count mediator
\(M\). We fit a linear regression
outcome model and a MZIP mediator model as follows: \[E(Y|x,m,c)=\beta_0+\beta_1x+\beta_2m+\mathbf{\beta_4'c}\]
\[logit(\psi_i(M)|x,c)=\gamma_0+\gamma_1x+\mathbf{\gamma_4'c}\]
\[log(\nu_i(M)|x,c)=\alpha_0+\alpha_1x+\mathbf{\alpha_4'c}\]
Because the outcome is on a difference scale, so will the NDE and NIE
estimates which can be derived by \[\beta_1(x-x^*)\] \[\beta_2[e^{\alpha_0+\alpha_1x+\mathbf{\alpha_4'c}}-e^{\alpha_0+\alpha_1x^*+\mathbf{\alpha_4'c}}]\]
For this mediation analysis the `lmoutzimed()`

function could
be used. For example with the `mzipmed_data`

we assume the
same set of predictors as in the section with ZI outcomes, but now let
`mediator=mzipmed_data$ziM`

and
`outcome=mzipmed_data=lmY`

. Input required is the same as in
the section for ZI outcomes with the `C`

argument shown in
the ZI outcomes with binary mediators example and this function uses the
`lm()`

function for the outcome variable and the
`mzip()`

function for the mediator model with all effects
being on a difference scale.Covariates and confounders can be specified
in the `confounder`

argument.Standard errors can be computed
using delta method, `error="Delta"`

or via bootstrap
`error="Boot"`

. For delta method robust standard errors can
be used with the `robust=TRUE`

argument. For bootstrap the
number of iterations can be specified with the `n`

argument
with 1000 being the default. An offset for the zero-inflated count
mediator model can be specified with the `zioff`

argument.
For example,

```
=lmoutzimed(outcome=mzipmed_data$lmY,mediator=mzipmed_data$ziM, exposure=mzipmed_data$X, confounder=cbind(mzipmed_data$C1,mzipmed_data$C2),error="Delta", robust=FALSE,X=1,Xstar=0,C=NULL, zioff=NULL)
zimed#> Estimate SE Lower CI Upper CI
#> Natural Direct Effect 1.388 0.366 0.670 2.106
#> Natural Indirect Effect 0.072 0.049 -0.025 0.169
#> Total Effect 1.460 0.367 0.741 2.179
#> Proportion Mediated 0.049
```

After running the function we can see in the R console that the NDE
is statistically significant as the mean difference does not include 0
in the confidence interval, but the NIE is not significant because it
includes 0. Additional information on the individual outcome and
mediator models can be found in the `zimed`

data now in the
global environment.

Note: when using this functions extension with an interaction an
additional argument is required if an offset is specified. This argument
called `OFF`

requires a fixed value of the offset variable to
derive effects. If none are specified then the mean of the offset is
used. If no offset is specified then this argument is no needed.

For binary outcomes we do not use a logistic regression outcome model
because of the non-collapsability of logit-links makes deriving closed
form expression of NDE and NIE impossible so in this package we use a
Poisson outcome model with a log-link with robust covariance structure
using the `vcovHC()`

function in the `sandwich`

package. Assuming the same input as in the section for continuous
outcomes, but now `outcome=mzipmed_data$binY`

and NDE and NIE
estimates will now be on a ratio scale because of the Poisson outcome
model. The outcome model will be

\[log[P(Y_i=1|x,m,c)]=\tau_0+\tau_1x+\tau_2m+\mathbf{\tau_4'c}\]

The same MZIP mediator model can be used as shown in the previous section. Risk ratios of NDE and NIE can be derived as displayed below.

\[RR^{NDE}=e^{\tau_1(x-x^*)}\] \[RR^{NIE}=\frac{(1+e^{\gamma_0+\gamma_1x^*+\mathbf{\gamma_4'c}})(e^{\gamma_0+\gamma_1x+\mathbf{\gamma_4'c}}+e^{(e^{\alpha_0+\alpha_1x+\mathbf{\alpha_4'c}+log(1+e^{\gamma_0+\gamma_1x+\mathbf{\gamma_4'c}})})(e^{\tau_2}-1)}}{(1+e^{\gamma_0+\gamma_1x+\mathbf{\gamma_4'c}})(e^{\gamma_0+\gamma_1x^*+\mathbf{\gamma_4'c}}+e^{(e^{\alpha_0+\alpha_1x^*+\mathbf{\alpha_4'c}+log(1+e^{\gamma_0+\gamma_1x^*+\mathbf{\gamma_4'c}})})(e^{\tau_2}-1)}}\]

For this scenario we will use the `binoutzimed()`

function. For example,

```
=binoutzimed(outcome=mzipmed_data$binY,mediator=mzipmed_data$ziM, exposure=mzipmed_data$X, confounder=cbind(mzipmed_data$C1,mzipmed_data$C2),error="Delta", robust=FALSE,X=1,Xstar=0,C=NULL, zioff=NULL, OFF=NULL, rare=FALSE)
zimedb#> Estimate log SE Lower CI Upper CI
#> Natural Direct Effect (RR) 1.325 0.104 1.081 1.624
#> Natural Indirect Effect (RR) 1.011 0.010 0.991 1.032
#> Total Effect (RR) 1.340 0.103 1.094 1.641
#> Proportion Mediated 0.044
```

If the outcome is a rare binary variable then odds ratios will
approximate risk ratios which may provide better fit for the data. To
accommodate this the functions for binary outcomes also include an
additional argument, `rare`

. By default
`rare=FALSE`

the robust Poisson model is used, but if
`rare=TRUE`

a logistic regression is fit for the outcome
model. In addition, if an offset is specified then a fixed value of the
offset variable is required in deriving mediation effects. This can be
done using the `OFF`

argument and should be a numeric value.
If this argument is left `NULL`

then the mean of the offset
variable is used. If no offset is used then this argument is not needed.
The rest of the input and output are all the same as in the section with
continuous outcomes, except effects being on a ratio scale. For the
zimedb example we find a significant NDE as 1 is not included in the
risk ratio confidence interval, but fail to observe a significant
NIE.

The `mzipmed`

package extends all the previous functions
to cases where you are considering if there is an exposure-mediator
interaction. In addition to the NDE and NIE additional quantities can be
computed including the controlled direct effect (CDE), pure indirect
effect (PIE), interactive reference effect (INTref), and interaction
mediation effect (INTmed) which are the exposure-outcome relationship
due to neither mediation nor interaction, only mediation, only
interaction, and the tandem effect of mediation and interaction
respectively. The interaction effects are for additive interactions,
multiplicative interactions can be assessed directly in the outcome
model. Also computed is the proportion eliminated (PE) which answers the
question ‘if we were to fix the mediator to some value how much of the
exposure differences in the outcome could be eliminated?’.

The functions to do this are as follows: for a continuous outcome and
zero-inflated mediator use `lmoutzimedint`

, for a binary
outcome and ZI mediator use `binoutzimedint`

, for a ZI
outcome and continuous mediator use `zioutlmmedint`

, and for
a ZI outcome with a binary mediator use `zioutbinmedint`

. We
will skip model expression and formulas for effects, but they are
available upon request. Instead we will show an example using
`zioutlmmedint`

. These functions have the same input as their
non-interaction counterparts, but also require an additional argument in
`M`

which is a fixed value of the mediator used to compute
CDE, INTref, INTmed, and PE. By default, `M=NULL`

the mean
value of the mediator is used, but can be altered e.g. `M=0`

.
Running the function will print the NDE, NIE, and TE in R console like
before as well as the other 4 effects, the overall additive interaction
effect, and PM, PE, and the proportion attributable to interaction.

```
=zioutlmmedint(outcome=mzipmed_data$ziY1,mediator=mzipmed_data$lmM, exposure=mzipmed_data$X, confounder=cbind(mzipmed_data$C1,mzipmed_data$C2),error="Delta", robust=FALSE,X=1,Xstar=0,M=NULL,C=NULL)
zimout#> Estimate log SE Lower CI Upper CI
#> Natural Direct Effect (IRR) 1.105 0.241 0.689 1.772
#> Natural Indirect Effect (IRR) 1.186 0.079 1.017 1.384
#> Total Effect (IRR) 1.311 0.266 0.778 2.208
#>
#> 4-way Decomposition
#> Controlled Direct Effect (IRR) 0.990 0.148 0.741 1.323
#> Pure Indirect Effect (IRR) 1.145 0.063 1.011 1.296
#> Interactive Reference Effect 0.132 0.053 0.028 0.235
#> Interactive Mediation Effect 0.071 0.073 -0.072 0.213
#> Total Additive Interaction 0.202 0.107 -0.007 0.411
#>
#> Proportions
#> Proportion Mediated 0.662
#> Proportion Eliminated 1.032
#> Proportion from Interaction 0.562
```

From this example we fail to see a significant NDE, but do have a significant mediation effect based on incidence rate ratio interpretations. The interaction effects are on a difference scale regardless of the scale of the outcome variable so because 0 is included in the total additive interaction confidence interval we can conclude that we do not have evidence of a significant additive interaction effect between the exposure and mediator on the outcome. Investigating further into the zimout data in the global environment proportions attributable to each of the decomposed effects are provided. Because the interaction was not significant, it would be recommended to use the function that does not include the interaction term. It is always useful to check for an interaction effect initially.

This package provides functions for conducting mediation analysis for zero-inflated count variables using MZIP. This document serves as a brief overview of the package. For more information feel free to contact the package maintainer directly.

Long DL, Preisser JS, Herring AH, Golin CE. A marginalized zero-inflated Poisson regression model with overall exposure effects. Stat Med. 2014;33(29):5151-5165. doi:10.1002/sim.6293

VanderWeele T. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press; 2016.