CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R
Maintainer: | Ben Bolker, Julia Piaskowski, Emi Tanaka, Phillip Alday, Wolfgang Viechtbauer |
Contact: | bolker at mcmaster.ca |
Version: | 2024-05-08 |
URL: | https://CRAN.R-project.org/view=MixedModels |
Source: | https://github.com/cran-task-views/MixedModels/ |
Contributions: | Suggestions and improvements for this task view are very welcome and can be made through issues or pull requests on GitHub or via e-mail to the maintainer address. For further details see the Contributing guide. |
Citation: | Ben Bolker, Julia Piaskowski, Emi Tanaka, Phillip Alday, Wolfgang Viechtbauer (2024). CRAN Task View: Mixed, Multilevel, and Hierarchical Models in R. Version 2024-05-08. URL https://CRAN.R-project.org/view=MixedModels. |
Installation: | The packages from this task view can be installed automatically using the ctv package. For example, ctv::install.views("MixedModels", coreOnly = TRUE) installs all the core packages or ctv::update.views("MixedModels") installs all packages that are not yet installed and up-to-date. See the CRAN Task View Initiative for more details. |
Contributors: Maintainers plus Michael Agronah, Matthew Fidler, Thierry Onkelinx
Mixed (or mixed-effect) models are a broad class of statistical models used to analyze data where observations can be assigned a priori to discrete groups, and where the parameters describing the differences between groups are treated as random (or latent) variables. They are one category of multilevel, or hierarchical models; longitudinal data are often analyzed in this framework. In econometrics, longitudinal or cross-sectional time series data are often referred to as panel data and are sometimes fitted with mixed models. Mixed models can be fitted in either frequentist or Bayesian frameworks.
This task view only includes models that incorporate continuous (usually although not always Gaussian) latent variables. This excludes packages that handle hidden Markov models, latent Markov models, and finite (discrete) mixture models (some of these are covered by the Cluster task view). Dynamic linear models and other state-space models that do not incorporate a discrete grouping variable are also excluded (some of these are covered by the TimeSeries task view). Bioinformatic applications of mixed models hosted on Bioconductor are mostly excluded as well.
Basic model fitting
Linear mixed models
Linear mixed models (LMMs) make the following assumptions:
- The expected values of the responses are linear combinations of the fixed predictor variables and the random effects.
- The conditional distribution of the responses is Gaussian (equivalently, the errors are Gaussian).
- The random effects are normally distributed.
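To make the three assumptions concrete, the following is a minimal base-R sketch that simulates data from a random-intercept LMM. The group counts, effect sizes, and variable names are illustrative assumptions and do not come from any package in this task view.

```r
set.seed(101)

n_groups  <- 20          # number of groups (hypothetical)
n_per_grp <- 10          # observations per group (hypothetical)
beta      <- c(2, 0.5)   # fixed effects: intercept and slope
sd_group  <- 1.0         # SD of the Gaussian random intercepts
sd_resid  <- 0.5         # SD of the Gaussian residual errors

group <- factor(rep(seq_len(n_groups), each = n_per_grp))
x     <- runif(n_groups * n_per_grp)

# Assumption 3: the random effects are normally distributed
b <- rnorm(n_groups, mean = 0, sd = sd_group)

# Assumption 1: the expected response is linear in the fixed predictors
# and the random effects
mu <- beta[1] + beta[2] * x + b[as.integer(group)]

# Assumption 2: the responses are conditionally Gaussian around that mean
y <- rnorm(length(mu), mean = mu, sd = sd_resid)

sim_data <- data.frame(y = y, x = x, group = group)
str(sim_data)
```

A data frame of this shape (response, predictor, grouping factor) is what the fitting functions listed below typically expect.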
Frequentist:
The most commonly used packages and/or functions for frequentist LMMs are:
- nlme: nlme::lme() provides REML or ML estimation. Allows multiple nested random effects, and provides structures for modeling heteroscedastic and/or correlated errors. Wald estimates of parameter uncertainty.
- lme4: lme4::lmer() provides REML or ML estimation. Allows multiple nested or crossed random effects, can compute profile confidence intervals and conduct parametric bootstrapping.
- mbest: fits large nested LMMs using a fast moment-based approach.
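As a rough illustration of the two most common interfaces, the sketch below fits a random-intercept model to the Orthodont data set shipped with nlme. The model itself is an arbitrary choice made for illustration; the lme4 part runs only if that package is installed.

```r
library(nlme)

# Orthodont ships with nlme: repeated dental measurements on 27 children
fit_reml <- lme(distance ~ age, random = ~ 1 | Subject,
                data = Orthodont, method = "REML")
fit_ml   <- update(fit_reml, method = "ML")

fixef(fit_reml)                        # fixed-effect estimates
intervals(fit_reml, which = "fixed")   # approximate (Wald-type) intervals

# Optional: the corresponding lme4 fit, run only if lme4 is installed
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_lmer <- lme4::lmer(distance ~ age + (1 | Subject),
                         data = Orthodont, REML = TRUE)
  print(lme4::fixef(fit_lmer))
}
```

The formula syntax differs between the two packages: nlme takes the random-effects structure in a separate `random` argument, whereas lme4 embeds it in the model formula as `(1 | Subject)`.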
Bayesian:
Most Bayesian R packages use Markov chain Monte Carlo (MCMC) estimation: MCMCglmm, rstanarm, and brms; the latter two packages use the Stan infrastructure. blme, built on lme4, uses maximum a posteriori (MAP) estimation. bamlss provides a flexible set of modular functions for Bayesian regression modeling.
Generalized linear mixed models
Generalized linear mixed models (GLMMs) can be described as hierarchical extensions of generalized linear models (GLMs), or as extensions of LMMs to different response distributions, typically in the exponential family. The random-effect distributions are typically assumed to be Gaussian on the scale of the linear predictor.
Frequentist:
- MASS: MASS::glmmPQL() fits via penalized quasi-likelihood.
- lme4: lme4::glmer() uses Laplace approximation and adaptive Gauss-Hermite quadrature; fits negative binomial as well as exponential-family models.
- glmmTMB uses Laplace approximation; allows some correlation structures; fits some non-exponential families (Beta, COM-Poisson, etc.) and zero-inflated/hurdle models.
- GLMMadaptive uses adaptive Gauss-Hermite quadrature; fits exponential family, negative binomial, beta, zero-inflated/hurdle/censored Gaussian models, user-specified log-densities.
- hglm fits hierarchical GLMs using h-likelihood (sensu Nelder, Lee and Pawitan (2017)).
- glmm fits GLMMs using Monte Carlo likelihood approximation.
- glmmEP fits probit mixed models for binary data by expectation propagation.
- mbest fits large nested GLMMs using a fast moment-based approach.
- galamm fits a wide variety of models (heteroscedastic, mixed response types, factor loadings, etc.)
- glmmrBase uses MCMC and Laplace approximations to fit models with Gaussian, binomial, Poisson, Beta, or Gamma responses and flexible correlation structures.
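The sketch below shows what a frequentist GLMM fit can look like for a binary response, using the bacteria data set shipped with MASS. The model specification is only an illustration; the lme4 comparison runs only if that package is installed.

```r
library(MASS)   # provides glmmPQL() and the bacteria data set

# Binary response y: presence of bacteria in children (ID) measured repeatedly
fit_pql <- glmmPQL(y ~ trt + I(week > 2), random = ~ 1 | ID,
                   family = binomial, data = bacteria, verbose = FALSE)
summary(fit_pql)$tTable   # fixed effects on the logit scale

# Optional: Laplace-approximation fit with lme4, run only if it is installed
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_lap <- lme4::glmer(y ~ trt + I(week > 2) + (1 | ID),
                         family = binomial, data = bacteria)
  print(lme4::fixef(fit_lap))
}
```

Estimates from penalized quasi-likelihood and from the Laplace approximation will generally differ somewhat, since the two methods approximate the marginal likelihood in different ways.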
Bayesian:
Most Bayesian mixed model packages use some form of Markov chain Monte Carlo (or other Monte Carlo methods).
- MCMCglmm: Gibbs sampling. Exponential family, multinomial, ordinal, zero-inflated/altered/hurdle, censored, multimembership, multi-response models. Pedigree (animal/kinship/phylogenetic) models.
- rstanarm: Hamiltonian Monte Carlo (based on Stan); designed for lme4 compatibility.
- brms: Hamiltonian Monte Carlo. Linear, robust linear, count data, survival, response times, ordinal, zero-inflated/hurdle/censored data.
- bamlss: optimization and derivative-based Metropolis-Hastings/slice sampling. Wide range of distributions and link functions.
The following packages (in addition to bamlss) find maximum a posteriori fits to Bayesian (G)LMMs by optimization:
- blme wraps lme4 to add prior distributions.
- INLA uses integrated nested Laplace approximation to fit GLMMs using a wide range of latent models (especially for spatial estimation), priors, and distributions.
- inlabru facilitates spatial modeling using integrated nested Laplace approximation via the R-INLA package. Additionally, extends the GAM-like model class to more general nonlinear predictor expressions and implements a log-Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data.
- inlatools provides tools to set sensible priors and check the dispersion and distribution of INLA models.
vglmer estimates GLMMs by variational Bayesian methods.
Nonlinear mixed models
Nonlinear mixed models incorporate arbitrary nonlinear responses that cannot be accommodated in the framework of GLMMs. Only a few packages can accommodate generalized nonlinear mixed models (i.e., parametric nonlinear mixed models with non-Gaussian responses). However, many packages allow smooth nonparametric components (see “Additive models” below). Otherwise, users may need to implement GNLMMs themselves in a more general hierarchical modeling framework.
Frequentist:
- nlme::nlme() from nlme and lme4::nlmer() from lme4 fit nonlinear mixed effects models by maximum likelihood.
- nlmixr2::nlmixr2() from nlmixr2 fits nonlinear mixed effects models by first-order conditional estimation (focei) maximum likelihood approximation (a different approximation than nlme::nlme() and lme4::nlmer()), and allows generalized likelihood as well as a selection of built-in link functions.
- gnlmm() and gnlmm3() from repeated fit GNLMMs by Gauss-Hermite integration.
- saemix and nlmixr2 both use a stochastic approximation of the EM algorithm to fit a wide range of GNLMMs.
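For orientation, here is a small nonlinear mixed model fitted with nlme::nlme(), closely following the example on that function's help page. It uses the Loblolly data set from base R and the self-starting asymptotic regression model SSasymp(); the starting values are taken as given and are not tuned.

```r
library(nlme)

# Loblolly (datasets package): height of pine trees over time; the grouping
# factor Seed is taken from the groupedData formula height ~ age | Seed
fit_nlme <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
                 data   = Loblolly,
                 fixed  = Asym + R0 + lrc ~ 1,   # population-level parameters
                 random = Asym ~ 1,              # only the asymptote varies by Seed
                 start  = c(Asym = 103, R0 = -8.5, lrc = -3.3))

fixef(fit_nlme)     # population-level estimates of the three parameters
VarCorr(fit_nlme)   # between-seed variation in the asymptote
```

Unlike the linear case, nonlinear fits need starting values for the fixed effects, and convergence can be sensitive to them.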
Bayesian:
Generalized estimating equations
Generalized estimating equations (GEEs) are an alternative approach to fitting clustered, longitudinal, or otherwise correlated data. These models produce estimates of the marginal effects (averaged across the group-level variation) rather than conditional effects (conditioned on group-level information).
- geepack, gee, and geeM are standard GEE solvers, providing GEE estimation of the parameters in mean structures with possible correlation between the outcomes.
- geesmv: GEE estimator using the original sandwich variance estimator proposed by Liang and Zeger (1986), and eight types of variance estimators for improving small-sample performance.
- multgee is a GEE solver for correlated nominal or ordinal multinomial responses.
- glmtoolbox handles a wide variety of model types (GLMs, beta-binomial and negative binomial, zero-inflation and zero-alteration, mixed models) via GEEs
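The distinction between marginal and conditional effects can be seen in a small simulation. The sketch below is an illustration under assumed parameter values: it generates clustered binary data with a known conditional slope and then estimates the marginal slope. Because a GEE with an independence working correlation has the same point estimates as an ordinary GLM, glm() stands in for a GEE solver here; its standard errors ignore the clustering and should not be used. The geepack fit runs only if that package is installed.

```r
set.seed(42)

n_groups  <- 500
n_per_grp <- 10
beta_cond <- 1   # conditional (group-specific) log-odds slope
sd_group  <- 2   # SD of the random intercepts on the logit scale

group <- rep(seq_len(n_groups), each = n_per_grp)
x     <- rnorm(n_groups * n_per_grp)
b     <- rnorm(n_groups, sd = sd_group)
y     <- rbinom(length(x), size = 1, prob = plogis(beta_cond * x + b[group]))
dat   <- data.frame(y = y, x = x, group = group)

# Marginal slope: averaged over the distribution of group-level intercepts
fit_marg  <- glm(y ~ x, family = binomial, data = dat)
beta_marg <- unname(coef(fit_marg)["x"])
c(conditional = beta_cond, marginal = beta_marg)

# Optional: a GEE fit with robust (sandwich) standard errors
if (requireNamespace("geepack", quietly = TRUE)) {
  fit_gee <- geepack::geeglm(y ~ x, id = group, data = dat,
                             family = binomial, corstr = "exchangeable")
  print(coef(fit_gee))
}
```

The marginal slope is attenuated toward zero relative to the conditional slope, which is the expected behavior for a logit link with substantial between-group variation.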
Specialized models/tasks
- Additive models (models incorporating smooth functional components such as regression splines or Gaussian processes; also known as semiparametric models): gamm4, mgcv, brms, lmeSplines, bamlss, gamlss, LMMsolver, R2BayesX, GLMMRR, glmmTMB, galamm.
- Big data/distributed computation: lmmpar, mbest. See also MixedModels.jl (Julia), diamond (Python).
- Bioinformatics/quantitative genetics: MCMC.qpcr, QGglmm, CpGassoc (methylation studies).
- Censored data (response data known only up to lower/upper bounds): brms and nlmixr2 (general), ARpLMEC (censored Gaussian, autoregressive errors). Censored Gaussian (Tobit) responses: GLMMadaptive, MCMCglmm, gamlss.
- Denominator degree-of-freedom computation: Satterthwaite and/or Kenward-Roger corrections are computed by lmerTest, pbkrtest, glmmrBase
- Differential equations (fitting DEs with group-structured parameters; this category overlaps considerably with pharmacokinetic modeling): mixedsde for stochastic DEs. Ordinary DEs can be run with nlmixr2 using the “focei” or “saem” (EM) methods, or using the nlme package; see also the DifferentialEquations and Pharmacokinetics task views.
- Doubly hierarchical GLMs: dhglm, mdhglm (multivariate)
- Factor analytic, latent variable, and structural equation models: lavaan, nlmm, sem, piecewiseSEM, semtree, and blavaan; see also the Psychometrics task view.
- Flexible correlation structures: brms, glmmTMB, sommer, glmmrBase, regress
- Kinship-augmented models (responses where individuals have a known family relationship): pedigreemm, coxme, kinship2, LMMsolver, MCMCglmm, sommer, rrBLUP, BGLR, lme4GS, lme4qtl, cpgen, QTLRel.
- Location-scale models: nlme, glmmTMB, brms, mgcv [with family chosen from one of the *ls/*lss options] all allow modeling of the dispersion/scale component.
- Missing values: mice, CRTgeeDR, JointAI, mdmb, pan; see also the MissingData task view.
- Multiple membership models: (Bayesian) MCMCglmm, brms, rmm; (frequentist) lmerMultiMember (can also fit the Bradley-Terry model)
- Multinomial responses: bamlss, R2BayesX, MCMCglmm, mgcv, mclogit.
- Multivariate responses/multi-trait analysis: (multiple dependent variables; the response variables may or may not be constrained to be from the same family) MCMCglmm, MegaLMM, brms, sommer, INLA. Many mixed-effect packages allow fitting of (homogeneous) multivariate responses by “melting” the data (converting to long format) and treating each observation in the original data as a cluster.
- Non-Gaussian random effects: brms, repeated, spaMM.
- Ordinal-valued responses (responses measured on an ordinal scale): ordinal, GLMMadaptive, multgee (frequentist); MCMCglmm, brms (Bayesian), cplm (both)
- Over-dispersed models: aod, aods3.
- Panel data: in econometrics, panel data typically refers to subjects (individuals or firms) that are sampled repeatedly over time. The theoretical and computational approaches used by econometricians overlap with mixed models (e.g., see here). The plm package can fit mixed-effects panel models; see also the Econometrics task view.
- Quantile regression: lqmm, qrLMM, qrNLMM.
- Phylogenetic models: pez, phyr, MCMCglmm, brms.
- Repeated measures: (packages with specialized covariance structures for handling repeated measures) nlme, mmrm, glmmTMB, LMMsolver, repeated.
- Regularized/penalized models (regularization or variable selection by ridge, lasso, or elastic net penalties): splmm fits LMMs for high-dimensional data by imposing penalty on both the fixed effects and random effects for variable selection. glmmLasso fits GLMMs with L1-penalized (LASSO) fixed effects. bamlss implements LASSO-like penalization for generalized additive models.
- Robust/heavy-tailed estimation (downweighting the importance of extreme observations): robustlmm, robustBLME (Bayesian robust LME), CRTgeeDR for the doubly robust inverse probability weighted augmented GEE estimator. Some packages (brms, bamlss, GLMMadaptive, glmmTMB, mgcv with family = "scat", nlmixr2) allow heavy-tailed response distributions such as Student-t.
- Skewed data/response transformation: skewlmm fits a scale mixture of skew-normal linear mixed models using expectation-maximization (EM). nlmixr2 can fit skewed data by dynamically transforming both sides with either the boxCox() or the yeoJohnson() transformation, using maximum likelihood or the EM method “saem”. bcmixed fits Box-Cox-transformed LMMs and provides inferences for differences between treatment levels. boxcoxmix fits Box-Cox transformed LMMs and logistic mixed models.
- Spatial models: nlme (with corStruct functions), CARBayesST, sphet, spind, spaMM, glmmfields, glmmTMB, inlabru (spatial point processes via log-Gaussian Cox processes), brms, LMMsolver, bamlss; see also the Spatial and SpatioTemporal CRAN task views.
- Sports analytics: mvglmmRank, multivariate generalized linear mixed models for ranking sports teams.
- Survival analysis: coxme.
- Tree-based models: glmertree, semtree, gpboost
- Weighted models: WeMix (linear and logit models with weights at multiple levels)
- Zero-inflated models: (frequentist) glmmTMB, cplm, mgcv (zi Poisson only), GLMMadaptive; (Bayesian): MCMCglmm, brms, bamlss.
- Zero-one inflated Beta regression: brms, zoib, glmmTMB (zero-inflated only). Ordered beta regression is an alternative framework to address the same type of data: ordbetareg, glmmTMB
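As one illustration of the specialized model types above, the following sketch fits an additive mixed model with mgcv, combining a smooth term with a random intercept. The data are simulated, and all parameter values and variable names are assumptions made for the example.

```r
library(mgcv)
set.seed(1)

n_groups <- 15
group <- factor(rep(seq_len(n_groups), each = 20))
x     <- runif(length(group))
b     <- rnorm(n_groups, sd = 0.5)   # true random intercepts
y     <- sin(2 * pi * x) + b[as.integer(group)] + rnorm(length(x), sd = 0.3)
dat   <- data.frame(y = y, x = x, group = group)

# s(x) is a penalized regression spline; s(group, bs = "re") represents a
# Gaussian random intercept as a ridge-penalized smooth
fit_gam <- gam(y ~ s(x) + s(group, bs = "re"), data = dat, method = "REML")

summary(fit_gam)
gam.vcomp(fit_gam)   # variance components, reported as standard deviations
```

The same random-effect-as-smooth idea extends to random slopes via `s(x, group, bs = "re")`, although packages such as gamm4 or brms may be more convenient for complex random-effects structures.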
Hierarchical modeling frameworks
These packages do not directly provide functions to fit mixed models, but instead implement interfaces to general-purpose sampling and optimization toolboxes that can be used to fit mixed models. While models require extra effort to set up, and often require programming in a domain-specific language other than R, these frameworks are more flexible than most of the other packages listed here.
Model diagnostics and summary statistics
Model diagnostics
Summary statistics
- Correlations: iccbeta (intraclass correlation), rptR (repeatabilities)
- R2 calculations: r2glmm (R2 and partial R2), MuMIn (r.squaredGLMM() function), partR2, performance (r2() function), rr2, mlmtools, mlmhelpr. (Note that there are many different methods for computing R2 values for (G)LMMs: see e.g. Nakagawa, Johnson and Schielzeth (2017), Jaeger et al. (2017).) Many of these packages also compute intra-class correlations.
- Information criteria: cAIC4 (conditional AIC), blmeco (WAIC).
- Robust variance-covariance estimates: clubSandwich, merDeriv, mlmhelpr, glmmrBase
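For a simple random-intercept model, an intraclass correlation can also be computed by hand from the estimated variance components. The sketch below does this for an nlme fit; it is a minimal illustration of the calculation, not a substitute for the packages above, which handle more general model structures.

```r
library(nlme)

fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# For a single-level lme fit, VarCorr() returns a character matrix with rows
# "(Intercept)" and "Residual" and columns "Variance" and "StdDev"
vc        <- VarCorr(fit)
var_group <- as.numeric(vc["(Intercept)", "Variance"])
var_resid <- as.numeric(vc["Residual", "Variance"])

# Share of the (covariate-adjusted) variance attributable to between-subject
# differences
icc <- var_group / (var_group + var_resid)
icc
```

Because the model includes age as a fixed effect, this is an adjusted ICC: the variance explained by age is excluded from the denominator.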
Derivatives
The first and second derivatives of log-likelihood with respect to parameters can be useful for various model evaluation tasks (e.g., computing sensitivities, robust variance-covariance matrices, or delta-method variances).
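As an example of one such task, the sketch below computes a delta-method standard error for a nonlinear function of the fixed effects. It combines a numerically differentiated gradient with the estimated covariance matrix of the fixed effects from an nlme fit. The quantity of interest (`g`) and the helper `num_grad()` are hypothetical and defined here only for illustration.

```r
library(nlme)

fit  <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)
beta <- fixef(fit)   # named vector: "(Intercept)", "age"
V    <- vcov(fit)    # covariance matrix of the fixed-effect estimates

# Hypothetical quantity of interest: predicted distance at age 14
# relative to age 8
g <- function(b) unname((b[1] + 14 * b[2]) / (b[1] + 8 * b[2]))

# Gradient of f at b by central finite differences
num_grad <- function(f, b, eps = 1e-6) {
  vapply(seq_along(b), function(i) {
    step <- replace(numeric(length(b)), i, eps)
    (f(b + step) - f(b - step)) / (2 * eps)
  }, numeric(1))
}

grad     <- num_grad(g, beta)
se_delta <- sqrt(drop(t(grad) %*% V %*% grad))   # sqrt(grad' V grad)
c(estimate = g(beta), se = se_delta)
```

This treats the variance components as fixed at their estimates; it accounts only for the uncertainty in the fixed effects.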
Data sets
Many packages include small example data sets (e.g., lme4, nlme). These packages provide previously described data sets often used in evaluating mixed models.
Model presentation and prediction
Functions and frameworks for convenient tabular and graphical output of mixed model results:
Convenience wrappers
These functions provide convenient frameworks to fit and interpret mixed models.
- Model fitting: multilevelmod, ez, mixlm, afex, and nimble.
- Model summaries: broom.mixed, insight
- Variable selection & model averaging: LMERConvenienceFunctions, MuMIn, glmulti (see, e.g., maintainer’s blog or here for use with mixed models), mlmhelpr.
- Centering/scaling predictors at the population or group level: mlmhelpr, mlmtools, arm::standardize()
Inference and model selection
Hypothesis testing
Prediction and estimation
Bootstrapping
Power analysis and simulation
These topics are closely related because there are few available analytical methods for computing statistical power for mixed models; power usually needs to be estimated by simulation.
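A simulation-based power estimate can be sketched in a few lines: simulate data under an assumed effect, fit the model, record whether the test rejects, and repeat. The example below does this with nlme for a group-level treatment. The function `simulate_once()` and all design values (group count, effect size, standard deviations) are hypothetical; dedicated packages offer more complete tooling.

```r
library(nlme)
set.seed(2024)

# Simulate one data set and return the p-value for the treatment effect
simulate_once <- function(n_groups = 20, n_per_grp = 5, effect = 0.5,
                          sd_group = 1, sd_resid = 1) {
  group <- factor(rep(seq_len(n_groups), each = n_per_grp))
  # Treatment is assigned per group: first half control, second half treated
  treat <- rep(rep(c(0, 1), each = n_groups / 2), each = n_per_grp)
  y <- effect * treat +
    rnorm(n_groups, sd = sd_group)[as.integer(group)] +
    rnorm(length(group), sd = sd_resid)
  dat <- data.frame(y = y, treat = treat, group = group)
  fit <- tryCatch(lme(y ~ treat, random = ~ 1 | group, data = dat),
                  error = function(e) NULL)
  if (is.null(fit)) return(NA_real_)   # record failed fits as missing
  summary(fit)$tTable["treat", "p-value"]
}

p_values <- replicate(100, simulate_once())
power    <- mean(p_values < 0.05, na.rm = TRUE)   # proportion of rejections
power
```

With only 100 replicates the estimate is noisy; in practice the number of simulations is increased until the Monte Carlo error is acceptably small.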
Model selection
Commercial software interfaces
- Mplus: MplusAutomation.
- ASReml-R: asremlPlus.
- babelmixr2 allows nlmixr2 models to be translated and run in either the commercial tool Monolix or NONMEM and then reads the results back in to create a standardized nlmixr2 fit object. This fit object runs the diagnostics in nlmixr2 and compares them to the ones output in the commercial software to “validate” the fit object against the output of the commercial tool. It also interfaces with free tools such as PKNCA for automatically using observed pharmacokinetic (PK) data for initial estimates of PK models.
CRAN packages
Core: | brms, broom.mixed, geepack, glmmTMB, lavaan, lme4, MCMCglmm, multilevelmod, nlme, sommer. |
Regular: | afex, aod, aods3, ARpLMEC, asremlPlus, babelmixr2, bamlss, bcmixed, BGLR, blavaan, blme, blmeco, boot.pval, boxcoxmix, buildmer, cAIC4, car, CARBayesST, CLME, clubSandwich, coxme, CpGassoc, cplm, CRTgeeDR, DHARMa, dhglm, dotwhisker, effects, emmeans, ez, faux, galamm, gamlss, gamm4, gee, geeM, geesmv, ggeffects, ggResidpanel, glmertree, glmm, GLMMadaptive, glmmEP, glmmfields, glmmLasso, glmmrBase, GLMMRR, glmtoolbox, glmulti, gpboost, greta, hglm, HLMdiag, huxtable, iccbeta, influence.ME, influence.SEM, inlabru, insight, JointAI, kinship2, languageR, lmeInfo, LMERConvenienceFunctions, lmeresampler, lmerTest, lmeSplines, lmmpar, longpower, lqmm, marginaleffects, MarginalMediation, margins, MASS, mbest, mclogit, MCMC.qpcr, mdhglm, mdmb, merDeriv, merTools, mgcv, mice, mixedsde, mixlm, mlmhelpr, mlmRev, mlmtools, mmrm, modelsummary, MplusAutomation, mrgsolve, multgee, multilevelTools, MuMIn, mvctm, mvglmmRank, nimble, nlmeU, nlmixr2, nlmixr2data, nlmm, ordbetareg, ordinal, pan, parameters, partR2, pass.lme, pbkrtest, pedigreemm, performance, pez, phyr, piecewiseSEM, PKNCA, PKPDsim, plm, powerEQTL, QGglmm, qrLMM, qrNLMM, QTLRel, R2BayesX, r2glmm, R2jags, R2OpenBUGS, regress, repeated, rjags, RLRsim, robustBLME, robustlmm, rockchalk, rptR, rr2, rrBLUP, rstan, rstanarm, RTMB, RVAideMemoire, rxode2, saemix, SASmixed, sem, semtree, simr, sjPlot, skewlmm, spaMM, sphet, spind, splmm, StroupGLMM, TMB, tmbstan, varTestnlme, VetResearchLMM, vglmer, WeMix, zoib. |
Related links
Other resources