---
title: "Matrix Variate Normal Distributions with MixMatrix"
author: "Geoffrey Thompson"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Matrix Normal Distributions}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
bibliography: matrixnormal.bib
link-citations: yes
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(MixMatrix)
```
## Matrix Variate Distributions
Suppose there is a two-dimensional grid of observations: each observation is randomly distributed somehow, perhaps each spot on the grid has its own mean, there's a covariance relation across the rows that doesn't vary across the columns and a covariance relation across the columns that doesn't vary across the rows. One case where this might come up is a multivariate time series: at each point in time we are dealing with a multivariate observation (perhaps normally distributed or not) and then there is a covariance structure across time to account for. We may be interested in drawing from such distributions, calculating densities, or estimating parameters based on observations. This package presents some functions for doing so in the case of matrix variate normal distributions. For more details about matrix variate distributions in general and matrix variate normal distributions in particular, see @gupta1999matrix, whose results I rely on for much of this presentation.
## Matrix Variate Normal Distributions
A random matrix $\mathbf{X}$ $(p \times n)$ with a matrix variate normal distribution is parameterized by a mean matrix $\mathbf{M}$ $(p \times n)$ and covariance matrices $\mathbf{U}$ $(p \times p)$ (determining covariance among rows) and $\mathbf{V}$ $(n \times n)$ (determining covariance among columns) and has the following probability density function (@Matrix_normal_distribution):
\[p(\mathbf{X}\mid\mathbf{M}, \mathbf{U}, \mathbf{V}) = \frac{\exp\left( -\frac{1}{2} \, \mathrm{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right] \right)}{(2\pi)^{np/2} |\mathbf{V}|^{n/2} |\mathbf{U}|^{p/2}} \]
Here is a useful fact for drawing observations from a matrix variate normal distribution: suppose $\mathbf{X}$ is distributed as a matrix variate normal distribution with mean matrix $\mathbf{0}$ and covariance matrices $\mathbf{I_n}$ and $\mathbf{I_p}$. Then for a mean matrix $\mathbf{M}$ and linear transformations $\mathbf{L}$ and $\mathbf{R}$ of appropriate dimensions, if
\[Y = \mathbf{M} + \mathbf{L}\mathbf{X}\mathbf{R}\]
then $\mathbf{Y}$ is matrix variate normally distributed with parameters $\mathbf{M}, \mathbf{LL}^T, \mathbf{R}^T\mathbf{R}$.
Matrix variate random variables can then be generated by specifying $\mathbf{M}$ and $\mathbf{L}$ and $\mathbf{R}$ or by specifying $\mathbf{U}$ and $\mathbf{V}$. If the covariance matrices are provided, then Cholesky decomposition is performed to create the $\mathbf{L}$ and $\mathbf{R}$ matrices. If $\mathbf{L}$ and $\mathbf{R}$ matrices are provided, they are used to create $\mathbf{U}$ and $\mathbf{V}$ matrices which are then decomposed. If these are not specified, the variances are presumed to be identity matrices by default and the means are presumed to be zero. If $\mathbf{L}$ or $\mathbf{R}$ are not full rank linear transformations, degenerate matrix normal distributions can be produced.
```{r mnorm}
set.seed(20180203)
x <- matrix(rnorm(6),nrow=3)
x
set.seed(20180203)
y <- rmatrixnorm(n = 1, mean=matrix(rep(0,6),nrow=3))
y
U <- 5 * diag(3) + 1
V <- matrix(c(2,0,0,.1),nrow=2)
mu = matrix(1:6,nrow=3)
set.seed(20180203)
z <- rmatrixnorm(n = 1, mean=mu,U=U,V=V)
z
mu + t(chol(U)) %*% y %*% chol(V)
```
When $n=1$, by default this returns a matrix. For $n>1$, by default, it will return a three-dimensional array indexed by the last coordinate (array convention similar to `rWishart`).
```{r noptions}
set.seed(20180203)
x <- rmatrixnorm(n = 1, mean=matrix(rep(0,6),nrow=3))
x
set.seed(20180203)
y <- rmatrixnorm(n = 100, mean=matrix(rep(0,6),nrow=3),list = TRUE)
y[[1]]
set.seed(20180203)
z <- rmatrixnorm(n = 100, mean=matrix(rep(0,6),nrow=3),array = TRUE)
z[ , , 1]
```
Densities are computed in much the same way as for other distributions. The defaults for the mean parameters are 0 and identity for the variance matrices. The `log` parameter sets whether to return the density on the log scale - by default this is `FALSE`.
```{r density}
set.seed(20180202)
A = rmatrixnorm(n=1,mean=matrix(c(100,0,-100,0,25,-1000),nrow=2),
L=matrix(c(2,1,0,.1),nrow=2))
dmatrixnorm(A,mean=matrix(c(100,0,-100,0,25,-1000),nrow=2),
L=matrix(c(2,1,0,.1),nrow=2),log=TRUE )
```
## Parameter estimation for matrix variate normal distributions
Maximum likelihood estimation is possible for the matrix variate normal distribution (see @dutilleul1999mle or @2013arXiv13096609G for details). The pair of variance matrices are only identifiable up to a constant, so this function sets the first element of $\mathbf{U}$ to $1$. There is a closed-form solution for the mean matrix and the $\mathbf{U}$ and $\mathbf{V}$ have an iterative solution.
```{r mleone}
set.seed(20180202)
A = rmatrixnorm(n=100,mean=matrix(c(100,0,-100,0,25,-1000),nrow=2),
L=matrix(c(2,1,0,.1),nrow=2),list=TRUE)
results=MLmatrixnorm(A)
print(results)
```
There are two restrictions possible for the mean matrices: `row.mean = TRUE` will force a common mean within a row and `col.mean = TRUE` will force a common mean within a column. Setting both will ensure a constant mean for the entire system. Restrictions on $\mathbf{U}$ and $\mathbf{V}$ are possible with `row.variance` and `col.variance` commands.
```{r mlerow}
results.fixrows = MLmatrixnorm(A, row.mean = TRUE, max.iter = 5)
print(results.fixrows)
# this failure is expected with misspecification! The number of iterations is also
# fixed to be low so the vignette compiles quickly.
```
Currently the options for variance restrictions are to impose an AR(1) structure by providing the `AR(1)` option, a compound symmetry structure by providing the `CS` option, to impose a correlation matrix structure by specifying `correlation` or `corr`, or to impose an identical and independent structure by specifying `Independent` or `I`. This works by using `uniroot` to find the appropriate $\rho$ which sets the derivative of the log-likelihood to zero for the `AR` and `CS` options - it is not fast but if this is the true structure it will be better in some sense than an unstructured variance matrix. The $\rho$ parameter should be $>0$ and is forced to be non-negative. If the data behaves incompatibly with those restrictions, the function will provide a warning and exit with the current model fit.
```{r mlevar,warning=FALSE}
# tolerance and maximum iterations are limited here to make the vignette compile faster
B = rmatrixnorm(n = 50, mean = matrix(1:15,nrow = 5),
U = 3*stats::toeplitz(.6^(1:5)))
MLmatrixnorm(B, row.variance = "AR(1)", tol = 1e-5)
MLmatrixnorm(B, tol = 1e-5)
```
In order to fit parameters for an $n \times p$ matrix variate normal distribution, at least $\max (n/p,p/n) + 1$ observations are needed. Maximum likelihood estimates are unique if there are more than $\max(p,n)$ observations.
The function expects data in a format produced by the `rmatrixnorm` function.
The results are only the parameter estimates and do not include measures of uncertainty.
## Session Information
This vignette was compiled with `rmarkdown`.
```{r sessioninfo}
sessionInfo()
```
## All the code for easy copying
```{r getlabels, echo = FALSE}
labs = knitr::all_labels()
labs = labs[!labs %in% c("setup", "toc", "getlabels", "allcode")]
```
```{r allcode, ref.label = labs, eval = FALSE}
```
## References