**missoNet** is a package that fits penalized multi-task
regression – that is, with multiple correlated tasks or response
variables – to simultaneously estimate the coefficients of a set of
predictor variables for all tasks and the conditional response network
structure given all predictors, via penalized maximum likelihood in an
undirected conditional Gaussian graphical model. In contrast to most
penalized multi-task regression (conditional graphical lasso) methods,
**missoNet** has the capability of obtaining estimates even
when the response data is corrupted by missing values. The method
automatically enjoys the theoretical and computational benefits of
convexity, and returns solutions that are comparable/close to the
estimates without any missing values.

**missoNet** assumes the model \[
\mathbf{Y} = \mathbf{X}\mathbf{B}^* + \mathbf{E},\ \ \mathbf{E}_{i\cdot}
\stackrel{iid}{\sim} \mathcal{MVN}(0_q,(\mathbf{\Theta}^*)^{-1})\ \
\forall i = 1,...,n,
\] where \(\mathbf{Y} \in
\mathbb{R}^{n\times q}\) and \(\mathbf{X} \in \mathbb{R}^{n\times p}\)
represent the centered response data matrix and predictor data matrix,
respectively (thus the intercepts are omitted). \(\mathbf{E}_{i\cdot} \in \mathbb{R}^{q}\) is
the \(i\)th row (\(i\)th sample) of the error matrix and is
drawn from a multivariate Gaussian distribution. \(n, p, q\) denote the sample size, the
number of predictors and the number of responses, respectively. The
regression coefficient matrix \(\mathbf{B}^*
\in \mathbb{R}^{p\times q}\) and the precision (inverse
covariance) matrix \(\mathbf{\Theta}^* \in
\mathbb{R}^{q\times q}\) are parameters to be estimated in this
model. Note that the entries of \(\mathbf{\Theta}^*\) have a one-to-one
correspondence with partial correlations. That is, the random variables
\(Y_j\) and \(Y_k\) are conditionally independent
(i.e. \(\mathbf{\Theta}^*_{jk} = 0\))
given \(X\) and all other remaining
variables in \(Y\) iff the
corresponding partial correlation coefficient is zero.

To estimate the parameters, **missoNet** solves a
penalized MLE problem: \[
(\hat{\mathbf{\Theta}},\hat{\mathbf{B}}) = \underset{\mathbf{\Theta}
\succeq 0,\ \mathbf{B}}{\text{argmin}}\
\mathrm{tr}\left[\frac{1}{n}(\mathbf{Y} - \mathbf{XB})^\top(\mathbf{Y} -
\mathbf{XB}) \mathbf{\Theta}\right]
- \mathrm{log}|\mathbf{\Theta}| +
\lambda_{\Theta}(\|\mathbf{\Theta}\|_{1,\mathrm{off}} + 1_{n\leq
\mathrm{max}(p,q)} \|\mathbf{\Theta}\|_{1,\mathrm{diag}}) +
\lambda_{B}\|\mathbf{B}\|_1.
\] \(\mathbf{B}^*\) that mapping
predictors to responses and \(\mathbf{\Theta}^*\) that revealing the
responses’ conditional dependencies are estimated for the lasso
penalties – \(\lambda_B\) and \(\lambda_\Theta\), over a grid of values (on
the log scale) covering the entire range of possible solutions. To learn
more about sparse multi-task regression with inverse covariance
estimation using datasets without missing values, see MRCE.

However, missingness in real data is inevitable. Most penalized multi-task regression / conditional graphical lasso methods either assume the data is fully-observed (eg. MRCE), or deal with missing data through a likelihood-based method such as the EM algorithm (e.g, cglasso). An important challenge in this context is the possible non-convexity of the associated optimization problem when there is missing data, as well as the high computational cost from iteratively updating the estimations for parameters.

**missoNet** aims to handle a specific class of datasets
where the response matrix \(\mathbf{Y}\) contains data which is missing
at random (MAR). To use **missoNet**, users do not need to
possess additional knowledge for pre-processing missing data
(e.g. imputation) nor for selection of regularization parameters (\(\lambda_B\) and \(\lambda_\Theta\)), the method provides a
unified framework for automatically solving a convex modification of the
multi-task learning problem defined above, using corrupted datasets.

The package provides an integrated set of core routines for data
simulation, model fitting and cross-validation, visualization of
results, as well as predictions in new data. The function arguments are
in the same style as those of `glmnet`

, making it easy for
experienced users to get started.

To install the package **missoNet** from CRAN, type the
following command in the R console:

Or install the development version of **missoNet** from
GitHub:

The purpose of this section is to give users a general overview of the functions and their usages. We will briefly go over the main functions, basic operations and outputs.

First, we load the **missoNet** package:

We will use a synthetic dataset for illustration. To generate a set
of data with missing response values, we can simply call the built-in
function `generateData`

:

```
## Specify a random seed for reproducibility.
## The overall missing rate in the response matrix is around 10%.
## The missing mechanism can also be "MCAR" or "MNAR".
sim.dat <- generateData(n = 150, p = 15, q = 12, rho = 0.1, missing.type = "MAR", with.seed = 1512)
tr <- 1:120 # training set indices
tst <- 121:150 # test set indices
```

This command returns an object containing all necessary components for analysis including:

`'X'`

,`'Y'`

: a predictor matrix \(\mathbf{X} \in \mathbb{R}^{n\times p}\) and a complete (clean) response matrix \(\mathbf{Y} \in \mathbb{R}^{n\times q}\), in which rows correspond to samples and columns correspond to variables. The rows of \(\mathbf{Y}\) is sampled from \(\mathcal{MVN}(X\mathbf{B}^*,(\mathbf{\Theta}^*)^{-1})\).`'Beta'`

,`'Theta'`

: the true parameters \(\mathbf{B}^* \in \mathbb{R}^{p\times q}\) and \(\mathbf{\Theta}^* \in \mathbb{R}^{q\times q}\) for the generative model that will be estimated later (see the later section for more details on their structures and sparsity).`'Z'`

: a corrupted response matrix \(\mathbf{Z} \in \mathbb{R}^{n\times q}\) having elements \(\mathbf{Z}_{ij}\) (\(\forall\ i=1,...,n,\ \ j=1,...,q\)) which is either`NA`

or equal to \(\mathbf{Y}_{ij}\).

The probabilities of missingness for the response variables and the
missing mechanism are specified by `'rho'`

and
`'missing.type'`

, respectively. `'rho'`

can be a
scalar or a vector of length \(q\). In
the code above, `'rho' = 0.1`

indicates an overall missing
rate of around 10%. In addition to `"MAR"`

, the
`'missing.type'`

can also be `"MCAR"`

or
`"MNAR"`

, see the later section for more details on how
missing values are generated under different mechanisms.

the program only accepts missing values that are coded as eitherNOTE 1:`NA`

s or`NaN`

s.

although the program will provide estimates even when \(n \leq \text{max}(p,q)\), convergence is likely to be rather slow, and the variance of estimation is likely to be excessively large. Therefore, weNOTE 2:suggestthat both the predictor columns and the response columns be reduced (filtered) if possible, prior to fitting any models. For example in genomics, where \(\mathbf{X}\) can very wide (i.e. large \(p\)), we often filter features based on variance or coefficient of variation.

We can easily visualize how missing data is distributed in the
corrupted response matrix \(\mathbf{Z}\) using package
**visdat**:

A single model can be fitted using the most basic call to
`missoNet`

:

```
## Training set
X.tr <- sim.dat$X[tr, ] # predictor matrix
Z.tr <- sim.dat$Z[tr, ] # corrupted response matrix
## Using the training set to fit the model.
## 'lambda.Beta' and 'lambda.Theta' are arbitrarily set to 0.1.
## 'verbose' = 0 suppresses printing of messages.
fit <- missoNet(X = X.tr, Y = Z.tr, lambda.Beta = 0.1, lambda.Theta = 0.1, verbose = 0)
```

However, `missoNet`

should be more commonly used with a
grid of values for \(\lambda_B\) and
\(\lambda_\Theta\). The two penalty
vectors must have the same length, and the `missoNet`

function will be called with each consecutive pair of values – i.e. with
the first elements of the two vectors, then with the second elements,
etc.

```
lambda.Beta.vec <- 10^(seq(from = 0, to = -1, length.out = 10)) # 10 values on the log scale, from 1 to 0.1
lambda.Theta.vec <- rep(0.1, 10) # for each value of 'lambda.Beta', 'lambda.Theta' remains constant at 0.1
fit_list <- missoNet(X = X.tr, Y = Z.tr, lambda.Beta = lambda.Beta.vec, lambda.Theta = lambda.Theta.vec, verbose = 0)
```

The command above returns a sequence of models for users to choose
from. In many cases, users may prefer that the software selects the best
fit among them. Cross-validation is perhaps the simplest and most widely
used method for this task. `cv.missoNet`

is the main function
to do cross-validation, along with supporting methods such as
`plot`

and `predict`

.

Here we use `cv.missoNet`

to perform a five-fold
cross-validation, samples will be permuted before splitting into
multiple folds. For reproducibility, we assign a random seed to the
permutation.

```
## If 'permute' = FALSE, the samples will be split into k-folds in their original orders,
## i.e. the first (n/'kfold') samples will be assigned to the first fold an so on.
cvfit <- cv.missoNet(X = X.tr, Y = Z.tr, kfold = 5,
lambda.Beta = lambda.Beta.vec, lambda.Theta = lambda.Theta.vec,
permute = TRUE, with.seed = 433, verbose = 0)
#> Warning in boundaryCheck(lambda.Theta = lambda.Theta, lambda.Beta = lambda.Beta, :
#> The optimal `lambda.Theta` is close to the upper boundary of the search scope,
#> try to provide a new sequence covering larger values for the following argument:
#>
#> 1. lambda.Theta
```

The program has warned us that the range of \(\lambda_\Theta\) is inappropriate so that
we need to supply a sequence of \(\lambda_\Theta\) covering larger values.
However, picking a suitable range for such hyperparameters may require
experience or a series of trials and errors. Therefore,
`cv.missoNet`

provides a smarter way to solve this
problem.

Let’s fit the cross-validation model again, this time running all
folds in parallel on two CPU cores. The parallelization of
**missoNet** models relies on package
**parallel**, so make sure the parallel clusters have been
registered beforehand.

```
## 'fit.1se = TRUE' tells the program to make additional estimations of the
## parameters at the largest value of 'lambda.Beta' / 'lambda.Theta' that gives
## the most regularized model such that the cross-validated error is within one
## standard error of the minimum.
cl <- parallel::makeCluster(min(parallel::detectCores()-1, 2))
cvfit <- cv.missoNet(X = X.tr, Y = Z.tr, kfold = 5,
fit.1se = TRUE, parallel = TRUE, cl = cl,
permute = TRUE, with.seed = 433, verbose = 1)
parallel::stopCluster(cl)
```

Note that we have not explicitly specified the regularization
parameters \(\lambda_B\) and \(\lambda_\Theta\) in the command above. In
this case, a grid of \(\lambda_B\) and
\(\lambda_\Theta\) values in a
(hopefully) reasonable range will be computed and used by the program.
Users can even provide just one of \(\lambda_B\) and \(\lambda_\Theta\) vectors such that the
program will compute the other. Generally, it is difficult at the
beginning to choose suitable \(\lambda\) sequences, so we
**suggest** users start analysis using
`cv.missoNet`

, and let the program compute the appropriate
\(\lambda_B\) and \(\lambda_\Theta\) values itself, and then
automatically pick the optimal combination of them at which the minimum
cross-validated error is achieved.

`cv.missoNet`

returns a `'cv.missoNet'`

object,
a list with all the ingredients of the cross-validated fit. We can
execute the `plot`

method

```
## The values of mean cross-validated errors along with upper and lower standard deviation bounds
## can be accessed via 'cvfit$cvm', 'cvfit$cvup' and 'cvfit$cvlo', respectively.
plot(cvfit)
```

to visualize the mean cross-validated error in a heatmap style. The
white solid box marks out the position of the minimum mean
cross-validated error with corresponding \(\lambda_B\) and \(\lambda_\Theta\) (`"lambda.min"`

:= \(({\lambda_B}_\text{min},
{\lambda_\Theta}_\text{min})\)), and the white dashed boxes
indicate the largest \(\lambda_B\) and
\(\lambda_\Theta\) at which the mean
cross-validated error is within one standard error of the minimum, by
fixing the other one at `"lambda.min"`

(i.e. `"lambda.1se.Beta"`

:= \(({\lambda_B}_\text{1se},
{\lambda_\Theta}_\text{min})\) and
`"lambda.1se.Theta"`

:= \(({\lambda_B}_\text{min},
{\lambda_\Theta}_\text{1se})\)). We often use this
“one-standard-error” rule when selecting the most parsimonious model;
this acknowledges the fact that the cross-validation surface is
estimated with error, so error on the side of parsimony is
preferable.

We can also plot the errors in a 3D scatter plot (without surfaces):