An `R`

package implementing the **SE-set
algorithm**, a tool to explore statistically-equivalent path
models from correlation matrices and Gaussian Graphical Models
(GGMs).

This repository contains an `R`

package used by Ryan,
Bringmann and Schuurman (pre-print)[PsyArXiv] to aid researchers in
investigating the relationship between a given GGM estimate (in the form
of a precision matrix) and possible underlying linear path models.
Linear path models can also be interpreted as “weighted DAGs”. Full
details on the package can be found in **Appendix B** of
the manuscript, and code to reproduce the empirical illustration in that
paper can be found in this
repository.

Please note that we recommend the use of this tool only for GGMs with
**13 or less nodes** as the size of the outputted object,
and the run-time, grows factorialy (\(p!\)) with each node in the network.

The current version of this package can be installed directly from github using

`::install_github("ryanoisin/SEset") devtools`

The main SE-set function takes a precision matrix as input by default
(but can also take a matrix of partial correlations, or a covariance
matrix, detailed below). This can be estimated using packages such as
`qgraph`

, using either raw data or a matrix of correlations.
For example, using the `riskcor`

correlation matrix supplied
with the package as input, the precision matrix `omega`

can
be estimated by running

```
# load data
data(riskcor)
# estimate precision matrix
<- qgraph::EBICglasso(riskcor, n = 69, returnAllResults = TRUE)
estimate <- estimate$optwi
omega
# The precision matrix can also be standardized to a partial correlation matrix, and plotted as a network
library(qgraph)
<- qgraph::wi2net(omega)
parcor <- qgraph(parcor, repulsion = .8,vsize = c(10,15), theme = "colorblind", fade = F, edge.labels = TRUE) pnet
```

Given a \(p\)-variate precision
matrix, the `SEset`

package can be used to obtain a set of
maximally \(p!\) statistically
equivalent weighted DAGs (i.e., linear path models): One DAG for every
possible topological ordering of the \(p\) variables, from exogenous to
endogenous. This is the SE-set of `omega`

which gives the
package its name.

The statistically-equivalent set is found by using the
`network_to_SEset`

function

`<- network_to_SEset(omega, digits = 2, rm_duplicates = TRUE) SE_example `

where the `digits`

arguments determines the rounding of
parameters in the weighted DAGs, and the `rm_duplicates`

argument indicates that only unique weighted DAGs should be returned:
duplicates are removed after rounding. Typically, when duplicates are
removed, the number of unique DAGs returned is much less than \(p!\).

Individual members of the SE-set can be plotted, for example using
qgraph. When visualizing, remember that the weights matrix of the path
model \(B\) is given in SEM form \(X = BX +e\), and so should be read “from
column to row”, that is \(b_{ij}\)
encodes the direct effect of \(X_j\) on
\(X_i\). As a consequence, matrices
should be transposed when being used as input to `qgraph`

and
most other network visualization tools.

```
<- matrix(SE_example[1,],6,6)
DAG1 <- matrix(SE_example[30,],6,6)
DAG2
par(mfrow=c(1,2))
qgraph(t(DAG1), repulsion = .8,vsize = c(10,15), theme = "colorblind", fade = F,
layout = pnet$layout, maximum = 2, edge.labels = T)
qgraph(t(DAG2), repulsion = .8,vsize = c(10,15), theme = "colorblind", fade = F,
layout = pnet$layout, maximum = 2, edge.labels = T)
```

We also supply additional functions to aid in exploring the SE-set.

For example, the function `propcal`

calculates the
proportion of DAGs in the SE-set in which an edge between two variables
is present. You can choose distinguish between the presence of
*some* edge or an edge of a particular direction using the
`directed`

option

```
# Undirected edge frequency
<- propcal(SE_example, rm_duplicate = F, directed = FALSE)
propu
# Directed edge frequency
<- propcal(SE_example, rm_duplicate = F, directed = TRUE)
propd
# Plot each as a network
par(mfrow=c(1,2))
qgraph(propu, edge.color = "darkgreen", layout = pnet$layout, edge.labels = T, maximum = 1)
qgraph(propd, edge.color = "darkgreen", layout = pnet$layout, edge.labels = T, maximum = 1)
```

Given the SEset, you can essentially explore how any property of one
graphical model varies across members. For example, we may be interested
in the distribution of variance explained (\(R^2\)) for each variable when predicted by
it’s directed causes. This will differ across SEset members because the
direct casues of any given variable will typically differ across the
different path models. This can be computed using the
`r2_distribution`

function.

```
<- r2_distribution(SEmatrix = SE_example, cormat = riskcor, names = NULL,
r2set indices = c(1,3,4,5,6))
```

This distribution can be visualized, for instance using ggplot

```
library(tidyr)
library(ggplot2)
require(ggridges)
<- as.data.frame(r2set, col.names = paste0("X",1:6))
df
<- tidyr::gather(df)
df2
<- ggplot(df2, aes(y = key, x = value)) +
p geom_density_ridges(fill = "light seagreen") +
labs(y = "Variable", x = expression(paste("Controllability value ", R^2)))
+ theme(text = element_text(size = 20),
p axis.title.x = element_text(size = 20),
axis.title.y = element_text(size = 20),
panel.background = element_blank())
```

Alternatively, we can supply a matrix of *partial
correlations*, such as output by the `parcor`

package, or
a (model-implied) *covariance matrix* using the
`input_type`

argument, demonstrated below.

```
# Using the partial correlation matrix as input
<- network_to_SEset(parcor, digits = 2, rm_duplicates = TRUE, input_type = "parcor")
SE_example_2 # Calculating the model-implied covariance matrix from the precision matrix
<- solve(omega)
MIcov <- network_to_SEset(MIcov, digits = 2, rm_duplicates = TRUE, input_type = "MIcov") SE_example_3
```

Note only that, since the partial correlation matrix does not contain
information on the diagonal elements of the precision matrix (that is,
the partial variances), if a partial correlation matrix is supplied, we
transform this to a correlation matrix using the `corpcor`

package. As such, small numerical differences (approximately in the 7th
decimal place) may be present depending on the input used.

For more details please contact **o.ryan@uu.nl**