This vignette illustrates the usage of the `SNPknock` package for creating knockoff copies of variables distributed as discrete Markov chains and hidden Markov models (Sesia, Sabatti, and Candès 2017). For simplicity, we will use synthetic data.

The `SNPknock` package also provides a simple interface to the genotype imputation software `fastPhase`, which can be used to fit hidden Markov models for genotype data. Since `fastPhase` is not available as an R package, this particular functionality of `SNPknock` cannot be demonstrated here. A tutorial showing how to use a combination of `SNPknock` and `fastPhase` to create knockoff copies of genotype data can be found here: https://web.stanford.edu/~msesia/software.html.

First, we verify that the `SNPknock` package can be loaded.

`library(SNPknock)`

We define a Markov chain model with 50 variables, each taking one of 5 possible values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries.

```
p = 50  # Number of variables in the model
K = 5   # Number of possible states for each variable
# Marginal distribution for the first variable
pInit = rep(1/K, K)
# Create p-1 transition matrices
Q = array(stats::runif((p-1)*K*K), c(p-1, K, K))
for(j in 1:(p-1)) {
  # Add weight to the diagonal, then normalize each row to sum to one
  Q[j,,] = Q[j,,] + diag(rep(1, K))
  Q[j,,] = Q[j,,] / rowSums(Q[j,,])
}
```
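Because each `Q[j,,]` is normalized by its row sums, every row should be a probability vector. The row normalization also suggests reading `Q[j,k,l]` as the probability of moving from state `k` at variable `j` to state `l` at variable `j+1`. The following is a minimal sanity check in base R. The helper `is_row_stochastic` is a hypothetical name introduced here for illustration and is not part of `SNPknock`. A small chain is rebuilt so that the snippet runs on its own.

```r
# Hypothetical helper (not part of SNPknock): TRUE if every row of every
# K x K slice Q[j,,] is non-negative and sums to one.
is_row_stochastic <- function(Q, tol = 1e-8) {
  # Summing over the third index gives a (p-1) x K matrix of row sums
  row_sums <- apply(Q, c(1, 2), sum)
  all(Q >= 0) && all(abs(row_sums - 1) < tol)
}

# Rebuild a small set of transition matrices with the same recipe as above
p <- 4
K <- 3
Q <- array(stats::runif((p - 1) * K * K), c(p - 1, K, K))
for (j in 1:(p - 1)) {
  Q[j, , ] <- Q[j, , ] + diag(rep(1, K))
  Q[j, , ] <- Q[j, , ] / rowSums(Q[j, , ])
}

print(is_row_stochastic(Q))
```

Adding the identity before normalizing puts extra mass on the diagonal, so each variable tends to stay in the state of its predecessor.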

We can sample 100 independent observations of this Markov chain using the `SNPknock` package.

```
set.seed(1234)
X = SNPknock.models.sampleDMC(pInit, Q, n=100)
print(X[1:5,1:10])
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    0    0    2    1    2    3    2    0    0     2
## [2,]    3    3    2    4    2    3    1    3    3     3
## [3,]    3    0    1    2    3    4    0    2    2     4
## [4,]    3    0    3    3    3    0    2    2    1     1
## [5,]    4    0    2    4    3    3    3    1    1     4
```

Above, each row of `X` contains an independent realization of the Markov chain.
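To make the sampling step concrete, here is a rough base-R sketch of how one realization of such a chain can be drawn: the first variable comes from `pInit`, and each later variable comes from the row of the transition matrix selected by the previous state. The function `sample_dmc_sketch` is a hypothetical illustration, not the package's implementation. It assumes that states are labelled `0` to `K-1`, as in the output above, and that rows of `Q[j,,]` index the current state.

```r
# Hypothetical base-R sampler, for illustration only (not SNPknock's code).
# Assumes states 0..K-1 and Q[j, k, ] = distribution of variable j+1
# given that variable j is in state k-1.
sample_dmc_sketch <- function(pInit, Q, n) {
  K <- length(pInit)
  p <- dim(Q)[1] + 1
  X <- matrix(0L, n, p)
  for (i in seq_len(n)) {
    # First variable: drawn from the initial distribution
    X[i, 1] <- sample.int(K, 1, prob = pInit) - 1L
    # Remaining variables: drawn conditionally on the previous state
    for (j in seq_len(p - 1)) {
      X[i, j + 1] <- sample.int(K, 1, prob = Q[j, X[i, j] + 1L, ]) - 1L
    }
  }
  X
}

# Small self-contained example
p <- 6
K <- 3
pInit <- rep(1 / K, K)
Q <- array(stats::runif((p - 1) * K * K), c(p - 1, K, K))
for (j in 1:(p - 1)) {
  Q[j, , ] <- Q[j, , ] / rowSums(Q[j, , ])
}
X_sketch <- sample_dmc_sketch(pInit, Q, n = 10)
print(dim(X_sketch))
```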

A knockoff copy of `X` can be sampled as follows.

```
Xk = SNPknock.knockoffDMC(X, pInit, Q)
print(Xk[1:5,1:10])
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    3    2    2    2    2    2    1    3    0     0
## [2,]    3    3    2    2    1    1    3    3    3     3
## [3,]    0    3    0    3    3    0    0    2    2     2
## [4,]    4    3    3    4    2    0    2    2    2     1
## [5,]    0    0    3    3    3    3    1    3    4     2
```
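Since valid knockoffs have the same distribution as the original variables, one informal check is to compare how often each state occurs in each column of `X` and `Xk`. It is also informative to see how often the knockoff coincides with the original entry. The helpers below, `max_freq_gap` and `agreement`, are hypothetical names introduced for this sketch and are not part of `SNPknock`. They are demonstrated on a toy matrix so that the snippet runs without the package.

```r
# Hypothetical helpers (not part of SNPknock), assuming states 0..K-1.

# Largest absolute difference between the per-variable state
# frequencies of X and Xk.
max_freq_gap <- function(X, Xk, K) {
  freq <- function(M) {
    apply(M, 2, function(col) tabulate(col + 1, nbins = K) / length(col))
  }
  max(abs(freq(X) - freq(Xk)))
}

# Fraction of entries in which the knockoff equals the original.
agreement <- function(X, Xk) {
  mean(X == Xk)
}

# Toy example: reversing the rows leaves column frequencies unchanged
X_toy <- matrix(c(0, 1, 1,
                  1, 0, 1), nrow = 3)
Xk_toy <- X_toy[3:1, ]
print(max_freq_gap(X_toy, Xk_toy, K = 2))
print(agreement(X_toy, Xk_toy))
```

With the objects from this vignette, the calls would be `max_freq_gap(X, Xk, K)` and `agreement(X, Xk)`. With only 100 observations, a nonzero gap is expected from sampling noise alone, so this is a rough diagnostic rather than a formal test.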

To see how to use `SNPknock` to create knockoff copies of genotype data, see the genotypes vignette.

Sesia, M., C. Sabatti, and E. J. Candès. 2017. “Gene Hunting with Knockoffs for Hidden Markov Models.” *ArXiv E-Prints*, June. https://arxiv.org/abs/1706.04677.