Denote the \(p\)-variate predictors \(x_i\), \(i=1,…,n\) with corresponding responses \(y_i\). The predictors are assumed to follow the model \[ x_i = A s_i + b, \] where \(A\) is a non-singular \(p \times p\) matrix, \(b\) a \(p\)-vector and the random vector \(s\) can be decomposed into \((s_1^T,s_2^T)^T\) with \(E(s)=0\) and \(Cov(s)=I_p\) where \(s_1\) has dimension \(k\) and \(s_2\) dimension \(p-k\). It is then assumed that \(s_1\) is the signal part, the interesting components, which are relevant to model \(y\), whereas \(s_2\) is the noise part.

The working model assumption is then that \[ (y,s_1^T)^T \ is \ independent \ of \ s_2. \] Defining \(S_1=E((x-b)(x-b)^T)\) and \(S_2=E(E(x-b|y)E(x-b|y)^T)\) the sliced inverse regression (SIR) can be interpreted as finding the transformation matrix \(W\) such that \[ W S_1 W^T = I_p \ and \ W S_2 W^T = D, \] where \(D\) is a diagonal matrix with diagonal elements \(d_1 \geq d_2 \geq … \geq d_k > d_{k+1} = … = d_p = 0\).

In practice \(S_2\) is estimated by approximating \(E(x-b|y)\) by dividing \(y\) into \(h\) slices where in this package \(y\) is divided into \(h\) intervals containing an equal number of observations.

The practical problem is to decide then the value \(k\).

For an asymptotic test using the test statistic \[ T= n\sum_{i=k+1}^p d_i, \] the limiting distribution under the null is then \(T \sim \chi^2_{(p-k)(h-k-1)}\). Therefore for the hypothetical value \(k\) and the number of slices \(h\) is required that \(k \geq h-1\).

Bootstrap tests can be constructed as follows

- Compute \(\hat s_i = \hat W(x_i - \bar x_i)\), and decompose into \(\hat s_{i,1}\) and \(\hat s_{i,2}\).
- Sample a bootstrap sample \((y_i^*, \hat s_{i,1}^*)\) of size \(n\) from \((y_i, \hat s_{i,1})\).
- Sample a bootstrap sample \(\hat s_{i,2}^*\) of size \(n\) from \(\hat s_{i,2}\).
- Compute \(x_i^* = \hat W^{-1} ((\hat s_{i,1}^*)^T,(\hat s_{i,2}^*)^T)^T\).
- Recompute the test statistic \(T^*\) based on the bootstrap data \(x_i^*\), \(i=,1,…,n\).
- Repeat the steps above \(m\) times to obtain bootstrap test statistics \(T^*_1,…,T^*_m\).

The p-value is then \[ \frac{\#(T_i^* \geq T)+1}{m+1}. \]

Some simulated data with true \(k=2\):

```
set.seed(1234)
n <- 200
p <- 10
X <- matrix(rnorm(p*n), ncol = p)
eps <- rnorm(n, sd=0.25)
y <- X[, 1]/ (0.5+(X[, 2]+1.5)^2)
pairs(cbind(y,X))
```

First performing the asymptotic test

```
library(ICtest)
SIRasympk2 <- SIRasymp(X,y,2)
screeplot(SIRasympk2)
SIRasympk2
```

```
##
## SIR test for subspace dimension
##
## data: X
## T = 65.121, df = 56, p-value = 0.1891
## alternative hypothesis: the last 8 eigenvalues are not zero
```

Then the bootstrap test

```
SIRbootk2 <- SIRboot(X,y,2)
SIRbootk2
```

```
##
## SIR bootstrapping test for subspace dimension
##
## data: X
## T = 0.32561, replications = 200, p-value = 0.2388
## alternative hypothesis: the last 8 eigenvalues are not zero
```

Looking at the first two components and their relationship with the response

```
pairs(cbind(y, components(SIRbootk2, which="k")))
```