This vignette illustrates the process of transforming a set of
variables into a new set of uncorrelated (orthogonal) variables. It
carries out the Gram-Schmidt process **directly**, by
successively projecting each variable on the previous ones
and subtracting (taking residuals). This is equivalent to replacing each
successive variable by its residuals from a least squares regression on
the previous variables.
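
To make the idea concrete, here is a minimal sketch of the process as a
standalone R function (a hypothetical helper named `gs()`, written here for
illustration; **matlib** itself provides `GramSchmidt()` for this):

```
# A minimal Gram-Schmidt sketch (hypothetical helper, assumes >= 2 columns).
# Each column has the projections on all previously orthogonalized
# columns subtracted from it.
gs <- function(X) {
  Z <- X
  for (j in 2:ncol(X)) {
    for (k in 1:(j - 1)) {
      # projection coefficient: (z_k' x_j) / (z_k' z_k)
      b <- c(crossprod(Z[, k], Z[, j]) / crossprod(Z[, k]))
      Z[, j] <- Z[, j] - b * Z[, k]
    }
  }
  Z
}
```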

When this method is used on the predictors in a regression problem,
the resulting orthogonal variables have exactly the same `anova()`
summary (based on “Type I”, sequential sums of
squares) as do the original variables.

We use the `class` data set, but convert the character
factor `sex` to a dummy (0/1) variable `male`.

```
library(matlib)
data(class)
class$male <- as.numeric(class$sex=="M")
```

For later use in regression, we create a variable `IQ` to serve as
a response variable.

```
class <- transform(class,
  IQ = round(20 + height + 3*age - .1*weight - 3*male + 10*rnorm(nrow(class))))
head(class)
```

```
##         sex age height weight male  IQ
## Alfred    M  14   69.0  112.5    1 122
## Alice     F  13   56.5   84.0    0 112
## Barbara   F  13   65.3   98.0    0 112
## Carol     F  14   62.8  102.5    0 111
## Henry     M  14   63.5  102.5    1 126
## James     M  12   57.3   83.0    1 100
```

We reorder the predictors into the order we want, forming a numeric matrix,
`X`.

```
X <- as.matrix(class[,c(3,4,2,5)])
head(X)
```

```
##         height weight age male
## Alfred    69.0  112.5  14    1
## Alice     56.5   84.0  13    0
## Barbara   65.3   98.0  13    0
## Carol     62.8  102.5  14    0
## Henry     63.5  102.5  14    1
## James     57.3   83.0  12    1
```

The Gram-Schmidt process treats the variables in a given order,
according to the columns in `X`. We start with a new matrix
`Z` consisting of `X[,1]`. Then, we find a new
variable `Z[,2]` orthogonal to `Z[,1]` by
subtracting the projection of `X[,2]` on `Z[,1]`.
(The projection of a vector \(\mathbf{y}\) on a vector \(\mathbf{x}\) is
\((\mathbf{x}' \mathbf{y} / \mathbf{x}' \mathbf{x})\, \mathbf{x}\),
computed here by `Proj()`.)

```
Z <- cbind(X[,1], 0, 0, 0)
Z[,2] <- X[,2] - Proj(X[,2], Z[,1])
crossprod(Z[,1], Z[,2]) # verify orthogonality
```

```
## [,1]
## [1,] 7.276e-12
```

The crossproduct is zero, up to floating-point rounding. Continue in the
same way, subtracting the projections of `X[,3]` on the previous columns,
and so forth.

```
3] <- X[,3] - Proj(X[,3], Z[,1]) - Proj(X[,3], Z[,2])
Z[,4] <- X[,4] - Proj(X[,4], Z[,1]) - Proj(X[,4], Z[,2]) - Proj(X[,4], Z[,3]) Z[,
```

Note that if any column of `X` is a linear combination of
the previous columns, the corresponding column of `Z` will be
all zeros.
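
To see this, here is a sketch using a hypothetical fifth variable that is an
exact linear combination of the first two columns; its Gram-Schmidt residual
vanishes:

```
# sketch: a column that is an exact linear combination of earlier ones
x5 <- X[,1] + 2*X[,2]
z5 <- x5 - Proj(x5, Z[,1]) - Proj(x5, Z[,2]) - Proj(x5, Z[,3]) - Proj(x5, Z[,4])
zapsmall(z5)  # all zeros, up to rounding
```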

These computations are similar to the following set of linear regressions:

```
<- residuals(lm(X[,2] ~ X[,1]), type="response")
z2 <- residuals(lm(X[,3] ~ X[,1:2]), type="response")
z3 <- residuals(lm(X[,4] ~ X[,1:3]), type="response") z4
```
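
These regressions include an intercept, so their residuals are also
orthogonal to the constant vector; they match the `Z` columns exactly only
when the intercept is dropped. A minimal check (a sketch, using a
hypothetical name `z2_ni`):

```
# sketch: dropping the intercept reproduces the projection-based column exactly
z2_ni <- residuals(lm(X[,2] ~ X[,1] - 1))
all.equal(unname(z2_ni), unname(Z[,2]))
```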

The columns of `Z` are now orthogonal, but not of unit
length:

```
zapsmall(crossprod(Z)) # check orthogonality
```

```
##       [,1] [,2] [,3] [,4]
## [1,] 57888    0    0    0
## [2,]     0 3249    0    0
## [3,]     0    0    7    0
## [4,]     0    0    0    2
```

We standardize each column to unit length, giving `Z` as
an **orthonormal** matrix, such that \(\mathbf{Z}' \mathbf{Z} = \mathbf{I}\).

```
Z <- Z %*% diag(1 / len(Z)) # make each column unit length
zapsmall(crossprod(Z)) # check orthonormal
```

```
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
```

```
colnames(Z) <- colnames(X)
```

The QR method uses essentially the same process, factoring the matrix
\(\mathbf{X}\) as \(\mathbf{X = Q R}\), where \(\mathbf{Q}\) is the orthonormal matrix
corresponding to `Z` and \(\mathbf{R}\) is an upper triangular matrix.
However, the signs of the columns of \(\mathbf{Q}\) are arbitrary, and
`QR()` returns `QR(X)$Q` with signs reversed,
compared to `Z`.
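
If a particular sign convention is wanted, one can flip the columns of
\(\mathbf{Q}\) (and the rows of \(\mathbf{R}\)) so that the diagonal of
\(\mathbf{R}\) is positive. A sketch, assuming `QR()` returns the triangular
factor as `$R`:

```
# sketch: resolve the sign ambiguity by forcing a positive diagonal in R
qx <- QR(X)
S  <- diag(sign(diag(qx$R)))
Q  <- qx$Q %*% S   # columns flipped where needed; should now match Z
R  <- S %*% qx$R   # rows flipped correspondingly, so X = Q R still holds
```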

```
# same result as QR(X)$Q, but with signs reversed
head(Z, 5)
```

```
##          height   weight     age     male
## Alfred   0.2868  0.07545 -0.3687  0.12456
## Alice    0.2348 -0.08067  0.3569 -0.02177
## Barbara  0.2714 -0.07715 -0.3862 -0.45170
## Carol    0.2610  0.07058  0.1559 -0.20548
## Henry    0.2639  0.05132  0.1047  0.40538
```

```
head(-QR(X)$Q, 5)
```

```
##        [,1]     [,2]    [,3]     [,4]
## [1,] 0.2868  0.07545 -0.3687  0.12456
## [2,] 0.2348 -0.08067  0.3569 -0.02177
## [3,] 0.2714 -0.07715 -0.3862 -0.45170
## [4,] 0.2610  0.07058  0.1559 -0.20548
## [5,] 0.2639  0.05132  0.1047  0.40538
```

```
all.equal( unname(Z), -QR(X)$Q )
```

```
## [1] TRUE
```
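
The same factorization is available in base R via `qr()`; a quick sketch
verifying that \(\mathbf{Q R}\) reproduces \(\mathbf{X}\):

```
# base R equivalent: qr() computes the same factorization (up to column signs)
qrX <- qr(X)
Q <- qr.Q(qrX)
R <- qr.R(qrX)
all.equal(X, Q %*% R, check.attributes = FALSE)  # X is recovered exactly
```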

We carry out two regressions of `IQ` on the variables in
`X` and in `Z`. These are equivalent, in the sense
that

- The \(R^2\) and MSE are the same in both models
- Residuals are the same
- The Type I tests given by `anova()` are the same.

```
class2 <- data.frame(Z, IQ=class$IQ)
```

Regression of IQ on the original variables in `X`

```
mod1 <- lm(IQ ~ height + weight + age + male, data=class)
anova(mod1)
```

```
## Analysis of Variance Table
##
## Response: IQ
##           Df Sum Sq Mean Sq F value Pr(>F)
## height     1     67    67.2    0.65   0.44
## weight     1      0     0.1    0.00   0.98
## age        1      8     8.4    0.08   0.78
## male       1    118   118.3    1.15   0.31
## Residuals 10   1033   103.3
```

Regression of IQ on the orthogonalized variables in `Z`

```
mod2 <- lm(IQ ~ height + weight + age + male, data=class2)
anova(mod2)
```

```
## Analysis of Variance Table
##
## Response: IQ
##           Df Sum Sq Mean Sq F value Pr(>F)
## height     1     67    67.2    0.65   0.44
## weight     1      0     0.1    0.00   0.98
## age        1      8     8.4    0.08   0.78
## male       1    118   118.3    1.15   0.31
## Residuals 10   1033   103.3
```
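
As a direct check of the claimed equivalences, a sketch using the two models
just fitted:

```
# sketch: the two fits agree on residuals and on R^2
all.equal(residuals(mod1), residuals(mod2))
c(summary(mod1)$r.squared, summary(mod2)$r.squared)
```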

This illustrates that `anova()` tests for linear models
are *sequential* tests. They test hypotheses about the extra
contribution of each variable over and above all previous ones, in a
given order. These usually do not make substantive sense, except in
testing ordered (“hierarchical”) models.
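
To see the order dependence directly, one can refit the model with the
predictors entered in a different order; the sequential sums of squares will
generally change (a sketch):

```
# sketch: a different entry order changes the Type I (sequential) table
mod3 <- lm(IQ ~ male + age + weight + height, data=class)
anova(mod3)
```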