Re: MGPCA & Discriminant Analysis

From: Anne B Dufour (dufour@biomserv.univ-lyon1.fr)
Date: Thu May 27 2004 - 09:12:13 MEST


Answer to Alexandra Lima via all adelisters

Alexandra Lima wrote "I would like to do a MGPCA - Multiple Group Principal
Components Analysis (Thorpe, 1988), as I'm interested in removing a "size
effect", but I'm not sure if this is possible with ADE.".
The question is not so simple; it can be considered at three levels.

First level: the PCA definition. PCA can be seen as
(A) the estimation of the principal axes of a Gaussian distribution, or
(B) a geometric analysis of the shape of multivariate scatter plots.
This conceptual difference does not change the calculation, and both
viewpoints use the same programs (dudi.pca, prcomp, princomp, ...).
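
As a minimal sketch of this point, the base R lines below obtain the same axes in two ways. The built-in iris measurements are only a stand-in for a real data table, and treating the eigen-decomposition of the sample covariance matrix as viewpoint A is our simplification.

```r
# Built-in iris measurements stand in for a morphometric table (assumption).
X <- as.matrix(iris[, 1:4])

# Viewpoint B: geometric analysis of the centered scatter (prcomp uses an SVD).
p1 <- prcomp(X, center = TRUE, scale. = FALSE)

# Viewpoint A: principal axes of the estimated covariance matrix.
e <- eigen(cov(X))

# Same variances along the axes ...
same_variances <- isTRUE(all.equal(p1$sdev^2, e$values))
# ... and same axes, up to the arbitrary sign of each eigenvector.
same_axes <- isTRUE(all.equal(abs(unname(p1$rotation)), abs(e$vectors)))
```

Both comparisons are made with the default numerical tolerance of all.equal.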
When the individuals belong to groups, the aim may be:
1) to distinguish the groups (discrimination)
2) to ordinate the groups simultaneously.
Under strategy A, the answer is common principal components (CPCs) or partial
CPCs. These methods assume that either all components or only some of them
are common to all groups, the discrepancies being due mainly to sampling
error (Airoldi, J.-P., and B. Flury. 1988. An application of common
principal component analysis to cranial morphometry of Microtus
californicus and M. ochrogaster (Mammalia, Rodentia). Journal of Zoology
216:21-36).
Under strategy B, the answer is multiple group principal component analysis,
or MGPCA (Thorpe, 1983), which can also be viewed as a within-class analysis.

The same question of simultaneous ordination of several groups has been
raised in morphometry, ecology, economics, ... and with ade4 we are in the
MGPCA framework.

Second level: different types of analyses linked to needs
Let X(i,j,k) be the value of variable j for individual i belonging
to group k.
M(-,j,-) is the global mean and M(-,j,k) is the within-group mean.
The same notation is used for the standard deviations: S(-,j,-) and S(-,j,k).
S*(-,j,-) is the standard deviation of variable j after centering by groups.

We can compute
- a classical centered PCA: X(i,j,k)-M(-,j,-)
- a classical normed PCA: (X(i,j,k)-M(-,j,-))/S(-,j,-)
- a classical within PCA: X(i,j,k)-M(-,j,k)
- a normed within PCA: (X(i,j,k)-M(-,j,k))/S(-,j,-)
- a within group normalized PCA: (X(i,j,k)-M(-,j,k))/S(-,j,k)
- a partial normed PCA, normed by the standard deviations of the variables
centered by groups: (X(i,j,k)-M(-,j,k))/S*(-,j,-)

The within group normalized PCA is used when, for a given variable, the
variances differ greatly between groups.
The partial normed PCA is used when the within-group variances differ
greatly between variables.
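
The six transformations listed above can be sketched in base R as follows. This is only an illustration under our own assumptions: iris stands in for the data, the helper names (pop_sd, center_global, center_within) are hypothetical, and the 1/n divisor for the standard deviation is a choice made here, not something the formulas impose.

```r
X <- as.matrix(iris[, 1:4])   # individuals x variables (stand-in data)
g <- iris$Species             # group of each individual

# Standard deviation with a 1/n divisor (an assumption of this sketch).
pop_sd <- function(v) sqrt(mean((v - mean(v))^2))

# Global and within-group centering, column by column.
center_global <- function(M) sweep(M, 2, colMeans(M))
center_within <- function(M, g) apply(M, 2, function(v) v - ave(v, g))

Xc  <- center_global(X)                          # classical centered PCA
Xn  <- sweep(Xc, 2, apply(X, 2, pop_sd), "/")    # classical normed PCA
Xw  <- center_within(X, g)                       # classical within PCA
Xwn <- sweep(Xw, 2, apply(X, 2, pop_sd), "/")    # normed within PCA
# Within group normalized PCA: divide by the sd of each group.
Xwg <- apply(X, 2, function(v) (v - ave(v, g)) / ave(v, g, FUN = pop_sd))
# Partial normed PCA: divide by the sd of the group-centered variable.
Xp  <- sweep(Xw, 2, apply(Xw, 2, pop_sd), "/")

# Any of these tables can then be analysed without further centering, e.g.
pca_within <- prcomp(Xw, center = FALSE)
```

Each table only changes what is subtracted and what is divided; the decomposition that follows is the same in every case.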

Third level: separating the size and the shape, removing the size effect
This is a complicated problem, with several kinds of answers:
- removing the first factor,
- using residuals of regression,
- using a double centering on the logarithm values....
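
As one possible reading of the third answer, here is a minimal base R sketch of a double centering on the logarithms. It assumes strictly positive measurements (iris is a stand-in) and takes the row mean of the logs as the size of an individual, which is our interpretation rather than a unique definition.

```r
X <- as.matrix(iris[, 1:4])   # strictly positive measurements (stand-in data)
L <- log(X)

# Subtract row means (size of each individual) and column means,
# then add back the grand mean so rows and columns both average to zero.
col_means <- matrix(colMeans(L), nrow(L), ncol(L), byrow = TRUE)
D <- L - rowMeans(L) - col_means + mean(L)

# PCA of the doubly centered table; no further centering is needed.
pca_shape <- prcomp(D, center = FALSE)
```

Because every row of D sums to zero, the table has at most ncol(X) - 1 non-trivial axes.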

With ade4, you can first of all compute
1) a normed PCA: > pca1 = dudi.pca(X)
2) a simple within analysis: > wit1 = within(pca1, fac)
where fac is the factor giving the group of each individual.

To remove the size effect using the first factor, you can:
1) compute a normed PCA: > pca1 = dudi.pca(X)
2) remove the first factor everywhere: > Y = apply(pca1$tab, 2, function(x)
residuals(lm(x ~ pca1$l1[,1])))
3) compute a centered PCA on the table Y: > pca2 = dudi.pca(Y, scale = FALSE)
4) and compute a simple within analysis: > wit2 = within(pca2, fac)
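
For readers without ade4 at hand, the four steps can be approximated in base R. This is a sketch, not the ade4 computation itself: prcomp and ave replace dudi.pca and within, iris and its Species factor are stand-in data, and the scaling conventions (1/(n-1) here) differ from ade4 by a constant factor. Recent ade4 releases provide the within-class analysis under the name wca.

```r
X <- as.matrix(iris[, 1:4])   # stand-in data table
g <- iris$Species             # stand-in grouping factor

# 1) normed PCA
Z <- scale(X)
pca1 <- prcomp(Z, center = FALSE)
size <- pca1$x[, 1]           # scores on the first axis, taken as "size"

# 2) remove the first factor from every variable by regression
Y <- apply(Z, 2, function(v) residuals(lm(v ~ size)))

# 3) centered PCA on the residual table
pca2 <- prcomp(Y, center = TRUE, scale. = FALSE)

# 4) within analysis: center the residuals by group, then PCA
Yw <- apply(Y, 2, function(v) v - ave(v, g))
wit2 <- prcomp(Yw, center = FALSE)
```

After step 2 the residual table is orthogonal to the first axis, so pca2 has one null eigenvalue.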

Many other solutions may be applied; the choice depends on your own abilities.
And in the end, this is far from being a simple software problem!

D. Chessel & A. Dufour


