CCA - D. Chessel's reply in English (& other Q&As)

From: Stephanie Melles (stephajm@Interchange.ubc.ca)
Date: Wed Jun 23 1999 - 00:37:18 MET DST


I am responding to Stephanie in French; an English version follows.
The question is difficult enough, so I will try to keep it simple.

X1 is an n x p table of variables for a standardized PCA
[this is a file which contains continuous environmental variables, which
will be standardized by the PCA].
PCA = Principal Components Analysis.
PCA: Correlation matrix PCA produces the output files X1.cnta, X1.cnli,
X1.cnco, ...

X2 is a table for MCA [this is a file containing categorical variables,
coded 1 to n (without zeros)].
MCA = Multiple Correspondence Analysis.
MCA: Multiple Correspondence Analysis produces the output files X2.cmta,
X2.cmli, X2.cmco, ...

X1 and X2 are linked in a joint analysis (Hill & Smith).

MCA: Hill & Smith Analysis produces output files, X.hita, X.hili, X.hico,
...

We may then be tempted to do a CCA with this mixture of variables.

This is possible, but the operation is not described anywhere: it is
only one of the countless combinations that are mathematically possible
given the structure of ADE-4.

We then have a site-species file (Y)

COA: Correspondence Analysis on Y produces output files, Y.fcta, Y.fcli,
Y.fcco, ...

To ensure that everything is coherent:
1) run PCA: Correlation matrix using the row weights given in the file
Y.fcpl
2) run MCA: Multiple Correspondence Analysis using the row weights
given in the file Y.fcpl
3) run MCA: Hill & Smith Analysis, which keeps the joint weights in
the file X.hipl
4) run Projectors: Triplet->Orthonormal Basis on X.hita, which gives an
orthonormal basis
5) run Projectors: PCA on Instrumental Variables on this basis and
Y.fcta, which gives result files of the type Z.ivfa, Z.ivl1, Z.ivco
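
As a rough illustration of what steps 1) to 5) compute, here is a minimal
NumPy sketch. It is not ADE-4's own code. It assumes Y is a sites x species
table with positive row and column totals and X is an already numeric table
of explanatory variables (with dummy columns for the categories). The name
cca_sketch and all variable names are hypothetical, and the mapping to the
ADE-4 files in the comments is only approximate.

```python
import numpy as np

def cca_sketch(Y, X, tol=1e-10):
    """CCA seen as a PCA of the CA table projected on the explanatory variables."""
    P = Y / Y.sum()
    r = P.sum(axis=1)                      # site weights (roughly the role of Y.fcpl)
    c = P.sum(axis=0)                      # species weights (roughly the role of Y.fcpc)
    T = P / np.outer(r, c) - 1.0           # doubly centred CA table (roughly Y.fcta)

    # Centre X with the site weights, then take an orthonormal basis of its
    # column space; dropping null singular values handles rank deficiency.
    Xc = X - r @ X
    U, s, _ = np.linalg.svd(np.sqrt(r)[:, None] * Xc, full_matrices=False)
    B = U[:, s > tol * s.max()]

    # Project the weighted CA table on that basis and diagonalize the result.
    A = np.sqrt(r)[:, None] * T * np.sqrt(c)[None, :]
    A_hat = B @ (B.T @ A)
    U2, s2, Vt2 = np.linalg.svd(A_hat, full_matrices=False)
    keep = s2 > tol * s2.max()

    site_scores = U2[:, keep] / np.sqrt(r)[:, None]                  # variance 1
    species_scores = (Vt2[keep].T * s2[keep]) / np.sqrt(c)[:, None]  # variance lambda_k
    return s2[keep] ** 2, site_scores, species_scores, r, c

# Made-up data, only to exercise the function.
rng = np.random.default_rng(0)
Y = rng.integers(1, 10, size=(20, 6)).astype(float)
X = rng.normal(size=(20, 3))
eig, l1, co, r, c = cca_sketch(Y, X)
print(eig)
```

With the site weights r, the site scores have mean 0 and variance 1; with
the species weights c, the species scores have mean 0 and variance equal to
the eigenvalue, which are the properties listed under A) and B) below.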

Stephanie has reached this point and is wondering how to make a triplot of
species, sites, and environmental variables.

It is sufficient to reason as follows:

******** SITES *******
A) The analysis produces linear combinations of the starting variables with
variance 1.
They are in the file Z.ivl1 and give the site maps
(properties: means = 0, variances = 1, covariances = 0, labels = site
labels, weights = X.hipl)

******** SPECIES ********
B) These scores have the property of maximizing the variance of the
species positions (Ter Braak, C.J.F. (1986) Canonical correspondence
analysis: a new eigenvector technique for multivariate direct gradient
analysis. Ecology 67, 1167-1179). The average positions of the species are
in the file Z.ivco
(properties: means = 0, variances = lambda_k = max, covariances = 0,
labels = species labels, weights = Y.fcpc)

******** CONTINUOUS VARIABLES (X1) ********
C) To interpret the link between the scores and the variables, you
must go back to each original type of variable.
For X1: compute the correlations between Z.ivl1 and X1.cnli with
MatAlg: Diagonal Inner product C=X'DY with:
X = X1.cnli
Option X = 2 or 3 (it is the same thing, the variables are centered)
Y = Z.ivl1
Option Y = 1, 2 or 3 (it is the same thing, the variables are standardized)
D inner product option 2 (weights in a file)
Weight file = Y.fcpl (or X1.cnpl or X2.cmpl or X.hipl)
Output file = auxi1
The file auxi1 contains the correlations between the quantitative
variables and the scores (axes).
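
The product C = X'DY used here can be written in a few lines of NumPy. This
is only a sketch with made-up data: the function names and the arrays X1,
scores and w are hypothetical stand-ins for X1.cnli, Z.ivl1 and the weight
file. When both tables are centered and scaled with the same weights, each
entry of C is a weighted correlation.

```python
import numpy as np

def weighted_standardize(M, w):
    """Centre and scale each column of M using the weights w (summing to 1)."""
    centered = M - w @ M
    return centered / np.sqrt(w @ centered**2)

def diagonal_inner_product(X, Y, w):
    """C = X' D Y, where D is the diagonal matrix of the weights w."""
    return X.T @ (w[:, None] * Y)

rng = np.random.default_rng(1)
w = rng.random(30)
w /= w.sum()                               # site weights
X1 = rng.normal(size=(30, 4))              # 4 quantitative variables
scores = rng.normal(size=(30, 2))          # 2 axes of site scores

corr = diagonal_inner_product(
    weighted_standardize(X1, w), weighted_standardize(scores, w), w
)
print(corr.shape)                          # one row per variable, one column per axis
```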

******** CATEGORICAL VARIABLES (X2) ********

D) To interpret the link between the scores and the qualitative
variables numerically, use MCA: Correlation ratio - cmta with:
.cmta type file = X2.cmta
Row scoring = Z.ivl1
Output file name = auxi2
The file auxi2 contains the correlation ratios between the qualitative
variables and the scores
(equivalent to the squared correlations with the quantitative variables)
[you cannot compute correlations with qualitative variables, only
correlation ratios].
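
For readers unfamiliar with the correlation ratio, here is a minimal sketch
of the quantity itself: the weighted between-category variance of a score
divided by its weighted total variance. The function name and the toy data
are hypothetical; this is not the ADE-4 implementation.

```python
import numpy as np

def correlation_ratio(score, categories, w):
    """Between-category variance of score over its total variance (weights w)."""
    w = w / w.sum()
    mean = w @ score
    total = w @ (score - mean) ** 2
    between = 0.0
    for level in np.unique(categories):
        mask = categories == level
        weight_k = w[mask].sum()                      # weight of the category
        mean_k = (w[mask] @ score[mask]) / weight_k   # mean score in the category
        between += weight_k * (mean_k - mean) ** 2
    return between / total

w = np.full(4, 0.25)
categories = np.array([1, 1, 2, 2])
# Score fully determined by the category: ratio of 1.
print(correlation_ratio(np.array([0.0, 0.0, 3.0, 3.0]), categories, w))
# Same mean in both categories: ratio of 0.
print(correlation_ratio(np.array([1.0, 2.0, 1.0, 2.0]), categories, w))
```

The ratio lies between 0 and 1, which is why it is read like a squared
correlation.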

We can also represent the correlations using category indicators:
run CategVar: Categ->Disj on X2.cat, and then MatAlg: Diagonal Inner
product C=X'DY between X201 and Z.ivl1 as above.

We can also represent the mean score of each category with
ScatterClass: Labels on Z.ivl1 and X2.cat
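
The two alternatives above rely on a disjunctive (indicator) coding of the
categories and on per-category means of the scores. A small sketch of both,
assuming a single categorical variable and site weights that sum to 1; the
function names are hypothetical and the data are made up.

```python
import numpy as np

def disjunctive(categories):
    """One 0/1 indicator column per category level."""
    levels, codes = np.unique(categories, return_inverse=True)
    return levels, np.eye(len(levels))[codes]

def category_means(score, indicator, w):
    """Weighted mean of score within each category."""
    weight_per_category = w @ indicator
    return (indicator.T @ (w * score)) / weight_per_category

categories = np.array([1, 2, 2, 3])
score = np.array([0.0, 2.0, 4.0, 6.0])
w = np.full(4, 0.25)

levels, indicator = disjunctive(categories)
means = category_means(score, indicator, w)
print(levels, means)
```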

REMARK: Why is this difficult? :)

In this procedure, points A) and B) above are justified; points C) and
D) are practical aids that clarify things by taking them apart. This comes
from the nature of CCA itself. The scores exist and are unique, but because
the starting variables include qualitative ones, the matrix is not of full
rank and the coefficients of the linear combinations are not unique. This
is why we go from 72 columns (variables from X1 and dummy variables from
X2) to 60 columns (the dimension of the generated subspace). The whole
analysis rests on regressions on these 60 variables over the 285
sites: these numerical conditions are extremely unstable (although
common). Coinertia analysis is preferable here and strongly
advised.
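
The loss of rank caused by dummy variables is easy to reproduce on a toy
table. The sketch below uses made-up data (2 continuous variables, one
factor with 3 levels and one with 2 levels), not the 72-column table
discussed above. After centering, the indicator columns of each factor sum
to zero, so each factor loses one dimension.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
continuous = rng.normal(size=(n, 2))       # 2 continuous variables
cat_a = np.arange(n) % 3                   # factor with 3 levels
cat_b = (np.arange(n) // 3) % 2            # factor with 2 levels
dummies_a = np.eye(3)[cat_a]
dummies_b = np.eye(2)[cat_b]

table = np.hstack([continuous, dummies_a, dummies_b])
w = np.full(n, 1.0 / n)                    # uniform site weights
centered = table - w @ table

n_columns = centered.shape[1]
rank = np.linalg.matrix_rank(centered)
print(n_columns, rank)                     # 7 columns, but rank 5
```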

- From a mathematical point of view, coinertia analysis gives better
results when the number of environmental variables is high. This is
because CCA, when you have many environmental variables, tends towards a
plain CA.
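
This behaviour can be checked numerically. The sketch below uses random,
made-up data and a hypothetical function name. It computes the share of the
total CA inertia that is kept after projecting on the explanatory variables.
With n sites, that share reaches 1 once there are n - 1 linearly independent
variables, at which point the constraint no longer constrains anything.

```python
import numpy as np

def constrained_fraction(Y, X, tol=1e-10):
    """Share of the CA inertia of Y kept after projection on the columns of X."""
    P = Y / Y.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    A = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # weighted CA table
    Xc = X - r @ X                                       # weighted centering
    U, s, _ = np.linalg.svd(np.sqrt(r)[:, None] * Xc, full_matrices=False)
    B = U[:, s > tol * s.max()]                          # orthonormal basis
    return np.sum((B.T @ A) ** 2) / np.sum(A ** 2)

rng = np.random.default_rng(2)
n_sites = 15
Y = rng.integers(1, 10, size=(n_sites, 8)).astype(float)
X_all = rng.normal(size=(n_sites, n_sites - 1))

# Nested sets of 2, 5, 10 and 14 explanatory variables.
fractions = [constrained_fraction(Y, X_all[:, :p]) for p in (2, 5, 10, 14)]
print(fractions)
```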

The main problem with categorical variables is that each category acts
as a single variable, and this leads to a very high total number of
variables. You can reduce the number of categorical variables, or use
coinertia analysis, which is quite simple (much simpler than CCA,
in fact). Try to read the coinertia analysis documentation (it is in
English).

Other - Re: CCA on continuous and categorical environmental variables.
You have to:

1- use the COA module to perform a Correspondence Analysis of the
species table

2- separate the continuous and categorical variables in two files

3- analyse the continuous variables with the PCA module. Do not forget
to use the row weights computed in step 1

4- use the CategVar "Read categ file" option to read the categorical
variables

5- use the MCA "Multiple correspondence analysis" option on the
".cat" file obtained at step 4. Do not forget to use the row weights
computed in step 1

6- use the "Hill & Smith analysis" option of the MCA module to
link the categorical variables analysis (obtained at step 5) and
the continuous variables analysis (obtained at step 3)

7- use the CCA "Initialize explanatory variables" option on the
analysis obtained at step 6

8- use the CCA "CCA" option to perform the CCA.

Jean


