**Next message:**Mark Weber: "I am getting an error message "unhandled exception c0000005" when I attempt to cluster (compute part"**Previous message:**Daniel Chessel: "(no subject)"**In reply to:**Deconchat Marc: "TR: ponderation et co-inertie"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ]

I am responding to Stephanie in French, followed by an English version.

The question is fairly difficult, so I will try to keep the answer simple.

X1 is the table of variables (n rows, p columns) for a standardized PCA [a file of continuous environmental variables, which the PCA will standardize]

PCA = Principal Components Analysis.

PCA: Correlation matrix PCA produces output files, X1.cnta, X1.cnli,

X1.cnco, ...

X2 is a file of MCA [this is a file containing categorical variables,

coded 1-n (without zeros)]

MCA = Multiple Correspondence Analysis

MCA: Multiple Correspondence Analysis produces output files, X2.cmta,

X2.cmli, X2.cmco, ...

X1 and X2 are linked in a joint analysis (Hill & Smith).

MCA: Hill & Smith Analysis produces output files, X.hita, X.hili, X.hico,

...

We may then be tempted to do a CCA from the mixture of variables.

This is possible. But there is no description of this operation, because it is only one of the countless combinations that the structure of ADE-4 makes mathematically possible.

We then have a site-species file (Y)

COA: Correspondence Analysis on Y produces output files, Y.fcta, Y.fcli,

Y.fcco, ...

In order to ensure that everything is coherent:

1) execute PCA: Correlation matrix using the weights given in file Y.fcpl

2) execute MCA: Multiple Correspondence Analysis using the weights given in file Y.fcpl

3) execute MCA: Hill & Smith Analysis, which keeps the joint weights in file X.hipl

4) execute Projectors: Triplet->Orthonormal Basis on X.hita, which gives an orthonormal basis

5) execute Projectors: PCA on Instrumental Variables on this basis and Y.fcta, which gives output files such as Z.ivfa, Z.ivl1, Z.ivco
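To make steps 1) and 2) more concrete, here is a minimal Python sketch of what "using the weights of Y" means numerically. It assumes the usual correspondence analysis convention that row and column weights are the margins of Y divided by the grand total; the toy tables and the helper name `weighted_standardize` are hypothetical and are not ADE-4 code.

```python
import numpy as np

# Hypothetical toy site-by-species abundance table Y (4 sites, 3 species).
Y = np.array([[10, 0, 2],
              [ 3, 5, 0],
              [ 0, 8, 4],
              [ 1, 1, 6]], dtype=float)

# Assumed CA convention: weights are the margins divided by the grand total.
row_w = Y.sum(axis=1) / Y.sum()   # site weights (role played by Y.fcpl)
col_w = Y.sum(axis=0) / Y.sum()   # species weights (role played by Y.fcpc)

# Hypothetical continuous environmental table X1 (4 sites, 2 variables).
X1 = np.array([[1.0, 20.0],
               [2.0, 18.0],
               [4.0, 25.0],
               [3.0, 30.0]])

def weighted_standardize(X, w):
    """Centre and scale each column of X using weights w that sum to 1."""
    mean = w @ X                  # weighted column means
    centred = X - mean
    var = w @ centred**2          # weighted column variances
    return centred / np.sqrt(var)

# Standardized variables with weighted mean 0 and weighted variance 1.
X1_std = weighted_standardize(X1, row_w)
```

With non-uniform site weights, the standardized columns differ from those of an ordinary (uniformly weighted) PCA, which is why the same weights must be used in every analysis.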

Stephanie has reached this point and is wondering how to make a triplot of species, sites, and environmental variables.

It will be sufficient to reason as follows:

******** SITES *******

A) The analysis produces linear combinations of the starting variables with variance 1. They are in file Z.ivl1 and produce the maps of the sites

(properties: means = 0, variances = 1, covariances = 0, labels = site labels, weights = .hipl)

******** SPECIES *******

B) These scores have the property of maximizing the variance of the species positions (Ter Braak, C.J.F. (1986) Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology 67: 1167-1179). The average positions of the species are in file Z.ivco

(properties : means = 0, variances = lambdak = max, covariance=0, labels =

species labels , weights = Y.fcpc)

******** CONTINUOUS VARIABLES (X1) *******

C) To interpret the link (correlation) between the scores and the variables, you must go back to the original type of each variable.

For X1: calculate the correlations between Z.ivl1 and X1.cnli with MatAlg: Diagonal Inner product C=X'DY with:

X = X1.cnli

Option X = 2 or 3 (it is the same thing, variables are centered)

Y = Z.ivl1

Option Y =1, 2 or 3 (it is the same thing, variables are standardized)

D inner product option 2 (weights in a file)

Weight file = Y.fcpl (or X1.cnpl or X2.cmpl or X.hipl)

Output file =auxi1

The file auxi1 contains the correlations between the quantitative

variables and the (axes) scores.
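The computation behind C=X'DY can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights, the table and the scores are hypothetical toy values, and `diagonal_inner_product` is a name chosen here, not an ADE-4 function. When both tables are standardized with the same weights, the result contains weighted correlations.

```python
import numpy as np

def weighted_standardize(X, w):
    """Centre and scale each column of X using weights w that sum to 1."""
    centred = X - w @ X
    return centred / np.sqrt(w @ centred**2)

def diagonal_inner_product(X, Y, w):
    """C = X' D Y, where D is the diagonal matrix of the weights w."""
    return X.T @ (w[:, None] * Y)

# Hypothetical site weights, continuous variables and site scores.
w = np.array([0.3, 0.2, 0.3, 0.2])
X = np.array([[1.0, 20.0], [2.0, 18.0], [4.0, 25.0], [3.0, 30.0]])
scores = np.array([[-1.0, 0.5], [-0.5, -1.0], [1.0, 0.2], [0.3, 1.5]])

Xs = weighted_standardize(X, w)
Ss = weighted_standardize(scores, w)

# One row per variable, one column per axis: weighted correlations.
C = diagonal_inner_product(Xs, Ss, w)
```

Each entry of `C` lies between -1 and 1 and can be plotted as the coordinate of a variable on the corresponding axis.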

******** CATEGORICAL VARIABLES (X2) *******

D) To explain the correlation between the scores and the qualitative

variables, numerically use MCA: Correlation ratio - cmta with:

.cmta type file = X2.cmta

Row scoring = Z.ivl1

Output file name = auxi2

auxi2 contains the correlation ratios between the qualitative variables

and the scores

(equivalent to the squared correlations with the quantitative variables)

[you cannot compute correlation with qualitative variables, only

correlation ratios].
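For readers unfamiliar with the correlation ratio, the following Python sketch uses the standard definition (between-category variance divided by total variance, both weighted). The function name and the toy data are hypothetical, and the exact conventions of the ADE-4 option may differ.

```python
import numpy as np

def correlation_ratio(score, codes, w):
    """Weighted correlation ratio of a score with a categorical variable.

    score: 1-D array of site scores
    codes: 1-D array of category codes (1..n, as in the X2 file)
    w:     1-D array of site weights summing to 1
    """
    mean = w @ score
    total = w @ (score - mean) ** 2          # total weighted variance
    between = 0.0
    for level in np.unique(codes):
        mask = codes == level
        w_k = w[mask].sum()                  # weight of the category
        mean_k = (w[mask] @ score[mask]) / w_k
        between += w_k * (mean_k - mean) ** 2
    return between / total

# Hypothetical example: one score, two codings of the same four sites.
score = np.array([1.0, 1.0, 3.0, 3.0])
w = np.full(4, 0.25)
eta_separating = correlation_ratio(score, np.array([1, 1, 2, 2]), w)
eta_unrelated = correlation_ratio(score, np.array([1, 2, 1, 2]), w)
```

The ratio is 1 when the categories separate the scores perfectly and 0 when all category means coincide, which is why it plays the role of a squared correlation.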

We can also represent the correlations using category indicators

Run CategVar: Categ->Disj on X2.cat, and then MatAlg: Diagonal Inner

product C=X'DY between X201 and Z.ivl1 as above.

We can also represent the means of scores for each category -

ScatterClass: Labels on Z.ivl1 and X2.cat

REMARK: Why is this difficult?!:)

In this procedure, points A) and B) above are theoretically justified; points C) and D) are practical aids that clarify the results by taking them apart. This comes from the nature of CCA itself. The scores exist and are unique, but because the starting variables include qualitative ones, the matrix is not of full rank and the coefficients of the linear combinations are not unique. This is why we go from 72 columns (variables from X1 and dummy variables from X2) to 60 columns (the dimension of the generated subspace). The whole analysis rests on regressions on these 60 variables with only 285 sites: such numerical conditions are very unstable (although common). Coinertia analysis is preferable here and strongly advised.
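The loss of columns can be reproduced on a small scale. The Python sketch below is an illustration only: it uses uniform site weights and hypothetical toy data (2 continuous variables plus two categorical variables with 3 and 2 categories), whereas ADE-4 works with the weights of the joint analysis. Each categorical variable loses one dimension because its centered dummy columns sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12  # hypothetical number of sites

# Two continuous variables and two categorical variables.
cont = rng.normal(size=(n, 2))
cat_a = np.arange(n) % 3          # 3 categories
cat_b = (np.arange(n) // 6) % 2   # 2 categories

def dummies(codes):
    """Complete disjunctive (one-hot) coding of integer category codes."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[None, :]).astype(float)

X = np.hstack([cont, dummies(cat_a), dummies(cat_b)])  # 2 + 3 + 2 = 7 columns
Xc = X - X.mean(axis=0)                                 # centre each column

rank = np.linalg.matrix_rank(Xc)

# Orthonormal basis of the generated subspace: keep the left singular
# vectors whose singular value is not numerically zero.
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
basis = U[:, s > 1e-10 * s[0]]
```

Here 7 columns generate a subspace of dimension 5, in the same way that the 72 columns above generate a subspace of dimension 60. The scores are defined in that subspace, while the coefficients on the 7 original columns are not unique.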

-From a mathematical point of view, Coinertia analysis gives better

results when the number of environmental variables is high. This is

because CCA, when you have many environmental variables, tends to a

plain CA.
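A small numerical sketch of why this happens, assuming uniform weights and random toy data (all names below are hypothetical): the constrained analysis works on the projection of the species table onto the subspace generated by the environmental variables. Once that subspace fills the whole centered space, the projection no longer changes the table and no constraint remains.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8  # hypothetical number of sites

# Centred table standing in for the transformed species table.
T = rng.normal(size=(n, 3))
T -= T.mean(axis=0)

def projected_fraction(k):
    """Share of the inertia of T kept after projection on k random variables."""
    X = rng.normal(size=(n, k))
    X -= X.mean(axis=0)
    Q, _ = np.linalg.qr(X)        # orthonormal basis (assumes full column rank)
    fitted = Q @ (Q.T @ T)        # projection of T on the span of X
    return (fitted**2).sum() / (T**2).sum()

few = projected_fraction(2)       # strong constraint: part of the inertia is lost
many = projected_fraction(n - 1)  # as many variables as possible: nothing is lost
```

With n - 1 independent variables the fraction equals 1 up to rounding, so the constrained analysis gives the same result as the unconstrained one.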

The main problem with categorical variables is that each category acts

as a single variable, and this leads to a very high number of total

variables. You can reduce the number of categorical variables, or use

Coinertia analysis, which is quite simple (much simpler than CCA, in fact). Try to read the Coinertia analysis documentation (it is in English).

Other - Re CCA on continuous and categorical environmental variables.

You have to :

1- use the COA module to perform a Correspondence Analysis of the

species table

2- separate the continuous and categorical variables in two files

3- analyse the continuous variables with the PCA module. Do not forget

to use the row weights computed in step 1

4- use the CategVar "Read categ file" option to read the categorical

variables

5- use the MCA "Multiple correspondence analysis" option on the

".cat" file obtained at step 4. Do not forget to use the row weights

computed in step 1

6- use the "Hill & Smith analysis" option of the MCA module to

link the categorical variables analysis (obtained at step 5) and

the continuous variables analysis (obtained at step 3)

7- use the CCA "Initialize explanatory variables" option on the

analysis obtained at step 6

8- use the CCA "CCA" option to perform the CCA.

Jean


*This archive was generated by hypermail 2b30: Sat Feb 10 2001 - 10:36:00 MET*