
QUICK AND SEMI-AUTOMATIC TRANSLATION OF D. CHESSEL'S REPLY TO F. SPINAZZI.

I HOPE THIS CAN BE USEFUL TO ENGLISH READERS.

REGARDS.

########################################################################

> 1
> I would like to make some observations on a species-centered PCA on a
> site-species count table.
> Could it be that the first component is very often something like
> a size component?
> I found a correlation between the first component and 1/(Simpson's D)
> and/or Shannon's H of up to 0.7.
>

This is the first and most important property of PCA applied to a
faunistic table. Let xij be the abundance of taxon j in sample unit i,
and mj the mean abundance of taxon j over all sample units. The PCA
centered by taxon analyzes the quantity xij - mj. Some sample units are
rich and others poor, so the first factor expresses a size effect. There
are two possible situations:

*** The sample unit is experimentally standardized (1 m2 of ground, 15
minutes of listening, 100 liters of water, ...). The size effect may then
be the main information in the table. This is the case for limiting
factors (salinity, drought) and for pollution studies. In these
experimental cases one should run a PCA and use axis 1 as the best
indicator of richness.

*** The sample unit cannot be standardized (mix of several methods,
unfavorable or favorable weather conditions, sampling methods that are
hard to apply). There are large sample units and small ones, but this is
nuisance information, and the PCA centered by taxon should not be used in
this case. The nuisance effect must first be removed by a double
centering:
    xij - mj - mi + m          (doubly centered PCA)
    xij - ai*bj                (double multiplicative centering)
    n*xij/(xi. * x.j) - 1      (implicit double centering of CA)
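
A minimal sketch of the first of these centerings, next to the centering
by taxon, may help to see what is removed. It is not ADE-4 code: the
function names (row_mean, col_mean, center_by_taxon, double_center), the
flat row-by-row storage and the 0-based indices are assumptions made for
the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Tables are n sample units (rows) by p taxa (columns), stored row by row
   in a flat array: cell (i, j) is x[i * p + j], with 0-based indices. */

/* Mean of row i, i.e. the mean abundance in sample unit i (m_i). */
double row_mean(size_t p, const double *x, size_t i)
{
    double s = 0.0;
    assert(p > 0);
    for (size_t j = 0; j < p; j++)
        s += x[i * p + j];
    return s / (double)p;
}

/* Mean of column j, i.e. the mean abundance of taxon j (m_j). */
double col_mean(size_t n, size_t p, const double *x, size_t j)
{
    double s = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        s += x[i * p + j];
    return s / (double)n;
}

/* Centering by taxon: out(i, j) = x(i, j) - m_j. */
void center_by_taxon(size_t n, size_t p, const double *x, double *out)
{
    for (size_t j = 0; j < p; j++) {
        double mj = col_mean(n, p, x, j);
        for (size_t i = 0; i < n; i++)
            out[i * p + j] = x[i * p + j] - mj;
    }
}

/* Additive double centering: out(i, j) = x(i, j) - m_j - m_i + m.
   out must not overlap x, because x is re-read while out is filled. */
void double_center(size_t n, size_t p, const double *x, double *out)
{
    double m = 0.0;
    assert(n > 0 && p > 0);
    for (size_t k = 0; k < n * p; k++)
        m += x[k];
    m /= (double)(n * p);            /* grand mean */
    for (size_t i = 0; i < n; i++) {
        double mi = row_mean(p, x, i);
        for (size_t j = 0; j < p; j++)
            out[i * p + j] = x[i * p + j] - col_mean(n, p, x, j) - mi + m;
    }
}
```

For the 3 x 2 table (1, 2 / 3, 4 / 11, 18), centering by taxon leaves row
means of -5, -3 and 8: the poor and rich sample units are still there. The
doubly centered table is (1, -1 / 1, -1 / -2, 2), whose row and column
means are all zero.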

The simplest reasoning is to say that data = obvious + structure + error.
What is obvious? (Example: there are large sample units and small ones,
there are rare species and common species.) One looks for structure by
analyzing the table (data - obvious). The papers by Austin, Noy-Meir and
Orloci from the 1960s and 1970s are very important on these questions.
They are surprisingly little known today. (Orloci, L. (1966) Geometric
models in ecology. I. The theory and applications of some ordination
methods. Journal of Ecology : 54, 193-215.
Austin, M.P. & Orloci, L. (1966) Geometric models in ecology. II. An
evaluation of some ordination techniques. Journal of Ecology : 54, 217-227.
See the PCA documentation: Non-centred PCA.)

>> 2
>> In frequency tables expressed as %, every row sums to 100, for example.
>> In such cases we have compositional data.
>> It does not seem appropriate to use PCA on such a table because of the
>> spurious correlation that could occur between variables.
>> We can use some transformation instead (centered log-ratio and so on).
>> Do you agree?
>

Not exactly. Centering on logs (Aitchison, J. (1983) Principal component
analysis of compositional data. Biometrika : 70, 57-65) addresses a
problem that comes from geology (especially granulometry, with the famous
clay-silt-sand triangle, and the composition of rocks). The quantity of
material studied is not controlled and is therefore converted into
percentages. The artefactual covariance due to Σ pi = 1 is important there
because there are few categories, and it generates curved clouds of
points. On a large faunistic table this effect is of minor importance and
causes no problem. On the other hand, the choice to convert into
percentages is decisive. The question is:

*** is a sample unit a frequency distribution across species?
*** is a species a frequency distribution across sample units?
*** can one hold both points of view simultaneously?

The only method that meets the third criterion is CA (Thioulouse, J. &
Chessel, D. (1992) A method for reciprocal scaling of species tolerance and
sample diversity. Ecology : 73, 670-680). CA is therefore very particular
and should be used only from this point of view, because the price to pay
(arch effect) is high. This is shown by the fact that a PCA on % per sample
unit, a PCA on % per taxon and a CA give very different results. The
reference point (origin) of the space is also very important. In niche
theory, the model involved is model 2 (a species is a frequency
distribution across sample units; it has a mean = optimum and a
variance = amplitude). The reference point is either the profile of all
species pooled, or the profile of a ubiquitous species evenly distributed
in space. These choices generally give very different centered or
non-centered PCAs. Taking weights into account leads to even more
possibilities (see the documentation of Niche: Species Profile PCA,
thematic sheet no. 4.7 "Ecological niches and tables matching", and the
documentation of non-symmetric correspondence analysis, COA: NSCA_ Row_
Profile and COA: NSCA_ Col_ Profile). One must therefore choose a main
goal:

*** typology of sample units (taxa are variables), for instance a
phyto-ecological map or an expert report on water quality (a sample unit
is a frequency distribution across species; species are used to order
sample units) --> PCA on sample units in %
*** typology of species (taxa are the objects of study, sample units are
the experimental means: study of niches, competition, biological
interactions) --> PCA on taxa in %
*** both at once --> COA (rare case!)
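
The first two choices amount to two different conversions of the same
count table into percentages. The sketch below only illustrates that
conversion step (the PCA itself is not shown); row_profiles and
col_profiles are hypothetical names, and the flat row-by-row storage with
0-based indices is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Tables are n sample units (rows) by p taxa (columns), stored row by row
   in a flat array: cell (i, j) is x[i * p + j], with 0-based indices. */

/* Row profiles: each sample unit becomes a frequency distribution across
   taxa (every non-empty row sums to 1). An empty row is left at zero. */
void row_profiles(size_t n, size_t p, const double *x, double *out)
{
    assert(n > 0 && p > 0);
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < p; j++)
            s += x[i * p + j];
        for (size_t j = 0; j < p; j++)
            out[i * p + j] = (s != 0.0) ? x[i * p + j] / s : 0.0;
    }
}

/* Column profiles: each taxon becomes a frequency distribution across
   sample units (every non-empty column sums to 1). An empty column is
   left at zero. */
void col_profiles(size_t n, size_t p, const double *x, double *out)
{
    assert(n > 0 && p > 0);
    for (size_t j = 0; j < p; j++) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i * p + j];
        for (size_t i = 0; i < n; i++)
            out[i * p + j] = (s != 0.0) ? x[i * p + j] / s : 0.0;
    }
}
```

For the 2 x 2 table (1, 1 / 3, 3), the row profiles are (0.5, 0.5) for
both sample units, so the two units look identical. The column profiles
are (0.25, 0.75) for both species, so the difference in richness between
the two units is kept. The two tables carry different information, which
is why the main goal has to be chosen first.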

>> 3
>> Sometimes a categorical scatterplot (Option|Ellipses) 'does
>> not work' after a Reciprocal Scaling.
>> The module gives the message 'no item in category ...'.
>> When does such a situation occur?
>> I tried to understand the matrix algebra behind Reciprocal Scaling, but
>> without much success.
>

In fact the message is issued by the subroutine "compteindiv", which
counts individuals per category. COA: Reciprocal scaling can operate on
empty rows or columns, but the graphics program does not accept them.
This comes from the particular nature of COA: Reciprocal scaling. This
option unfolds all correspondences of the table (non-zero cells) by row
and by column and rewrites the table as triplets (value, row number,
column number). An empty row produces a category with no item and makes
the graphics program crash. The next question shows that the problem has
been well understood.
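
The unfolding described above can be sketched as follows. This is not the
ADE-4 source and does not reproduce "compteindiv": the Cell type and the
functions unfold_nonzero, count_per_row and first_empty_category are
hypothetical, and indices are 0-based here.

```c
#include <assert.h>
#include <stddef.h>

/* One correspondence: a non-zero cell with its row and column number. */
typedef struct {
    double value;
    size_t row;
    size_t col;
} Cell;

/* Rewrite an n x p table (flat, row by row) as a list of non-zero cells.
   cells must have room for n * p entries. Returns the number written. */
size_t unfold_nonzero(size_t n, size_t p, const double *x, Cell *cells)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < p; j++) {
            if (x[i * p + j] != 0.0) {
                cells[k].value = x[i * p + j];
                cells[k].row = i;
                cells[k].col = j;
                k++;
            }
        }
    }
    return k;
}

/* Count how many unfolded cells fall into each row category. */
void count_per_row(size_t n, const Cell *cells, size_t ncells, size_t *counts)
{
    for (size_t i = 0; i < n; i++)
        counts[i] = 0;
    for (size_t k = 0; k < ncells; k++) {
        assert(cells[k].row < n);
        counts[cells[k].row]++;
    }
}

/* Index of the first category with no item, or n if every category has
   at least one item. */
size_t first_empty_category(size_t n, const size_t *counts)
{
    for (size_t i = 0; i < n; i++)
        if (counts[i] == 0)
            return i;
    return n;
}
```

With the 3 x 3 table (2, 0, 1 / 0, 0, 0 / 0, 5, 0), three cells are
unfolded and the row counts are 2, 0 and 1: the second row is a category
with no item, which is the situation reported by the message.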

>> 4
>> To me it seems meaningless to include in a table sites with no species,
>> or vice versa.
>> In fact I noticed that ADE often crashes when trying to perform PCA or
>> COA on such a table.
>> Often, but not always. What happens if some rows or columns
>> sum to zero?
>> How should we consider the row and column scores in such situations?
>> Are they correct, or was the fact that the program finished the job only
>> the realization of an improbable event?
>

If one runs PCA: Correlation matrix PCA or PCA: Covariance matrix PCA on
DouPoi, there is no problem: PCA tolerates rows of 0. If one transposes
the table DouPoi into A (species as rows), then Covariance matrix PCA can
be run (a column of 0 is accepted: its mean and variance are zero).
Correlation matrix PCA, on the contrary, fails (there should be an error
message, sorry) because dividing by the standard deviation is impossible.
COA: COrrespondence Analysis can be run on both tables. The program is
written to tolerate rows and columns of 0, with:

>   for (i = 1; i <= i1; i++) {
>       a1 = poili[i];
>       if (a1 != 0.0) {                /* skip rows of 0 */
>           for (j = 1; j <= j1; j++) {
>               a2 = poico[j];
>               /* skip columns of 0 */
>               if (a2 != 0.0) w[i][j] = w[i][j] / a1 / a2 - 1;
>           }
>       }
>   }

Thus rows and columns of 0 will have zero coordinates.
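
The fragment above starts from arrays that are already filled. A
self-contained sketch of the whole step is given here under explicit
assumptions that the message does not confirm: w is taken to hold the
relative frequencies xij/n, and poili and poico their row and column sums
(row and column weights). The function name coa_transform, the flat
storage and the 0-based indices are choices made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* x: n x p table of counts, flat, row by row (0-based indices).
   w: n x p output table; poili: n row weights; poico: p column weights.
   Computes w(i, j) = f(i, j) / (poili[i] * poico[j]) - 1, where f is the
   table of relative frequencies, and leaves cells of empty rows or empty
   columns at zero. Assumption: this mirrors the fragment quoted above. */
void coa_transform(size_t n, size_t p, const double *x,
                   double *w, double *poili, double *poico)
{
    double total = 0.0;
    for (size_t k = 0; k < n * p; k++)
        total += x[k];
    assert(total > 0.0);             /* an all-zero table has no margins */

    for (size_t i = 0; i < n; i++)
        poili[i] = 0.0;
    for (size_t j = 0; j < p; j++)
        poico[j] = 0.0;

    /* Relative frequencies and their margins. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < p; j++) {
            w[i * p + j] = x[i * p + j] / total;
            poili[i] += w[i * p + j];
            poico[j] += w[i * p + j];
        }
    }

    /* Same guards as in the quoted loop: zero margins are skipped. */
    for (size_t i = 0; i < n; i++) {
        double a1 = poili[i];
        if (a1 != 0.0) {
            for (size_t j = 0; j < p; j++) {
                double a2 = poico[j];
                if (a2 != 0.0)
                    w[i * p + j] = w[i * p + j] / a1 / a2 - 1.0;
            }
        }
    }
}
```

For the 3 x 3 table (3, 1, 0 / 0, 0, 0 / 1, 3, 0), which has an empty row
and an empty column, the result is (0.5, -0.5, 0 / 0, 0, 0 / -0.5, 0.5, 0).
Without the guards, the empty row would give 0/0 and the empty column
would give -1 or 0/0 instead of 0.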

>*********************

Therefore centered PCA and COA can be run on tables with rows or columns
of 0, each with its own logic. If this does not work, please let us know.
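
The case that does fail, Correlation matrix PCA on a table with a column
of 0, can be detected before any division. The sketch below is a
hypothetical check, not the ADE-4 routine: column_variance and
first_zero_variance_column are names invented here, and the variance uses
uniform weights 1/n as an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Variance (with weights 1/n) of column j of an n x p table stored row by
   row in a flat array, with 0-based indices. */
double column_variance(size_t n, size_t p, const double *x, size_t j)
{
    double mean = 0.0, var = 0.0;
    assert(n > 0 && j < p);
    for (size_t i = 0; i < n; i++)
        mean += x[i * p + j];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = x[i * p + j] - mean;
        var += d * d;
    }
    return var / (double)n;
}

/* Index of the first column whose variance is exactly zero, or p if there
   is none. A column of 0 always gives exactly zero; for other constant
   columns rounding may leave a tiny non-zero value, so a tolerance could
   be preferable in practice. */
size_t first_zero_variance_column(size_t n, size_t p, const double *x)
{
    for (size_t j = 0; j < p; j++)
        if (column_variance(n, p, x, j) == 0.0)
            return j;
    return p;
}
```

For the 3 x 2 table (1, 0 / 3, 0 / 5, 0) the second column has variance 0
and is reported (index 1): centering it is harmless, but dividing it by
its standard deviation is not defined.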

>*********************
> COA: Reciprocal scaling also runs with the two tables DouPoi and A. But
> this is not a good thing, because there is an error in the files _ mvco
> or A_ mvli (unassigned value). What is surprising is that ScatterClass:
> Ellipses, which should crash, runs! This is because there was a bug in
> "compteindiv"!! So the question was quite useful and has pointed out a
> defect. Thanks! We are going to fix this. It would be logical to block
> COA: Reciprocal scaling when there are rows or columns of 0, and things
> will be clearer.
>

>> 5
>> Which is the best method to represent units, each of which is a set of
>> fish on which we count parasites?
>> Via MCA (how many fish have no parasites, how many fish...).
This is another question, which remains to be explored further.

Best regards

Daniel Chessel

>----------------------------------------------------------------
>Universite Lyon 1 - Bat 401C - 69622 Villeurbanne CEDEX - France
>Tel : 04 72 44 82 77 Fax : 04 72 43 11 41
>----------------------------------------------------------------
>ADE-4 on the Internet ---> http://biomserv.univ-lyon1.fr/ADE-4.html
>----------------------------------------------------------------


This archive was generated by hypermail 2b30 : Sat Feb 10 2001 - 10:21:37 MET