The present directory contains the datasets used for the co-inertia analysis described by Thioulouse and Lobry (1995).

1. LIST OF FILES WITH SHORT DESCRIPTION
---------------------------------------

File Name     Number of rows   Number of columns   Content

AANames             20                 1           Amino-acid 3-letter codes
ProtNames          999                 1           Protein names
IndiceNames        402                 1           Index names
IndiceVals         402                20           Index values
ProtComp           999                20           Protein composition
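
As a minimal sketch, the dimensions listed above can be checked after
download. The Python below assumes that each file is plain text with one
row per line and fields separated by spaces or tabs; this layout is an
assumption, not something stated in this README. The helper name
table_shape and the dictionary EXPECTED are hypothetical.

```python
def table_shape(text):
    """Return (rows, columns) of a whitespace-delimited text table.

    Assumption: one row per line, fields separated by spaces or tabs.
    Blank lines are ignored; every row must have the same width.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged table, row widths: {sorted(widths)}")
    return len(rows), (widths.pop() if widths else 0)


# Expected (rows, columns), copied from the list of files above.
EXPECTED = {
    "AANames": (20, 1),
    "ProtNames": (999, 1),
    "IndiceNames": (402, 1),
    "IndiceVals": (402, 20),
    "ProtComp": (999, 20),
}
```

For a downloaded copy, comparing table_shape(open(name).read()) with
EXPECTED[name] for each file name would flag a truncated or misaligned
file before any analysis is run.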


2. THE FILE AANames
-------------------

This file contains the names of the 20 amino-acids using the 3-letter
code, for instance gly for glycine. The amino-acids are sorted in
alphabetical order.

3. THE FILE ProtNames
---------------------

This file contains the names of the 999 proteins. Each name is in
two parts, for instance XYLEECOM.MALM. The prefix XYLEECOM refers
to the DNA contig from which the coding sequence was obtained in
the ECOSEQ6 collection (Rudd, 1993). The suffix MALM is the standard
name for the protein (Bachmann, 1990).

4. THE FILE IndiceNames
-----------------------

This file contains the names of the 402 physico-chemical properties
of amino-acids using the accession number of the AA-index collection
(Nakai et al., 1988) which is available and described at the URL:
ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex.

5. THE FILE IndiceVals
----------------------

This file contains in rows the physico-chemical properties, in the
same order as in the file IndiceNames, and in columns the amino-acids,
in the same order as in the file AANames. If i stands for
the row number and j for the column number, then the element (i,j)
of this table is the value of the index i for the amino-acid j.
These values are from the AA-index collection (Nakai et al., 1988)
described at the URL: ftp://ftp.genome.ad.jp/pub/db/genomenet/aaindex.
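
The (i,j) convention above can be sketched in Python as a lookup by
name. This is only an illustration: it assumes whitespace- or
tab-separated text files in which every value is numeric, and the
function names, index names and numbers below are made up.

```python
def parse_names(text):
    """One name per non-blank line, surrounding whitespace removed."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_values(text):
    """Rows of floats from a whitespace-delimited table (assumed layout)."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def index_value(values, index_names, aa_names, index_name, aa_name):
    """Value of the named index for the named amino-acid.

    Python lists are 0-based, so README element (i,j) is values[i-1][j-1].
    """
    i = index_names.index(index_name)  # row position, from IndiceNames
    j = aa_names.index(aa_name)        # column position, from AANames
    return values[i][j]


# Toy data with made-up names and numbers, for illustration only.
aa_names = parse_names("ala\narg\nasn\n")
index_names = parse_names("IDX_A\nIDX_B\n")
values = parse_values("1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n")
print(index_value(values, index_names, aa_names, "IDX_B", "arg"))  # prints 5.0
```

With the real files, the three parse calls would receive the contents
of AANames, IndiceNames and IndiceVals respectively.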

6. THE FILE ProtComp
--------------------

This file contains in rows the proteins, in the same order as in
the file ProtNames, and in columns the amino-acids, in the same order
as in the file AANames. If i is the row number and j the column
number, then the element (i,j) in this table is the number of
amino-acids j in the protein i. These values were obtained from
a subset of the ECOSEQ6 collection (Rudd, 1993). The criteria used
for the construction of this subset are described in Lobry and
Gautier (1994).
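
Because element (i,j) is a count, the sum of row i gives the number of
residues counted for protein i, and dividing each count by that sum
gives the relative amino-acid composition. The Python sketch below
illustrates this arithmetic only; it is not presented as the processing
used in the cited papers, and the function names are hypothetical.

```python
def protein_length(counts_row):
    """Number of residues counted for one protein (sum of its row)."""
    return sum(counts_row)


def relative_frequencies(counts_row):
    """Relative amino-acid composition of one protein.

    Each count is divided by the row total, so the result sums to 1.
    """
    total = protein_length(counts_row)
    if total == 0:
        raise ValueError("row has no residues")
    return [count / total for count in counts_row]


# Toy row with three columns instead of 20, for illustration only.
row = [1, 1, 2]
print(protein_length(row))        # prints 4
print(relative_frequencies(row))  # prints [0.25, 0.25, 0.5]
```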

7. REFERENCES
-------------

Please cite Rudd (1993) if you use the protein composition dataset
and Nakai et al. (1988) if you use the amino-acid indices dataset.


Bachmann, B.J. (1990) Linkage map of Escherichia coli K-12, Edition 8.
Microbiological Reviews, 54 : 130-197.

Lobry, J.R. and Gautier, C. (1994) Hydrophobicity, expressivity and
aromaticity are the major trends of amino-acid usage in 999 Escherichia
coli chromosome-encoded genes. Nucleic Acids Research, 22 : 3174-3180.

Nakai, K., Kidera, A., Kanehisa, M. (1988) Cluster analysis of
amino acid indices for prediction of protein structure and
function. Protein Engineering, 2 : 93-100.

Rudd, K.E. (1993) Maps, genes, sequences, and computers: an Escherichia
coli case study. ASM news, 7 : 335-341.

Thioulouse, J. and Lobry, J.R. (1995) Co-inertia analysis of amino-acid
physico-chemical properties and protein composition with ADE package.
Computer Applications in Biosciences, 11 : 321-329.

7b. FORWARD BIBLIOGRAPHY
------------------------

The paper by Thioulouse & Lobry (1995) just above was cited by
the following papers:

Picard N, Gueguen K, Abdoulaye HA, Diarisso D, Karembe M, Birnbaum P, Nasi R
(2005) Tree formations in relation with soil and grasses in a dry savanna in
Mali, West Africa. African Journal of Ecology, 43:201-207.

Culhane, A.C., Perriere G., Higgins, D.G. (2003) Cross-platform comparison
and visualisation of gene expression data using co-inertia analysis. BMC
Bioinformatics, 4:59.

Perriere, G., Thioulouse, J. (2003) Use of correspondence discriminant
analysis to predict the subcellular location of bacterial proteins.
Comput. Meth. Prog. Bio., 70:99-105.

Lekve, K., Stenseth, N.C., Gjosaeter, J., et al. (2002) Species richness and
environmental conditions of fish along the Norwegian Skagerrak coast.
ICES J. Mar. Sci, 59:757-769.

Da Costa, K.S., Gourene, G., De Morais, L.T., et al. (2000) Fish populations
in two West-African coastal rivers facing different agricultural and
hydroelectric schemes. Vie Milieu, 50:65-77.

Reynaud, P.A., Thioulouse, J. (2000) Identification of birds as biological
markers along a neotropical urban-rural gradient (Cayenne, French Guiana),
using co-inertia analysis. J. Environ. Manage., 59:121-140.

Doledec, S., Statzner, B., Bournaud, M. (1999) Species traits for future
biomonitoring across ecoregions: patterns along a human-impacted
river. Freshwater Biol., 42:737-758.

Truu, J., Talpsep, E., Heinaru, E., et al. (1999) Comparison of API 20NE
and Biolog GN identification systems assessed by techniques of multivariate
analyses. J. Microbiol. Meth., 36:193-201.

Townsend, C.R., Scarsbrook, M.R., Doledec, S. (1997) Quantifying disturbance
in streams: alternative measures of disturbance in relation to
macroinvertebrate species traits and species richness. J. N. Am. Benthol.
Soc., 16:531-544.

Statzner, B., Hoppenhaus, K., Arens, M.F., et al. (1997) Reproductive traits,
habitat use and templet theory: a synthesis of world-wide data on aquatic
insects. Freshwater Biol., 38:109-135.

Damborsky, J. (1997) Quantitative structure-function relationships of the
single-point mutants of haloalkane dehalogenase: A multivariate approach.
Quant. Struct.-Act. Rel., 16:126-135.

Thioulouse, J., Chessel, D., Doledec, S., et al. (1997) ADE-4: A multivariate
analysis and graphical display software. Stat. Comput., 7:75-83.

Bournaud, M., Cellot, B., Richoux, P., et al. (1996) Macroinvertebrate community
structure and environmental characteristics along a large river: Congruity of
patterns for identification to species or family. J. N. Am. Benthol. Soc.,
15:232-253.

Thioulouse, J. (1996) Towards better graphics for multivariate analysis: The
interactive factor map. Computation Stat., 11:11-21.

Perriere, G., Thioulouse, J. (1996) On-line tools for sequence retrieval and
multivariate statistics in molecular biology. Comput. Appl. Biosci., 12:63-69.

8. ADDRESSES
------------

For ECOSEQ6 collection  : rudd@jorma.nlm.nih.gov
For AA-index collection : tomii@kuicr.kyoto-u.ac.jp
For Co-inertia analysis : Jean.Thioulouse@biomserv.univ-lyon1.fr
                          lobry@biomserv.univ-lyon1.fr

9. UPDATES
----------

15-JUL-2006 : the last (extra) tabulations in file IndiceVals were deleted.

14-JUL-2006 : Switch to the html site. Some minor typos in this README
              file fixed. Forward bibliography (section 7b) updated.

19-JUN-2004 : Thanks to Daniel Chessel, a bug in the file ProtComp
              was found: the columns were not in the lexical order
              of the amino-acids, but in the same order as in the
              file ftp://pbil.univ-lyon1.fr/pub/datasets/NAR94/data.txt
              This is now fixed, and the file ProtComp in the present
              directory is consistent with the file AANames.

              Forward bibliography (section 7b in the README file) added.

25-SEP-1996 : Update bibliographical references

14-DEC-1994 : files put at the URL:
              ftp://biom3.univ-lyon1.fr/pub/datasets/CABIOS95





If you have any problems or comments, please contact Jean Lobry.