PBIL

Principal Coordinates Analysis

BBE contribution to PBIL in Lyon, France


Aim of the method

Principal Coordinates Analysis (PCO) is a method for building sequence ordinations from sets of aligned sequences. These ordinations can complement phylogenetic analyses, especially when a large number of sequences is considered. PCO computations on the PBIL server are run through a Web implementation of R called Rweb. This page describes, with an example, the steps required to perform this kind of analysis on a set of sequences selected from the server.

Example

To perform a PCO on a set of aligned sequences, you need to transfer these sequences to our server using one of the three options listed below. Note that the alignment must be in Clustal format. An example set containing sequences from the Hobacgen database is available here.

Either way, the dataset will be read in using read.alignment with format="clustal" and stored in an R object called aln. Temporary files are stored here. Once the alignment has been pasted or selected, you can start the PCO computation. Below is the R code that runs the analysis on the submitted data.

Paste aligned sequences here:

Or enter a dataset URL:

Or select a local file to submit:


You can modify these parameters in order to compute PCO with different options for the graphics.

R code explanations

Analysis computation

First, the alignment is transferred to a temporary file on our server. The R variable in which it is stored is named aln. Do not rename this variable, or the analysis will fail.

    mat <- dist.alignment(aln, matrix = "similarity")

Starting from the alignment in aln, a distance matrix is built. Two options are available for computing the distances: matrix = "identity" or matrix = "similarity". The first option can be used with either nucleotide or protein sequences. It is based on the number of differences between each pair of sequences in the alignment. The second option can be used only with protein sequence alignments. It uses the Fitch (1966) distance matrix between amino acids, which is based on the number of mutations required to change one amino acid into another.
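
To make the "identity" option more concrete, here is a minimal base R sketch on a toy alignment (hypothetical data, not the example set). It computes the proportion of positions at which each pair of sequences differs and then takes the square root, which is the convention we assume from the seqinr documentation for dist.alignment. The names seqs, prop_diff and ident_dist are illustrative only, and this is not the code of dist.alignment itself.

```r
# Toy alignment (hypothetical data): three aligned sequences of equal length.
seqs <- c(s1 = "ACGTACGT", s2 = "ACGTACGA", s3 = "TCGTACGA")

# Split each sequence into a vector of single characters.
chars <- strsplit(seqs, "")

# Proportion of aligned positions at which two sequences differ.
prop_diff <- function(a, b) mean(a != b)

# Fill a symmetric matrix of pairwise proportions of differences.
n <- length(chars)
p <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    p[i, j] <- prop_diff(chars[[i]], chars[[j]])
  }
}

# Assumption: the distance is the square root of (1 - identity),
# i.e. the square root of the proportion of differences.
ident_dist <- as.dist(sqrt(p))
print(ident_dist)
```

Gaps are deliberately absent from the toy data, since how they are counted depends on the options of the real function.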

The second line transforms the distance matrix computed from the alignment into a Euclidean distance matrix, because PCO can only be computed on this kind of data:

    dst <- lingoes(mat)
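
As an illustration of what such a transformation does, the base R sketch below follows the Lingoes idea as we understand it: add the smallest constant to the squared distances that removes the negative eigenvalues of the doubly centered matrix. The functions gower_eigen and lingoes_sketch and the toy matrix are hypothetical names and data introduced here; the ade4 function lingoes may differ in details such as tolerances and return attributes.

```r
# Eigenvalues of the doubly centered matrix -0.5 * J D^2 J.
# All of them are >= 0 exactly when the distances are Euclidean.
gower_eigen <- function(d) {
  d <- as.matrix(d)
  n <- nrow(d)
  J <- diag(n) - matrix(1 / n, n, n)
  G <- -0.5 * (J %*% (d^2) %*% J)
  eigen(G, symmetric = TRUE, only.values = TRUE)$values
}

# Sketch of the Lingoes correction: d'_ij = sqrt(d_ij^2 + 2 * cst) for i != j,
# where cst is the absolute value of the most negative eigenvalue.
lingoes_sketch <- function(d) {
  m <- as.matrix(d)
  cst <- max(0, -min(gower_eigen(d)))
  m2 <- sqrt(m^2 + 2 * cst)
  diag(m2) <- 0
  as.dist(m2)
}

# Toy dissimilarities (hypothetical): an equilateral triangle of side 2 plus a
# fourth point at distance 1.1 from each corner. The triangle inequality holds,
# but no Euclidean configuration has these distances.
toy <- as.dist(matrix(c(0,   2,   2,   1.1,
                        2,   0,   2,   1.1,
                        2,   2,   0,   1.1,
                        1.1, 1.1, 1.1, 0), nrow = 4))

fixed <- lingoes_sketch(toy)
print(min(gower_eigen(toy)))    # negative before the correction
print(min(gower_eigen(fixed)))  # zero, up to rounding, after it
```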

Next, the categories for the organisms considered are defined. Note that this line is specific to the alignment used in this example, so it must be removed or edited when using your own data.

    cat <- as.factor(c(1,1,1,1,2,3,3,3,3,3,3,3,4,1,1,1,1,1,1,1,1,3,3,3,3,5,5,5,3))

The categories defined here correspond to the taxonomic groups of the species from which the sequences were obtained. 1: Proteobacteria, 2: Deinococcus/Thermus, 3: Gram-positive bacteria, 4: Cyanobacteria, 5: Yeast.
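
When adapting this line to your own data, it can help to check that there is exactly one category per sequence and to attach readable labels. The sketch below does this in base R with the codes from the example; the names codes, groups and n_seq are illustrative, and n_seq is a stand-in for the number of sequences in your alignment.

```r
# Category codes copied from the example, one per sequence, in alignment order.
codes <- c(1,1,1,1,2,3,3,3,3,3,3,3,4,1,1,1,1,1,1,1,1,3,3,3,3,5,5,5,3)

# Hypothetical stand-in for the number of sequences in the alignment.
n_seq <- 29

# Stop early if the categories and the sequences do not line up.
stopifnot(length(codes) == n_seq)

# Same grouping as the example, with labels instead of bare numbers.
groups <- factor(codes, levels = 1:5,
                 labels = c("Proteobacteria", "Deinococcus/Thermus",
                            "Gram-positive bacteria", "Cyanobacteria", "Yeast"))

# Number of sequences per taxonomic group.
print(table(groups))
```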

The next line performs the PCO itself. The options used mean that the number of axes is not chosen interactively and that only the first three axes of the analysis are kept (see the dudi.pco documentation page for more information):

    pco <- dudi.pco(dst, scannf = FALSE, nf = 3)
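
For readers who want to see what the computation involves, here is a minimal base R sketch of classical PCO: double-center the squared distances, take the eigen-decomposition, and scale the leading eigenvectors by the square roots of their eigenvalues. The function pco_sketch, the column names and the toy points are assumptions made for illustration; dudi.pco applies row weights and may scale its eigenvalues differently, so this is not a substitute for it.

```r
# Classical principal coordinates analysis on a distance matrix.
# d  : a "dist" object (assumed Euclidean)
# nf : number of axes to keep
pco_sketch <- function(d, nf = 3) {
  m <- as.matrix(d)
  n <- nrow(m)
  # Double centering of -0.5 * squared distances.
  J <- diag(n) - matrix(1 / n, n, n)
  G <- -0.5 * (J %*% (m^2) %*% J)
  e <- eigen(G, symmetric = TRUE)
  keep <- seq_len(nf)
  # Coordinates: eigenvectors scaled by the square root of their eigenvalue.
  scale <- diag(sqrt(pmax(e$values[keep], 0)), nf)
  li <- e$vectors[, keep, drop = FALSE] %*% scale
  dimnames(li) <- list(rownames(m), paste0("A", keep))
  list(li = li, eig = e$values)
}

# Toy data (hypothetical): the four corners of a 3 x 4 rectangle.
pts <- cbind(x = c(0, 3, 0, 3), y = c(0, 0, 4, 4))
d <- dist(pts)

res <- pco_sketch(d, nf = 2)
print(res$li)

# The coordinates reproduce the original distances.
print(max(abs(as.vector(dist(res$li)) - as.vector(d))))
```

Base R also provides cmdscale, which performs the same decomposition; the axes may differ in sign between implementations.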

Graphics plotting

To visualize the results, the following line plots the factor map for the first two axes of the analysis:

    s.label(pco$li, sub = "F1xF2 map")

The last command line is also specific to the example and must be removed or edited when submitting your own data. It draws the convex hulls enclosing the species that belong to the same taxonomic group. The colors used correspond to the five groups previously defined:

    s.chull(pco$li, cat, optchull=1, add.plot=TRUE, col=c("red","black","green","purple","blue"))

Here, we can see that a group of four enterobacteria (Salmonella enterica, Salmonella typhimurium, Escherichia coli and Yersinia pestis) is separated from the other Proteobacteria. This phenomenon could be explained either by a high evolutionary rate for this gene in the clade of Enterobacteria or by a horizontal transfer in the common ancestor of the four species considered.
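
The hulls on such a map are convex hulls computed group by group. As a rough base R illustration of that idea, the sketch below finds the hull vertices of each group with chull and, in an interactive session, draws them with polygon. The function hull_by_group, the toy coordinates and the colors are hypothetical; the ade4 function s.chull adds its own labels and layout on top of this.

```r
# Indices of the points forming the convex hull of each group.
# xy     : two-column matrix of coordinates (e.g. the first two PCO axes)
# groups : factor giving the group of each row of xy
hull_by_group <- function(xy, groups) {
  groups <- as.factor(groups)
  lapply(split(seq_len(nrow(xy)), groups), function(idx) {
    # chull() returns positions within the subset; map them back to row numbers.
    idx[chull(xy[idx, , drop = FALSE])]
  })
}

# Toy coordinates (hypothetical): a square with one interior point, and a triangle.
xy <- cbind(x = c(0, 2, 2, 0, 1, 5, 6, 5.5),
            y = c(0, 0, 2, 2, 1, 5, 5, 6))
grp <- factor(c("a", "a", "a", "a", "a", "b", "b", "b"))

hulls <- hull_by_group(xy, grp)
print(hulls)

# Draw the points and one hull per group (only when a display is available).
if (interactive()) {
  cols <- c(a = "red", b = "blue")
  plot(xy, asp = 1, pch = 19)
  for (g in names(hulls)) {
    polygon(xy[hulls[[g]], , drop = FALSE], border = cols[[g]])
  }
}
```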

