Principal Coordinates Analysis
Either way, the dataset will be read in using
format="clustal" and stored in an alignment object (a list, not a data frame) called
aln. Temporary files are stored
Once the alignment has been pasted or selected, you can start the PCO
computation. Below is the R code that computes the analysis on the
You can modify these parameters to compute the PCO with different graphical options.
The alignment is always stored in aln. Therefore, do not change this name, or you will not be able to compute your analysis.
mat <- dist.alignment(aln, matrix = "similarity")
Starting from the alignment in aln, a distance matrix is built. Two options are available for computing the distances: matrix = "identity" or matrix = "similarity". The first option can be used with either nucleotide or protein sequences. It simply counts the number of differences between each pair of sequences in the alignment to compute the distances. The second option can be used only with protein sequence alignments. It uses the Fitch (1966) distance matrix between amino acids. This matrix is based on the number of mutations required to change one amino acid into another.
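To make the identity option concrete, here is a minimal base-R sketch of an identity-based distance between aligned sequences. It follows the description in the seqinr documentation, where the distance is the square root of (1 - proportion of identical positions). The helper names identity_dist_sketch and identity_matrix_sketch are hypothetical, and the sketch is a simplification: gap characters are compared like ordinary residues, so its values may differ from those of dist.alignment().

```r
# Hypothetical helper, not part of seqinr: identity-based distance between
# two aligned sequences, sqrt(1 - proportion of identical positions).
# Simplification: gaps ("-") are compared like ordinary residues.
identity_dist_sketch <- function(s1, s2) {
  a <- strsplit(toupper(s1), "")[[1]]
  b <- strsplit(toupper(s2), "")[[1]]
  if (length(a) != length(b)) stop("sequences must be aligned (same length)")
  identity <- mean(a == b)   # fraction of positions with the same character
  sqrt(1 - identity)
}

# Pairwise distance matrix for a named vector of aligned sequences
identity_matrix_sketch <- function(seqs) {
  n <- length(seqs)
  m <- matrix(0, n, n, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- identity_dist_sketch(seqs[[i]], seqs[[j]])
    }
  }
  as.dist(m)   # keep only the lower triangle, as a "dist" object
}

seqs <- c(s1 = "MKVLA", s2 = "MKVIA", s3 = "MRVIG")
d_id <- identity_matrix_sketch(seqs)
print(d_id)
```

Here s1 and s2 share 4 of their 5 positions, so their distance is sqrt(1 - 0.8) = sqrt(0.2), about 0.447.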
The second line transforms the distance matrix computed from the alignment into a Euclidean one (Lingoes correction), because PCO can only be computed on this kind of data:
dst <- lingoes(mat)
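The idea behind the Lingoes correction can be sketched in a few lines of base R. This is a hypothetical re-implementation for illustration, not ade4's code, and it does not reproduce every option of lingoes(). It assumes the usual formulation: a distance matrix is Euclidean when Gower's centred matrix has no negative eigenvalue; otherwise every off-diagonal distance is replaced by sqrt(d^2 + 2c), where c is the absolute value of the most negative eigenvalue.

```r
# Hypothetical sketch of the Lingoes correction; ade4::lingoes() is the real tool.
# Gower's centred matrix: G = J (-D^2 / 2) J, with J the centring matrix.
gower_centred <- function(D) {
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)   # centring matrix
  J %*% (-0.5 * D^2) %*% J
}

lingoes_sketch <- function(d, tol = 1e-10) {
  D <- as.matrix(d)
  lambda_min <- min(eigen(gower_centred(D), symmetric = TRUE,
                          only.values = TRUE)$values)
  if (lambda_min > -tol) return(as.dist(D))  # already Euclidean: no change
  c_add <- -lambda_min                       # |most negative eigenvalue|
  Dstar <- sqrt(D^2 + 2 * c_add)             # d*_ij = sqrt(d_ij^2 + 2c)
  diag(Dstar) <- 0                           # self-distances stay zero
  as.dist(Dstar)
}

# Metric but non-Euclidean toy example: points 1-3 form an equilateral
# triangle of side 2, and point 4 is at distance 1 from all three
D <- matrix(c(0, 2, 2, 1,
              2, 0, 2, 1,
              2, 2, 0, 1,
              1, 1, 1, 0), nrow = 4)
d_corr <- lingoes_sketch(as.dist(D))
```

Adding the same constant 2c to every squared distance shifts the non-trivial eigenvalues of the centred matrix up by c, which is why the most negative one ends up at zero.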
Next, the categories for the organisms considered are defined. Note that this line is specific to the alignment used in this example; therefore it must be removed or edited when using your own data.
cat <- as.factor(c(1,1,1,1,2,3,3,3,3,3,3,3,4,1,1,1,1,1,1,1,1,3,3,3,3,5,5,5,3))
The categories defined here correspond to the taxonomic groups of the species from which the sequences were obtained. 1: Proteobacteria, 2: Deinococcus/Thermus, 3: Gram-positive bacteria, 4: Cyanobacteria, 5: Yeast.
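If you adapt this line to your own data, one option (a suggestion, not part of the original code) is to attach the group names to the numeric codes, so that tables and legends are self-explanatory. The name groups is hypothetical. Keeping levels = 1:5 preserves the level order, so the colors later given to s.chull still map to the same groups. With a real alignment, the length of the factor should equal the number of sequences, which seqinr stores in aln$nb.

```r
# Numeric codes copied from the example above (one per sequence)
codes <- c(1,1,1,1,2,3,3,3,3,3,3,3,4,1,1,1,1,1,1,1,1,3,3,3,3,5,5,5,3)

# Same five levels, in the same order, but with readable labels
groups <- factor(codes, levels = 1:5,
                 labels = c("Proteobacteria", "Deinococcus/Thermus",
                            "Gram-positive bacteria", "Cyanobacteria", "Yeast"))

length(groups)   # should match the number of sequences in the alignment
table(groups)    # number of sequences per taxonomic group
```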
The next line performs the PCO itself. With scannf = FALSE, the eigenvalue bar plot is not
displayed and you are not asked interactively how many axes to keep; nf = 3 keeps only
the first three axes of the analysis (see the
documentation page for more information):
pco <- dudi.pco(dst, scannf = FALSE, nf = 3)
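To show what this step computes, here is a minimal base-R sketch of classical principal coordinates analysis: square and double-centre the distances, take the eigendecomposition, and scale the eigenvectors by the square root of their eigenvalues. The function name pcoa_sketch is hypothetical, and it runs on random toy points rather than on dst. We expect its coordinates to be comparable to pco$li up to the sign of each axis (assuming dudi.pco's default uniform row weights), although ade4 may report the eigenvalues on a different scale. Base R's cmdscale() performs the same classical scaling and serves as a check.

```r
# Hypothetical sketch of classical PCoA, for illustration only
pcoa_sketch <- function(d, nf = 3) {
  D <- as.matrix(d)
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)      # centring matrix
  G <- -0.5 * J %*% D^2 %*% J             # Gower's centred matrix
  e <- eigen(G, symmetric = TRUE)         # eigenvalues in decreasing order
  keep <- seq_len(nf)
  # Coordinates = eigenvectors scaled by sqrt(eigenvalue); negative
  # eigenvalues (non-Euclidean input) are clipped to zero here
  li <- e$vectors[, keep, drop = FALSE] %*%
    diag(sqrt(pmax(e$values[keep], 0)), nrow = nf)
  colnames(li) <- paste0("A", keep)
  list(li = li, eig = e$values)
}

set.seed(1)
pts <- matrix(rnorm(8 * 4), nrow = 8)     # 8 toy objects in 4 dimensions
d <- dist(pts)
res <- pcoa_sketch(d, nf = 3)
ref <- cmdscale(d, k = 3)                 # base R classical scaling
```

The clipping of negative eigenvalues in the sketch is exactly the situation the Lingoes correction avoids: after lingoes(), no eigenvalue is negative, so no axis has to be discarded or distorted.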
s.label(pco$li, sub = "F1xF2 map")
This line draws the first factorial map (axes F1 and F2), with each sequence shown by its label. The last command line is also specific to the example and must be removed or edited when submitting your own data. It draws the convex hulls enclosing the species that belong to the same taxonomic group. The colors correspond, in order, to the five groups defined above:
s.chull(pco$li, cat, optchull=1, add.plot=TRUE, col=c("red","black","green","purple","blue"))
Here, we can see that a group of four enterobacteria (Salmonella enterica, Salmonella typhimurium, Escherichia coli and Yersinia pestis) is separated from the other Proteobacteria. This could be explained either by a high evolutionary rate for this gene in the Enterobacteria clade or by a horizontal transfer in the common ancestor of the four species considered.
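For readers who want to see how such a map is assembled, here is a rough base-graphics analogue of s.label() followed by s.chull(). It is a hypothetical sketch, not ade4's implementation: it labels each point by its row number on the F1 x F2 plane and draws, for each group, the convex hull of all its points (comparable to optchull=1) using grDevices::chull(). Colors are taken in the order of the factor levels. It is shown on toy coordinates; with the real analysis you would pass pco$li and cat.

```r
# Hypothetical base-graphics analogue of s.label() + s.chull()
plot_hulls_sketch <- function(li, groups, cols) {
  groups <- as.factor(groups)
  plot(li[, 1], li[, 2], type = "n", xlab = "F1", ylab = "F2", asp = 1)
  text(li[, 1], li[, 2], labels = seq_len(nrow(li)), cex = 0.8)
  hulls <- list()
  for (k in seq_along(levels(groups))) {
    idx <- which(groups == levels(groups)[k])      # rows of this group
    h <- idx[chull(li[idx, 1], li[idx, 2])]        # hull vertices as row numbers
    polygon(li[h, 1], li[h, 2], border = cols[k])  # one color per level
    hulls[[levels(groups)[k]]] <- h
  }
  invisible(hulls)                                 # hull vertices per group
}

# Toy coordinates: group "a" is a square plus its centre, group "b" a triangle
li <- rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(1, 1),
            c(5, 5), c(6, 5), c(5.5, 6))
hulls <- plot_hulls_sketch(li, c(rep("a", 5), rep("b", 3)),
                           cols = c("red", "blue"))
```

Groups with a single sequence, such as Deinococcus/Thermus or Cyanobacteria in the example, have no visible hull; only their label appears.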