You can modify these parameters in order to compute CA on other species, with other data banks or with different options for the graphics.
    choosebank(bank = "emglib")
    req <- query("liste", "sp=mycoplasma genitalium et t=cds")
    seqs <- lapply(liste$req, getSequence)

The first line selects the data bank from which the sequences to be analysed are retrieved (in this example, EMGLib). The complete list of the accessible banks is available here. The second line runs a query that retrieves the list of all the CDS from M. genitalium. Information on how to compose queries in order to retrieve sequences through the SeqinR interface is available here. Finally, the third line retrieves the sequences themselves.
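To adapt the query to another species or sequence type, it can help to assemble the query string programmatically. The sketch below is only an illustration: `build_query` is a hypothetical helper, not a SeqinR function, and it merely reproduces the `sp=... et t=...` pattern of the query shown above. Whether that exact syntax is accepted depends on the SeqinR version and on the bank in use.

```r
# Hypothetical helper (not part of SeqinR): builds a query string that
# follows the "sp=<species> et t=<type>" pattern of the example above.
build_query <- function(species, type = "cds") {
  stopifnot(is.character(species), length(species) == 1, nzchar(species))
  paste0("sp=", tolower(species), " et t=", tolower(type))
}

build_query("Mycoplasma genitalium")
# "sp=mycoplasma genitalium et t=cds"
```

The resulting string could then be passed as the second argument of query in place of the literal string.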
    # Count the codons of every sequence
    tabco <- lapply(seqs, uco)
    # One column per gene, one row per codon (row names taken from the first count table)
    tabco <- as.data.frame(lapply(tabco, as.vector), row.names = names(tabco[[1]]))
    names(tabco) <- liste$req
    # Correspondence analysis, keeping three axes and skipping the interactive prompt
    ca <- dudi.coa(tabco, scannf = FALSE, nf = 3)

The function uco computes the codon counts for all the sequences stored in seqs. Note that this function can also compute relative frequencies (for more information, see the uco documentation page). The variable tabco containing the counts is then transformed into a data frame so that it can be used by ADE-4. The last line computes the CA itself. The options used mean that only the first three axes of the analysis are kept (see the
    s.label(ca$co, clabel = 0, sub = "Genes F1xF2 map")
    s.label(ca$li, sub = "Codons F1xF2 map")

Here, the factor maps crossing the first two axes are displayed for the genes (first line) and for the codons (second line). On the codon map, labels giving the corresponding codon sequences have been added. We can see that the first axis separates the GC-ending codons from the AT-ending ones. The trend represented by this axis is therefore probably the GC content of the genes.
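One way to probe this interpretation is to compute, for each gene, the share of its codons that end in G or C. The base R sketch below does this on a small codon-by-gene table laid out like tabco (codons in rows, genes in columns). The three sequences are made up for illustration and `count_codons` is a hypothetical helper; it is a plain in-frame count and is not meant to reproduce how uco is implemented.

```r
# Toy coding sequences, invented for illustration (not taken from any data bank).
genes <- c(
  g1 = "atggctgccgcgtaa",
  g2 = "atgaaaaaattttaa",
  g3 = "atgggcggcctgtga"
)

# All 64 codons in alphabetical order.
bases <- c("a", "c", "g", "t")
codons <- sort(as.vector(outer(outer(bases, bases, paste0), bases, paste0)))

# Hypothetical helper: count the codons of one sequence, reading from position 1.
count_codons <- function(dna) {
  starts <- seq(1, by = 3, length.out = nchar(dna) %/% 3)
  tab <- table(factor(substring(dna, starts, starts + 2), levels = codons))
  setNames(as.vector(tab), codons)
}

# Codon-by-gene table: one row per codon, one column per gene.
toy_tabco <- as.data.frame(lapply(genes, count_codons), row.names = codons)

# Share of codons ending in G or C, per gene.
gc_ending <- substr(codons, 3, 3) %in% c("g", "c")
gc3 <- colSums(toy_tabco[gc_ending, , drop = FALSE]) / colSums(toy_tabco)
gc3
# g1 = 0.6, g2 = 0.2, g3 = 0.8
```

With the real data, the same ratio computed on tabco could be compared with the gene coordinates on the first axis, for instance with cor(gc3, ca$co[, 1]). This assumes that the rows of tabco carry codon names and that its columns are genes; the sign of the correlation is arbitrary, since the orientation of a factorial axis is.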