To estimate codon usage bias in the species included in EMGLib, we used the well-known Codon Adaptation Index (CAI) defined by Sharp and Li (1987). The formula for computing CAI is:

$$\mathrm{CAI} = \exp\left(\sum_{i} f_i \ln W_i\right)$$

where $f_i$ is the relative frequency of codon $i$ in the coding sequence, and $W_i$ is the ratio of the frequency of codon $i$ to the frequency of the major (most frequent) codon for the same amino acid, as estimated from highly expressed genes in the species considered. Equivalently, CAI is the geometric mean of the $W$ values of all codons in the gene.
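The index can be computed directly from a gene's codons and a table of $\ln(W_i)$ values, using the Sharp and Li form $\ln \mathrm{CAI} = \sum_i f_i \ln W_i$. The sketch below is illustrative only. The function name `cai` is hypothetical, and it assumes that codons missing from the table (for example Met, Trp or stop codons, if the table omits them) are simply skipped.

```python
import math
from collections import Counter


def cai(codons, ln_w):
    """Codon Adaptation Index: exp(sum_i f_i * ln W_i).

    codons: list of codon strings from one coding sequence.
    ln_w:   dict mapping codon -> ln(W_i) from a CAI reference table.
    Codons absent from ln_w are skipped (an assumption of this sketch).
    """
    counts = Counter(c for c in codons if c in ln_w)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons with a ln(W) value in this sequence")
    # f_i = counts[c] / total is the relative frequency of codon c in the gene
    return math.exp(sum(n / total * ln_w[c] for c, n in counts.items()))


ln_w = {"GCT": 0.0, "GCC": math.log(0.5)}
print(cai(["GCT", "GCC"], ln_w))  # geometric mean of W = 1.0 and 0.5
```

Working in log space avoids the numerical underflow that would come from multiplying hundreds of $W$ values directly.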
After identifying the factor or factors that separate ribosomal protein genes from the other genes, we built the CAI reference tables from the codon frequencies of the genes with the highest scores on this factor (or these factors). We took care to use the same number of codons (approximately 5,000) from the leading and the lagging strands, so that our CAI values would reflect expression level rather than strand location. Codon usage is known to depend on the strand on which a gene is located.
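The strand-balanced selection of reference genes can be sketched as follows. This is not the procedure used for EMGLib. It is a minimal illustration that assumes each gene is described by a hypothetical `Gene` record holding its strand, its codon count and its score on the separating factor. It also assumes that higher scores mean "more like ribosomal protein genes". Because the sign of a factor axis is arbitrary, the scores may need to be negated in practice.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    # Hypothetical record layout, chosen for this sketch
    name: str
    strand: str    # "leading" or "lagging"
    n_codons: int
    score: float   # coordinate on the factor separating ribosomal protein genes


def select_reference_genes(genes, target=5000):
    """Return names of top-scoring genes, taking about `target` codons per strand.

    Within each strand, genes are ranked by decreasing factor score and added
    until that strand has contributed at least `target` codons, so both strands
    are represented by a similar number of codons.
    """
    genes = list(genes)
    selected = []
    for strand in ("leading", "lagging"):
        pool = sorted((g for g in genes if g.strand == strand),
                      key=lambda g: g.score, reverse=True)
        total = 0
        for g in pool:
            if total >= target:
                break
            selected.append(g.name)
            total += g.n_codons
    return selected
```

Because whole genes are added, each strand's total slightly overshoots `target`. The two strands therefore contribute roughly, not exactly, the same number of codons.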
| Species | Genes | CAI table | Factor map |
|---|---|---|---|
| B.burgdorferi | bb-list.txt | bb-cai.txt | bb-ca.gif |
| B.subtilis | bs-list.txt | bs-cai.txt | bs-ca.gif |
| E.coli | ec-list.txt | ec-cai.txt | ec-ca.gif |
| H.influenzae | hi-list.txt | hi-cai.txt | hi-ca.gif |
| H.pylori | hp-list.txt | hp-cai.txt | hp-ca.gif |
| M.genitalium | mg-list.txt | mg-cai.txt | mg-ca.gif |
| M.pneumoniae | mp-list.txt | mp-cai.txt | mp-ca.gif |
| M.tuberculosis | mt-list.txt | mt-cai.txt | mt-ca.gif |
In the CAI reference tables, the first column lists the amino acids in decreasing order of their number of synonymous codons (i.e., amino acids encoded by sextets are listed first, then those encoded by quartets, and so on). The second column lists the codons. The third column contains the value of $\ln(W_i)$. The last column gives the absolute frequencies (counts) of the codons in the reference data set. Some codons were not found in the reference genes of some species. Because $\ln(0)$ is undefined, we assigned these codons a frequency of 0.5 so that $\ln(W_i)$ could be computed.
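A reference table with this layout could be derived from raw codon counts roughly as shown below. The functions `ln_w_table` and `reference_rows` are hypothetical names. The sketch also assumes that the caller supplies the genetic code as a codon-to-amino-acid dict. Within each amino acid, $W_i$ is the codon's count divided by the count of its most frequent synonym, and a count of 0 is replaced by 0.5 as described above.

```python
import math


def ln_w_table(counts, genetic_code):
    """Map each codon to ln(W_i).

    counts:       dict codon -> absolute count in the reference gene set.
    genetic_code: dict codon -> amino acid (supplied by the caller).
    Codons with a zero or missing count are given a count of 0.5.
    """
    by_aa = {}
    for codon, aa in genetic_code.items():
        by_aa.setdefault(aa, []).append(codon)
    table = {}
    for codons in by_aa.values():
        adj = {c: counts.get(c, 0) or 0.5 for c in codons}
        top = max(adj.values())  # count of the major codon for this amino acid
        for c in codons:
            table[c] = math.log(adj[c] / top)
    return table


def reference_rows(counts, genetic_code):
    """Rows (amino acid, codon, ln W, count), most degenerate amino acids first."""
    ln_w = ln_w_table(counts, genetic_code)
    degeneracy = {}
    for aa in genetic_code.values():
        degeneracy[aa] = degeneracy.get(aa, 0) + 1
    order = sorted(genetic_code,
                   key=lambda c: (-degeneracy[genetic_code[c]], genetic_code[c]))
    return [(genetic_code[c], c, ln_w[c], counts.get(c, 0)) for c in order]
```

The count column keeps the observed value (0 for unseen codons). The 0.5 substitute is used only when computing $\ln(W_i)$.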
On the factor maps, genes coding for ribosomal proteins are shown as red circles and all other genes as yellow crosses. The factors plotted are indicated on each map.