Zagordi & Lobry (2005) Gene 347:175

This page allows for the on-line reproduction (and some extras) of the figures in the paper: Zagordi, O., Lobry, J.R. (2005) Forcing reversibility in the no strand-bias substitution model allows for the theoretical and practical identifiability of its 5 parameters from pairwise DNA sequence comparisons. Gene, 347:175-182 [DATASET] [arXiv] [preprint PDF]

Acknowledgements: This contribution partly comes from the thesis Osvaldo Zagordi presented at Naples University in October 2002. The authors warmly thank Professor Luca Peliti for putting them in contact during the strapp 04 meeting (Dresden, Germany, July 5-10 2004). OZ also thanks him for introducing him to the beauties of biological systems. They thank Manolo Gouy for kindly providing the multiple alignment of rRNA sequences and for many constructive suggestions. The manuscript was also improved thanks to the comments of three anonymous reviewers.
Abstract: Because of the base pairing rules in DNA, some mutations experienced by a portion of DNA during its evolution result in the same substitution, as we can only observe differences in coupled nucleotides. Then, in the absence of a bias between the two DNA strands, a model with at most 6 different parameters instead of 12 is sufficient to study the evolutionary relationship between homologous sequences derived from a common ancestor. On the other hand the same symmetry reduces the number of independent observations which can be made. Such a reduction can in some cases invalidate the calculation of the parameters. A compromise between biologically acceptable hypotheses and tractability is introduced and a five parameter reversible no-strand-bias condition (RNSB) is presented. The identifiability of the parameters under this model is shown by examples.

1. The no strand-bias condition

Figure 1 illustrated the no strand-bias condition: if the rate of a given substitution is the same on both strands of DNA, then that rate must equal the rate of the substitution between the complementary bases.
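The condition can be sketched in a few lines of Python. This is only an illustration, not the script used for the paper: the function name `expand_nsb_rates` and the numerical rates are hypothetical. It shows how 6 free rates determine all 12 substitution rates once each substitution is required to share its rate with the substitution between the complementary bases.

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def expand_nsb_rates(class_rates):
    """Expand 6 free rates into the 12 rates of a no strand-bias model.

    class_rates maps one representative (from_base, to_base) pair per class
    to its rate; the complementary substitution receives the same rate.
    """
    rates = {}
    for (x, y), r in class_rates.items():
        rates[(x, y)] = r
        rates[(COMPLEMENT[x], COMPLEMENT[y])] = r
    return rates


# Illustrative rates only (hypothetical values, not estimates from the paper).
class_rates = {
    ("A", "C"): 0.01,  # shared with T -> G
    ("A", "G"): 0.04,  # shared with T -> C
    ("A", "T"): 0.02,  # shared with T -> A
    ("C", "A"): 0.03,  # shared with G -> T
    ("C", "G"): 0.01,  # shared with G -> C
    ("C", "T"): 0.05,  # shared with G -> A
}

rates = expand_nsb_rates(class_rates)
print(len(rates))  # 12 substitution rates obtained from 6 parameters
```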

2. The position of the new RNSB model

Figure 2 showed the position of the new RNSB model within the hierarchy of previously published DNA substitution models. Arrows indicate the simplifications that lead from one model to a simpler one. Only those directly relevant to the discussion in the paper were drawn. This figure was adapted from Robert Schmidt's work.

3. Data set: the observed difference matrix

The data set we used is the observed difference matrix between Homo and Xenopus in the multiple alignment of rRNA sequences from Gouy & Li (1989) Nature, 339:145-147. The file of aligned sequences in mase format is here. The script is editable, so you can produce data for any available pair of species by changing the values of idx1 and idx2. The list of available species is:

[1]  "Homo"             "Mus"              "Xenopus"          "Drosophila"       
[5]  "Caenorhabditis"   "Oryza"            "Lycopersicon"     "Citrus"           
[9]  "Saccharomyces"    "Prorocentrum"     "Tetrahymena"      "Dictyostelium"    
[13] "Physarum"         "Crithidia"        "Methanococcus"    "Methanobacterium" 
[17] "Desulfurococcus"  "Sulfolobus"       "Thermoproteus"    "Thermoplasma"     
[21] "Halococcus"       "HalobacteriumH"   "HalobacteriumM"   "Escherichia"      
[25] "Pseudomonas"      "Rhodobacter"      "BacillusSubt"     "BacillusStea"     
[29] "Micrococcus"      "Streptomyces"     "Pirellula"        "Anacystis"        
[33] "EuglenaCP"        "Ruminobacter"     "Leptospira"       "Thermus"  
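
As a rough illustration of what an observed difference matrix is, the Python sketch below counts, for two aligned sequences, how often each base in the first sequence faces each base in the second. It is not the script offered on this page: the function name `difference_matrix`, the toy sequences, and the choice to skip every site that is not A, C, G or T in both sequences (gaps, ambiguous bases) are all assumptions.

```python
BASES = "ACGT"


def difference_matrix(seq1, seq2):
    """Return a 4x4 table of counts: rows index seq1 bases, columns seq2 bases."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    index = {base: k for k, base in enumerate(BASES)}
    counts = [[0] * 4 for _ in range(4)]
    for x, y in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with a gap or an ambiguous base are ignored.
        if x in index and y in index:
            counts[index[x]][index[y]] += 1
    return counts


# Toy aligned sequences (hypothetical, not taken from the rRNA alignment).
seq1 = "ACGT-ACGTA"
seq2 = "ACGTTAC-TG"

matrix = difference_matrix(seq1, seq2)
for base, row in zip(BASES, matrix):
    print(base, row)
```

The diagonal holds the sites where both sequences agree; the off-diagonal cells hold the observed differences.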

4. Individual influence of parameters near optimum

We used the chi-squared as an adjustment criterion and then used non-linear minimization to estimate the parameter values. Figure 3 showed the shape of the chi-squared criterion when the parameter values are changed one at a time around the minimum. Here are the parameter values you should obtain for the Homo versus Xenopus comparison:

Minimum : 48.80783 at:
a  = 0.00174775040817750 
b  = 0.0112180666689986 
c  = 0.0155821726728541 
d  = 0.00486598075643191 
f  = 0.0163616450505670
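
The one-at-a-time exploration can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the code behind Figure 3: `profile`, `toy_expected` and `toy_criterion` are hypothetical names, and the toy model (expected counts simply proportional to the parameters) is a stand-in for the RNSB model, which is not reproduced here. The toy criterion therefore reaches 0 at its minimum rather than the value reported above. The parameter values are those listed above, rounded.

```python
def chi_squared(observed, expected):
    """Pearson chi-squared distance between observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def profile(criterion, optimum, name, rel_width=0.5, steps=11):
    """Evaluate the criterion while one parameter varies around its optimum.

    The parameter `name` is scaled from (1 - rel_width) to (1 + rel_width)
    times its optimal value; all other parameters stay fixed.
    """
    points = []
    for k in range(steps):
        factor = 1 - rel_width + 2 * rel_width * k / (steps - 1)
        trial = dict(optimum)
        trial[name] = optimum[name] * factor
        points.append((trial[name], criterion(trial)))
    return points


# Parameter values from this page, rounded.
optimum = {"a": 0.00175, "b": 0.01122, "c": 0.01558, "d": 0.00487, "f": 0.01636}


def toy_expected(params):
    # Stand-in model (assumption): counts proportional to the parameters.
    return [10000 * params[key] for key in sorted(params)]


observed = toy_expected(optimum)


def toy_criterion(params):
    return chi_squared(observed, toy_expected(params))


points = profile(toy_criterion, optimum, "b")
for value, crit in points:
    print(f"b = {value:.5f}  criterion = {crit:.4f}")
```

Calling `profile` once per parameter name gives one curve per parameter, which is the kind of display Figure 3 used.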

5. Pairwise influence of parameters near optimum

We systematically explored the shape of the chi-squared criterion for all pairs of parameters around the minimum. In the paper we showed only the case of the (b, c) pair (Figure 4), because it was the only instance in which a structural correlation was found. Here, all parameter pairs are given.
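
A pairwise exploration of this kind can be sketched in Python as below. Again this is an assumption-laden illustration rather than the script used here: `pairwise_grid` and `toy_criterion` are hypothetical names, the parameter values are arbitrary, and the toy criterion (a sum of squared relative deviations) only stands in for the chi-squared criterion of the RNSB model. With 5 parameters there are 10 pairs to examine.

```python
from itertools import combinations


def pairwise_grid(criterion, optimum, name_x, name_y, rel_width=0.5, steps=5):
    """Evaluate the criterion on a grid where two parameters vary together.

    Both parameters are scaled from (1 - rel_width) to (1 + rel_width) times
    their optimal values; all other parameters stay fixed.
    """
    factors = [1 - rel_width + 2 * rel_width * k / (steps - 1) for k in range(steps)]
    grid = []
    for fx in factors:
        row = []
        for fy in factors:
            trial = dict(optimum)
            trial[name_x] = optimum[name_x] * fx
            trial[name_y] = optimum[name_y] * fy
            row.append(criterion(trial))
        grid.append(row)
    return factors, grid


# Arbitrary illustrative values (hypothetical, not the estimates above).
optimum = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "f": 5.0}


def toy_criterion(params):
    # Stand-in criterion (assumption): squared relative deviations from optimum.
    return sum(((params[k] - optimum[k]) / optimum[k]) ** 2 for k in optimum)


pairs = list(combinations(sorted(optimum), 2))
grids = {pair: pairwise_grid(toy_criterion, optimum, *pair)[1] for pair in pairs}
print(len(pairs), "parameter pairs")
```

Each grid can then be drawn as a contour plot; a valley running diagonally across the plot would be the sign of a correlation between the two parameters.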

If you have any problems or comments, please contact Jean Lobry.