This page allows the on-line reproduction (with some extras) of the figures in the paper:

Zagordi, O., Lobry, J.R. (2005) Forcing reversibility in the no strand-bias substitution model allows for the theoretical and practical identifiability of its 5 parameters from pairwise DNA sequence comparisons. *Gene*, **347**:175-182.

[DATASET] [arXiv] [preprint PDF]

Acknowledgements: This contribution partly comes from the thesis that Osvaldo Zagordi presented at Naples University in October 2002. The authors warmly thank Professor Luca Peliti for bringing them together during the strapp 04 meeting (Dresden, Germany, July 5-10, 2004). OZ also thanks him for introducing him to the beauties of biological systems. They thank Manolo Gouy for kindly providing the multiple alignment of rRNA sequences and for many constructive suggestions. The manuscript was also improved by the comments of three anonymous reviewers.

Abstract: Because of the base-pairing rules in DNA, some mutations experienced by a portion of DNA during its evolution result in the same substitution, as we can only observe differences in paired nucleotides. Hence, in the absence of a bias between the two DNA strands, a model with at most 6 different parameters instead of 12 is sufficient to study the evolutionary relationship between homologous sequences derived from a common ancestor. On the other hand, the same symmetry reduces the number of independent observations that can be made. Such a reduction can in some cases invalidate the calculation of the parameters. A compromise between biologically acceptable hypotheses and tractability is introduced, and a five-parameter reversible no-strand-bias condition (RNSB) is presented. The identifiability of the parameters under this model is shown by examples.
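The step from 6 to 5 parameters can be checked numerically. The following is a minimal sketch that assumes a generic no strand-bias (NSB) parameterization by the six rates A→C, A→G, A→T, C→A, C→G, C→T. These rate names are ours and are not the paper's a, b, c, d, f.

The sketch does three things:

- It fills in all 12 rates.
- It computes the equilibrium base frequencies in closed form. Under NSB these satisfy π_A = π_T and π_C = π_G, and the closed form assumes all six rates are positive.
- It tests detailed balance, i.e. reversibility, with exact fractions.

Working the algebra through, detailed balance reduces to the single equation q_AC·q_CT = q_AG·q_CA. That is one constraint on six rates, which is one way to see where a parameter is lost.

```python
from fractions import Fraction
from itertools import permutations

BASES = "ACGT"
COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def nsb_rates(ac, ag, at, ca, cg, ct):
    """Return {(x, y): rate} for all 12 substitutions x -> y under no strand bias.
    The six argument names are hypothetical, not the paper's a, b, c, d, f."""
    free = {("A", "C"): ac, ("A", "G"): ag, ("A", "T"): at,
            ("C", "A"): ca, ("C", "G"): cg, ("C", "T"): ct}
    q = {}
    for (x, y), rate in free.items():
        q[(x, y)] = Fraction(rate)
        q[(COMP[x], COMP[y])] = Fraction(rate)  # same rate read on the other strand
    return q


def nsb_stationary(q):
    """Closed-form equilibrium frequencies, with pi_A = pi_T and pi_C = pi_G
    (assumes all rates are positive, so the equilibrium is unique)."""
    to_at = q[("C", "A")] + q[("C", "T")]  # rates leaving C towards A or T
    to_gc = q[("A", "C")] + q[("A", "G")]  # rates leaving A towards C or G
    total = 2 * (to_at + to_gc)
    return {"A": to_at / total, "T": to_at / total,
            "C": to_gc / total, "G": to_gc / total}


def is_stationary(q, pi):
    """Global balance: for every base, total inflow equals total outflow."""
    return all(
        sum(pi[y] * q[(y, x)] for y in BASES if y != x)
        == pi[x] * sum(q[(x, y)] for y in BASES if y != x)
        for x in BASES)


def is_reversible(q, pi):
    """Detailed balance: pi_x * q_xy == pi_y * q_yx for every ordered pair."""
    return all(pi[x] * q[(x, y)] == pi[y] * q[(y, x)]
               for x, y in permutations(BASES, 2))


for rates in [(1, 2, 3, 4, 5, 6), (1, 2, 3, 4, 5, 8)]:
    q = nsb_rates(*rates)
    pi = nsb_stationary(q)
    ac, ag, _, ca, _, ct = rates
    print(rates, is_stationary(q, pi), is_reversible(q, pi), ac * ct == ag * ca)
```

The two rate sets behave differently. In the first, 1·6 ≠ 2·4 and detailed balance fails. In the second, 1·8 = 2·4 and it holds. In both cases the closed-form frequencies satisfy global balance.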

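Figure 1 explained the *no strand-bias condition*. If a given substitution occurs at the same rate on both strands of DNA, then its rate equals that of the substitution between the complementary bases. For example, an A→G change on one strand is seen as a T→C change on the other strand. Under this condition, therefore, the rates of A→G and T→C are equal.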
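The counting behind "6 parameters instead of 12" can be made explicit in a few lines of Python. Each of the 12 possible substitutions x→y is paired with its twin comp(x)→comp(y), which is the same event read on the other strand. This leaves 6 classes, and a rate is shared within each class. The helper name `strand_classes` is ours.

```python
from itertools import permutations

COMP = {"A": "T", "C": "G", "G": "C", "T": "A"}


def strand_classes():
    """Group each substitution x -> y with its twin comp(x) -> comp(y),
    i.e. the same event read on the complementary strand."""
    classes = {}
    for x, y in permutations("ACGT", 2):  # the 12 possible substitutions
        twin = (COMP[x], COMP[y])
        key = min((x, y), twin)  # canonical representative of the pair
        classes.setdefault(key, set()).update({(x, y), twin})
    return classes


classes = strand_classes()
for members in classes.values():
    print(" = ".join(f"{a}->{b}" for a, b in sorted(members)))
print(len(classes), "free rates instead of 12")
```

No base is its own complement, so every class has exactly two members. This is why the reduction is exactly by half.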

Figure 2 showed the position of the new RNSB model within the hierarchy of previously published DNA substitution models. Arrows indicate simplifications leading from a model to a simpler one. Only those directly relevant to the discussion in the paper were drawn. This figure was adapted from the work of Robert Schmidt.

The data set we used is the observed difference matrix between Homo and Xenopus in the multiple alignment of rRNA sequences from Gouy & Li (1989) *Nature*, **339**:145-147. The file of aligned sequences in mase format is here.
The script is editable, so you can produce data for any available pair of species by changing the values of `idx1` and `idx2`. The list of available species, with their positions in the list, is:

1: Homo, 2: Mus, 3: Xenopus, 4: Drosophila, 5: Caenorhabditis, 6: Oryza, 7: Lycopersicon, 8: Citrus, 9: Saccharomyces, 10: Prorocentrum, 11: Tetrahymena, 12: Dictyostelium, 13: Physarum, 14: Crithidia, 15: Methanococcus, 16: Methanobacterium, 17: Desulfurococcus, 18: Sulfolobus, 19: Thermoproteus, 20: Thermoplasma, 21: Halococcus, 22: HalobacteriumH, 23: HalobacteriumM, 24: Escherichia, 25: Pseudomonas, 26: Rhodobacter, 27: BacillusSubt, 28: BacillusStea, 29: Micrococcus, 30: Streptomyces, 31: Pirellula, 32: Anacystis, 33: EuglenaCP, 34: Ruminobacter, 35: Leptospira, 36: Thermus.
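For readers who want to rebuild a difference matrix outside the original script (which is not reproduced on this page), here is a minimal sketch in Python. Its assumptions are as follows:

- **File layout.** It assumes the usual mase layout: `;;` header lines, then for each entry one or more `;` comment lines, a name line, and the sequence lines.
- **Names and data.** The function names and the toy alignment are hypothetical.
- **Characters.** U is read as T, and columns with a gap or any other symbol in either sequence are skipped. This may differ from what the paper's script did.

```python
BASES = "ACGT"


def read_mase(text):
    """Parse mase-formatted text into {name: sequence}.
    Assumed layout: ';;' header lines, then for each entry one or more ';'
    comment lines, a line holding the name, and the sequence lines."""
    seqs, name, expect_name = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";;"):
            continue  # blank line or file header
        if line.startswith(";"):
            expect_name = True  # entry comment: the next line is a name
            continue
        if expect_name:
            name, expect_name = line, False
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def difference_matrix(s1, s2):
    """4x4 counts: row = base in s1, column = aligned base in s2.
    U is read as T (assumption); columns with a gap or any other symbol
    in either sequence are skipped."""
    counts = [[0] * 4 for _ in BASES]
    for x, y in zip(s1.upper().replace("U", "T"), s2.upper().replace("U", "T")):
        if x in BASES and y in BASES:
            counts[BASES.index(x)][BASES.index(y)] += 1
    return counts


toy = """;; toy alignment, made-up data
;first entry
Homo
ACGT-ACG
TT
;second entry
Xenopus
ACGTAACG
CT
"""
seqs = read_mase(toy)
for base, row in zip(BASES, difference_matrix(seqs["Homo"], seqs["Xenopus"])):
    print(base, row)
```

Choosing another pair of species amounts to looking up two other names in `seqs`. This plays the same role as changing `idx1` and `idx2` in the page's script.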

We used the chi-squared statistic as a goodness-of-fit criterion and then used non-linear minimization to estimate the parameter values. Figure 3 showed the shape of the chi-squared criterion when the parameter values are changed one at a time around the minimum. Here are the parameter values you should obtain for the Homo versus Xenopus comparison:

Minimum: 48.80783, at a = 0.00174775040817750, b = 0.0112180666689986, c = 0.0155821726728541, d = 0.00486598075643191, f = 0.0163616450505670
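The criterion and the Figure 3 profiles can be sketched as follows. Three assumptions apply:

- The Pearson form sum (O - E)²/E is our assumption about the exact criterion used.
- The 2×2 toy model is hypothetical. It only stands in for the real expected difference matrix under RNSB, which is not reproduced here.
- The helper `one_at_a_time` (also a hypothetical name) changes each parameter by a relative step while holding the others at their best values. This mirrors what Figure 3 plots.

```python
def chi_squared(observed, expected):
    """Pearson-type criterion: sum over cells of (O - E)^2 / E.
    Cells with E <= 0 are skipped (a simplifying assumption)."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow) if e > 0)


def one_at_a_time(criterion, best, rel_steps):
    """Figure 3-style profiles: change one parameter at a time by a relative
    step, the others staying at their best values. Returns
    {name: [(value, criterion value), ...]}."""
    profiles = {}
    for name, value in best.items():
        curve = []
        for step in rel_steps:
            trial = dict(best)
            trial[name] = value * (1 + step)
            curve.append((trial[name], criterion(trial)))
        profiles[name] = curve
    return profiles


# Toy stand-in for the real model (hypothetical): a 2x2 table of expected
# counts driven by two parameters, compared with made-up observed counts.
OBSERVED = [[30, 20], [20, 30]]


def toy_criterion(p):
    expected = [[100 * p["a"], 100 * p["b"]], [100 * p["b"], 100 * p["a"]]]
    return chi_squared(OBSERVED, expected)


best = {"a": 0.3, "b": 0.2}
steps = [-0.2, -0.1, 0.0, 0.1, 0.2]
for name, curve in one_at_a_time(toy_criterion, best, steps).items():
    print(name, [round(c, 3) for _, c in curve])
```

In the real setting, `best` would be the minimizer returned by the non-linear minimization. `toy_criterion` would be replaced by the chi-squared between the observed and model-expected difference matrices.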

We systematically explored the shape of the chi-squared criterion around the minimum for all pairs of parameters. In the paper we showed only the (b, c) pair (Figure 4), because it was the only pair for which a structural correlation was found. All parameter pairs are given here.
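Exploring all pairs amounts to a loop over `itertools.combinations` of the five parameter names, which gives 10 pairs. The sketch below uses the parameter values listed above only as the centre of each grid. Its criterion is a made-up quadratic with a deliberate b–c coupling, not the real chi-squared surface.

It illustrates one simple diagnostic we chose, which may not be what the paper used. A tilted valley (a structural correlation) shows up as a clearly non-zero mixed second derivative, estimated here by finite differences at the grid corners. `pair_grids` and `mixed_curvature` are hypothetical names.

```python
from itertools import combinations


def pair_grids(criterion, best, rel_steps):
    """For every pair of parameters, evaluate the criterion on a 2-D grid of
    relative perturbations around the minimum, other parameters held fixed."""
    grids = {}
    for p1, p2 in combinations(best, 2):
        grid = []
        for s1 in rel_steps:
            row = []
            for s2 in rel_steps:
                trial = dict(best)
                trial[p1] = best[p1] * (1 + s1)
                trial[p2] = best[p2] * (1 + s2)
                row.append(criterion(trial))
            grid.append(row)
        grids[(p1, p2)] = grid
    return grids


def mixed_curvature(grid, step):
    """Central finite-difference estimate of the cross second derivative
    (in relative perturbations) from the four grid corners; assumes
    rel_steps runs from -step to +step."""
    return (grid[-1][-1] - grid[-1][0] - grid[0][-1] + grid[0][0]) / (4 * step * step)


# Centre of the grids: the Homo/Xenopus minimum reported above.
BEST = {"a": 0.00174775040817750, "b": 0.0112180666689986,
        "c": 0.0155821726728541, "d": 0.00486598075643191,
        "f": 0.0163616450505670}


def toy_criterion(p):
    # Made-up quadratic bowl in relative deviations, with a deliberate b-c
    # coupling; NOT the real chi-squared surface.
    r = {k: p[k] / BEST[k] - 1 for k in p}
    return sum(v * v for v in r.values()) + 1.8 * r["b"] * r["c"]


step = 0.1
grids = pair_grids(toy_criterion, BEST, [-step, 0.0, step])
for pair, grid in grids.items():
    print(pair, round(mixed_curvature(grid, step), 3))
```

Each full grid in `grids` can be passed to any contour-plotting tool to produce one panel per pair. A finer `rel_steps` list gives smoother contours.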