ACUTS: compilation of Ancient Conserved UnTranslated Sequences


Laurent DURET

E-mail: duret@biomserv.univ-lyon1.fr


Objectives

We are interested in identifying new regulatory elements in untranslated regions of protein-coding genes (5'flanks, 5'UTRs, introns, 3'UTRs and 3'flanks). We focused our analyses on genes from metazoan species (essentially vertebrates, insects and nematodes).

Approach

Our approach is based on comparative sequence analysis, for the identification of phylogenetic footprints.

"Analysis of non-coding regions [...] demonstrates that cis-acting regulatory elements with important functions are evolutionarily conserved. Even when such elements are of quite short length, they are distinguishable from the more rapidly evolving non-coding DNA that they are embedded in, provided the number of aligned homologous sequences represents enough evolutionary time for the accumulation of mutations at the less constrained (presumably selectively neutral) base positions. An evolutionary conserved element presents itself as a set of contiguous base positions that appear invariant among the aligned sequences." (Tagle et al. 1988).

By analogy with the DNase footprinting experiments, Tagle et al. proposed the term "phylogenetic footprinting" to describe the phylogenetic comparisons that reveal conserved cis-elements in the non-coding regions of homologous genes. For a good example of phylogenetic footprints, look at the beta-actin gene.
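
As an illustration of the definition quoted above (a set of contiguous base positions that appear invariant among the aligned sequences), here is a minimal Python sketch that scans a multiple alignment for such runs. The function name conserved_blocks, the min_len parameter and the toy sequences are assumptions made for this example; they are not part of ACUTS or of the method of Tagle et al.

```python
def conserved_blocks(alignment, min_len=6):
    """Return half-open (start, end) column ranges that are invariant
    (same base, no gap) in every sequence of the alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")

    blocks = []
    start = None
    # Iterate one column past the end so that a run reaching the last
    # column is closed like any other run.
    for col in range(length + 1):
        invariant = (
            col < length
            and alignment[0][col] != "-"
            and all(seq[col].upper() == alignment[0][col].upper()
                    for seq in alignment)
        )
        if invariant and start is None:
            start = col
        elif not invariant and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


# Toy alignment (hypothetical data): only columns 4 to 12 form a long
# invariant run.
toy = [
    "ACGTTTGACGTCAAT",
    "TCGATTGACGTCAGT",
    "GCGCTTGACGTCATT",
]
for start, end in conserved_blocks(toy):
    print(start, end, toy[0][start:end])
```

Requiring strict invariance is the simplest reading of the definition; a practical scan would probably tolerate a few mismatches, as the identity thresholds reported in the Results suggest.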

Material & Methods

We extracted 145 Mb of non-coding or non-annotated sequences from GenBank. Sequences were compared with one another using a combination of BLAST (Altschul et al. 1990) and LFASTA (Pearson and Lipman 1988) to search for evolutionarily conserved elements. Sequences were masked with the XBLAST program (Claverie and States 1993) to filter out microsatellite repeats. To ensure high sensitivity, we used very low thresholds (S=90 for BLASTN searches). With such thresholds, similarity searches detect many non-significant matches, so result files were checked manually to keep only similarities that truly reflect evolutionary conservation.

Homologous untranslated sequences were then aligned with CLUSTALW (Thompson et al. 1994). Multiple alignments were manually edited with the SeaView program (Galtier et al. 1996).
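
To give an idea of why microsatellite repeats are filtered out before similarity searches, the following Python sketch masks simple tandem repeats with a regular expression. This is not how XBLAST works; it is a swapped-in, much cruder technique, and the helper name mask_microsatellites, the unit size (1 to 6 nt), the minimum of five copies and the min_len cutoff are all assumptions chosen for this example.

```python
import re

# A repeat unit of 1 to 6 nt followed by at least four more copies of itself
# (assumed definition of a microsatellite, for illustration only).
MICROSATELLITE = re.compile(r"([ACGT]{1,6}?)\1{4,}", re.IGNORECASE)


def mask_microsatellites(seq, min_len=12):
    """Replace tandem repeats spanning at least min_len nt with N's."""
    def mask(match):
        repeat = match.group(0)
        # Short repeats are left untouched; long ones are masked in full.
        return "N" * len(repeat) if len(repeat) >= min_len else repeat

    return MICROSATELLITE.sub(mask, seq)


# Hypothetical sequence containing a (CA)8 repeat.
print(mask_microsatellites("ACGT" + "CA" * 8 + "GGTC"))
```

Masked positions cannot seed or extend a match, so low-complexity repeats shared by unrelated sequences no longer produce spurious hits at low score thresholds.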

Results

This study revealed the existence of hundreds of very long highly conserved regions (HCRs) in non-coding parts of genes: 70% identity or more over 50 to 2000 nt between species that diverged 300 to 550 million years ago (see example). The oldest detected HCR has been conserved at least since the echinoderms/chordates divergence. We did not detect any clear sequence conservation between more distantly related species (nematodes, insects). The longest conserved element covers nearly 2000 nt in the 3'UTR of the delta-EF1 transcriptional repressor (see alignment). Such conservation is unexpected because it involves fragments that are much longer than the regulatory elements known to date.
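
The criterion stated above (70% identity or more over at least 50 nt) can be illustrated with a sliding-window scan of a pairwise alignment. The Python sketch below is only an approximation written for this page: the function names, the fixed window size and the toy sequences are assumptions, and the HCRs in ACUTS were identified with the similarity searches and manual checks described in Material & Methods, not with this code.

```python
def window_identity(a, b, window=50):
    """Return (start, fraction of identical columns) for every window of
    `window` aligned columns. Gap columns count as mismatches."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = [int(x == y and x != "-") for x, y in zip(a.upper(), b.upper())]

    results = []
    count = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide the window: add the entering column, drop the leaving one.
            count += matches[start + window - 1] - matches[start - 1]
        results.append((start, count / window))
    return results


def conserved_windows(a, b, window=50, min_identity=0.70):
    """Return the start positions of windows reaching min_identity."""
    return [start for start, identity in window_identity(a, b, window)
            if identity >= min_identity]


# Hypothetical pair: identical over the first 60 columns, different afterwards.
seq_a = "A" * 100
seq_b = "A" * 60 + "C" * 40
print(conserved_windows(seq_a, seq_b))
```

Overlapping windows that pass the threshold can then be merged to estimate the extent of a conserved region.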

The distribution of HCRs within genes shows that functional constraints are generally much stronger in 3'-non-coding regions than in promoters or introns. The 3'-HCRs are particularly A+T-rich and are always located in the transcribed untranslated regions of genes, which suggests that they are involved in post-transcriptional processes (mRNA export, localisation, translation, or degradation). Functional and structural analysis of HCRs is in progress. The surprising result of this analysis is that a function has been proposed for very few of the conserved elements. In most cases the function of conserved elements is entirely unknown.
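
The A+T richness mentioned above is a simple base-composition measure. A minimal Python sketch follows, assuming DNA sequences and ignoring gaps and ambiguous bases; the function name at_content and the example sequence are hypothetical.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0  # no countable base: report 0 rather than divide by zero
    return (seq.count("A") + seq.count("T")) / unambiguous


# Hypothetical 3'UTR fragment.
print(round(at_content("TTTATTTAAATGCATTTA"), 2))
```

Comparing this value for an HCR with the value for the surrounding non-conserved sequence indicates whether the conserved region itself is compositionally biased.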

Information on HCRs (sequences, alignments, annotations, bibliographic references) is compiled in a database (ACUTS: compilation of Ancient Conserved UnTranslated Sequences). Currently, 176 of the 326 detected HCRs have been analysed and incorporated into the database.


References

If you use ACUTS in a published work, please cite the following references:

Duret, L., Bucher, P. (1997) Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol., 7, 399-406.

Duret, L., Dorkeld, F., Gautier, C. (1993) Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res., 21, 2315-2322.

