### Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification

#### Laurent Marsan and Marie-France Sagot

*Journal of Computational Biology*, 7:345-360, 2000

The paper introduces two exact algorithms for extracting conserved
structured motifs from a set of DNA sequences. A structured motif may
be described as an ordered collection of *p* ≥ 1 "boxes" (each
box corresponding to one part of the structured motif), *p*
substitution rates (one for each box) and *p* - 1 intervals of
distance (one for each pair of successive boxes in the
collection). The contents of the boxes, that is, the motifs
themselves, are unknown at the start of the algorithm. They are
precisely what the algorithms are meant to find. A suffix tree is used
for finding such motifs. The algorithms are efficient enough to
infer site consensus sequences, such as promoter sequences
or regulatory sites, from a set of unaligned sequences corresponding
to the noncoding regions upstream from all genes of a genome. In
particular, the time complexity of both algorithms scales linearly with
*N*^2 *n*, where *n* is the average length of the
sequences and *N* their number. An application to the
identification of promoter and regulatory consensus sequences in
bacterial genomes is shown.
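
To make the model concrete, the following is a minimal Python sketch of a structured motif and a naive test of whether it occurs in a sequence. It is not the suffix-tree algorithm of the paper, which extracts unknown boxes; it only checks a given candidate by brute force. The names `StructuredMotif`, `occurs_in` and `count_sequences_with_occurrence` are hypothetical, and the sketch assumes that each substitution bound is an absolute count of mismatches (Hamming distance) and that each distance is measured from the end of one box to the start of the next.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StructuredMotif:
    """Hypothetical container for the model described in the abstract."""
    boxes: List[str]             # p candidate box contents
    max_subs: List[int]          # p substitution bounds, one per box (assumed counts)
    gaps: List[Tuple[int, int]]  # p-1 (min, max) distances between successive boxes


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_in(motif: StructuredMotif, seq: str) -> bool:
    """Naive check: does the motif occur anywhere in seq under the model?"""

    def match_from(box_idx: int, start: int) -> bool:
        box = motif.boxes[box_idx]
        window = seq[start:start + len(box)]
        # The window must be complete and within the allowed substitutions.
        if len(window) < len(box) or hamming(window, box) > motif.max_subs[box_idx]:
            return False
        if box_idx == len(motif.boxes) - 1:
            return True
        end = start + len(box)
        lo, hi = motif.gaps[box_idx]
        # Try every allowed distance to the next box.
        return any(match_from(box_idx + 1, end + d) for d in range(lo, hi + 1))

    return any(match_from(0, s) for s in range(len(seq)))


def count_sequences_with_occurrence(motif: StructuredMotif, seqs: List[str]) -> int:
    """Number of sequences in which the motif occurs at least once."""
    return sum(occurs_in(motif, s) for s in seqs)
```

For instance, a two-box candidate such as `StructuredMotif(["TTGACA", "TATAAT"], [1, 1], [(16, 18)])` would be matched against each upstream region in turn. This brute-force check costs time proportional to the sequence length times the number of allowed distances per candidate, which is the kind of repeated scanning that a suffix tree is meant to avoid.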

**key words:** structured motif extraction, promoter and regulatory
site, consensus, model, suffix tree
