Extracting structured motifs using a suffix tree - Algorithms and application to promoter consensus identification

Laurent Marsan and Marie-France Sagot
in Proceedings of the Fourth Annual International Conference on Computational Molecular Biology,
Tokyo, Japan, pages 210-219, ACM Press, 2000

This paper introduces two exact algorithms for extracting conserved structured motifs from a set of DNA sequences. A structured motif is composed of p >= 2 parts separated by constrained spacers. Both algorithms use a suffix tree to perform the extraction. They are efficient enough to extract site consensus sequences, such as promoter sequences, from a whole collection of non-coding sequences extracted from a genome. In particular, their time complexity scales linearly with N^2 * n, where n is the average length of the sequences and N is their number. An application to the identification of promoter consensus sequences in bacterial genomes is shown, with interesting results.
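
To make the notion of a structured motif concrete, the following minimal Python sketch checks whether a motif made of p parts ("boxes") separated by constrained spacers occurs in a sequence, and counts how many sequences of a set contain it. It is a naive scan written for illustration only and is not the suffix-tree algorithms of the paper. The function names (hamming, occurs, quorum), the representation of a spacer as a (dmin, dmax) pair, and the per-box mismatch bound max_mismatches are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(seq, boxes, spacers, max_mismatches=0):
    """Return True if the structured motif occurs somewhere in seq.

    boxes   -- list of p strings, the parts of the motif (hypothetical layout)
    spacers -- list of p - 1 (dmin, dmax) pairs; spacers[i] bounds the gap
               between the end of box i and the start of box i + 1
    max_mismatches -- mismatches tolerated per box (an assumption of this sketch)
    """
    def extend(pos, i):
        # Try to match box i starting exactly at position pos.
        box = boxes[i]
        end = pos + len(box)
        if end > len(seq) or hamming(seq[pos:end], box) > max_mismatches:
            return False
        if i == len(boxes) - 1:
            return True
        dmin, dmax = spacers[i]
        # Try every allowed spacer length before the next box.
        return any(extend(end + d, i + 1) for d in range(dmin, dmax + 1))

    return any(extend(start, 0) for start in range(len(seq)))


def quorum(seqs, boxes, spacers, max_mismatches=0):
    """Count the sequences in which the structured motif occurs."""
    return sum(occurs(s, boxes, spacers, max_mismatches) for s in seqs)


# Illustrative data: two boxes separated by a spacer of 16 to 18 bases.
seqs = [
    "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC",  # spacer 17: occurrence
    "TTGACA" + "A" * 10 + "TATAAT",                # spacer 10: too short
    "TTGACA" + "C" * 16 + "TATAAT",                # spacer 16: occurrence
]
print(quorum(seqs, ["TTGACA", "TATAAT"], [(16, 18)]))
```

This scan tests one given motif against the sequences. Extraction in the sense of the abstract is the harder problem of finding all such conserved motifs without knowing them in advance, which is where the suffix tree comes in.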

key words: motif extraction, structured motif, promoter, consensus, model, suffix tree

