Seminaire Algorithmique et Biologie

Repeat Analysis on a Genomic Scale

Jens Stoye
Universität Bielefeld
Technische Fakultät
AG Genominformatik
Postfach 10 01 31
33501 - Bielefeld
GERMANY
E-Mail: stoye@TechFak.Uni-Bielefeld.DE

Now that the various sequencing projects have collected huge amounts of biological sequence data, one of the main tasks in bioinformatics is the computational analysis of this raw data. Part of such an analysis is the search for DNA patterns, including sequence repeats. The biological literature characterizes various kinds of repetitive elements in DNA, for example SINEs, LINEs, and microsatellites.

In this talk we will present algorithmic methods for finding repetitive structures in whole genome sequences. We will discuss different repeat models: tandem repeats and tandem arrays, in which two or more copies of the repeat immediately follow one another; an extension of this model in which (bounded) gaps between the copies are allowed; and finally general repeats whose copies may contain a number of differences, so-called degenerate repeats. For efficiency, all of the methods presented use the suffix tree, an index structure for strings that has proven very useful in repeat finding and beyond.
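
To make the first two repeat models concrete, here is a minimal brute-force sketch in Python. It only illustrates the definitions and is not the suffix-tree approach of the talk: it runs in polynomial rather than linear time, and the function names `tandem_repeats` and `gapped_repeats` are hypothetical, chosen for this example.

```python
def tandem_repeats(s):
    """Return (start, unit_length) for every tandem repeat ww found in s.

    Brute force for illustration only; NOT the suffix-tree method.
    """
    hits = []
    n = len(s)
    for i in range(n):
        # A unit of length k needs 2*k characters starting at position i.
        for k in range(1, (n - i) // 2 + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                hits.append((i, k))
    return hits


def gapped_repeats(s, length, max_gap):
    """Return (i, j) where s[i:i+length] == s[j:j+length] and the second
    copy starts between 0 and max_gap characters after the first ends.

    Brute force for illustration only; `length` and `max_gap` are
    parameters assumed for this sketch.
    """
    hits = []
    n = len(s)
    for i in range(n - length + 1):
        for gap in range(max_gap + 1):
            j = i + length + gap
            if j + length > n:
                break  # second copy would run past the end of s
            if s[i:i + length] == s[j:j + length]:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(tandem_repeats("ACGACGT"))
    print(gapped_repeats("ACGTTACG", 3, 2))
```

A gap of zero in `gapped_repeats` corresponds to a tandem repeat, which is why the bounded-gap model can be seen as an extension of the tandem model.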

References

D. Gusfield, J. Stoye. Linear Time Algorithms for Finding and Representing all the Tandem Repeats in a String. Department of Computer Science, UC Davis. Report CSE-98-4, 1998.
G. S. Brodal, R. Lyngsø, C. N. S. Pedersen, J. Stoye. Finding Maximal Pairs with Bounded Gap. J. Discr. Alg. 1, 77-104, 2000.
S. Kurtz, J. V. Choudhuri, E. Ohlebusch, J. Stoye, R. Giegerich. REPuter: the Manifold Applications of Repeat Analysis on a Genomic Scale. Nucleic Acids Res. 29(22), 4643-4653, 2001.
J. Stoye, D. Gusfield. Simple and Flexible Detection of Contiguous Repeats Using a Suffix Tree. Theor. Comput. Sci. 270(1-2), 843-856, 2002.
