Repeat Analysis on a Genomic Scale

Jens Stoye
Universität Bielefeld
Technische Fakultät
AG Genominformatik
Postfach 10 01 31
33501 - Bielefeld
GERMANY
E-Mail: stoye@TechFak.Uni-Bielefeld.DE

Now that the various sequencing projects have collected huge amounts of biological sequence data, one of the main tasks in bioinformatics is the computational analysis of the raw data. Part of such an analysis is the search for DNA patterns, including sequence repeats. The biological literature characterizes various kinds of repetitive elements in DNA, for example SINEs, LINEs, and microsatellites.

In this talk we will present algorithmic methods for finding repetitive structures in whole genome sequences. We will discuss different repeat models: tandem repeats and tandem arrays, where two or more copies of the repeat immediately follow each other; an extension of this model, where (bounded) gaps between the copies are allowed; and finally general repeats whose copies may contain a number of differences, so-called degenerate repeats. For efficiency, all of the methods presented use the suffix tree, an index structure for strings that has proven very useful in repeat finding and beyond.
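To make the three repeat models concrete, here is a minimal Python sketch. It is a naive quadratic-time scan, not the suffix-tree methods presented in the talk, and the names find_repeat_pairs, unit_len, max_gap and max_mismatches are hypothetical, chosen only for illustration. It assumes that differences between copies are counted as mismatches (Hamming distance) and reports pairs of copies of one fixed length.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_pairs(seq, unit_len, max_gap=0, max_mismatches=0):
    """Naive scan for pairs of repeat copies of length unit_len.

    Returns a list of (start, gap) tuples: the first copy begins at `start`
    and the second copy begins `gap` characters after the first one ends.

    max_gap == 0 and max_mismatches == 0  -> exact tandem repeats
    max_gap > 0                           -> repeats with a bounded gap
    max_mismatches > 0                    -> degenerate repeats (assumed here
                                             to mean Hamming distance only)
    """
    hits = []
    n = len(seq)
    for start in range(n - 2 * unit_len + 1):
        unit = seq[start:start + unit_len]
        for gap in range(max_gap + 1):
            second = start + unit_len + gap
            if second + unit_len > n:
                break  # the second copy would run past the end of seq
            if hamming(unit, seq[second:second + unit_len]) <= max_mismatches:
                hits.append((start, gap))
    return hits


# Exact tandem repeat: ACG is immediately followed by ACG.
print(find_repeat_pairs("ACGACGTT", 3))
# Gapped repeat: the two copies of ACG are separated by TT.
print(find_repeat_pairs("ACGTTACG", 3, max_gap=2))
# Degenerate repeat: ACG and ACT differ in one position.
print(find_repeat_pairs("ACGACTTT", 3, max_mismatches=1))
```

For a sequence of length n, this scan performs on the order of n * (max_gap + 1) * unit_len character comparisons per unit length, and it would have to be rerun for every unit length of interest. That cost is what motivates an index structure such as the suffix tree when working at the scale of whole genomes.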
