Flexible identification of structural objects in nucleic acid sequences: palindromes, mirror repeats, pseudoknots and triple helices


Marie-France Sagot and Alain Viari
in Combinatorial Pattern Matching'97, Aarhus, Danemark
Lecture Notes in Computer Science, vol. 1264, pages 224-246, Springer Verlag, 1997

The paper presents algorithms for flexibly identifying structural objects in nucleic acid sequences. These objects are palindromes, mirror repeats, pseudoknots and triple helices. We further explore here the idea of a model against which the words in a sequence are compared for finding these structural objects (Sagot, 1995b). In the present case, models are words defined over the alphabet of nucleotides that have both direct and inverse occurrences in the sequence. Moreover, errors (substitutions, deletions and insertions) are allowed between a model and its inverse occurrences. Helix stems may therefore present bulges or interior loops, and mirror repeats need not be exact. Reasonably efficient performance comes from the fact that the parts composing the structures are kept separated until the end and that filtering for valid occurrences (occurrences that may form part of such a structure) can be done in O(n) time where n is the length of the sequence. The time complexity for the searching phase (that is, before the structural parts are put together at the end) of both algorithms presented here (one for palindromes and mirror repeats, the other for pseudoknots and triple helices) is then O(n * k * (e+1) (1+min{d_max-d_min+1+e,k^e |Sigma|^e})) where n is the length of the sequence, d_max and d_min are, respectively, the maximal and minimal length of a hairpin loop, k is either the maximum length k_max of a model, is a fixed length or represents the maximum value of a range of lengths, e is the maximum number of errors allowed (substitutions, deletions and insertions) and |Sigma| is the size of the alphabet of nucleotides.

key words: nucleic acid sequence, nucleic structural object, palindrome, mirror repeat, pseudoknot, triple helix, approximate comparison, model, direct occurrence, (complementary) inverse occurrence

Paper in postscript format
Back to the Publications page