Flexible identification of
structural objects in nucleic acid sequences: palindromes, mirror
repeats, pseudoknots and triple helices
Marie-France Sagot and Alain Viari
in Combinatorial Pattern
Matching'97, Aarhus, Danemark
Lecture Notes in Computer Science,
vol. 1264, pages 224-246, Springer Verlag, 1997
The paper presents algorithms for flexibly identifying structural
objects in nucleic acid sequences. These objects are palindromes,
mirror repeats, pseudoknots and triple helices. We further explore
here the idea of a model against which the words in a sequence are
compared for finding these structural objects (Sagot, 1995b). In the
present case, models are words defined over the alphabet of
nucleotides that have both direct and inverse occurrences in the
sequence. Moreover, errors (substitutions, deletions and insertions)
are allowed between a model and its inverse occurrences. Helix stems
may therefore present bulges or interior loops, and mirror repeats
need not be exact. Reasonably efficient performance comes from the
fact that the parts composing the structures are kept separated until
the end and that filtering for valid occurrences (occurrences that may
form part of such a structure) can be done in O(n) time where
n is the length of the sequence. The time complexity for the
searching phase (that is, before the structural parts are put together
at the end) of both algorithms presented here (one for palindromes and
mirror repeats, the other for pseudoknots and triple helices) is then
O(n * k * (e+1) (1+min{d_max-d_min+1+e,k^e |Sigma|^e}))
where n is the length of the sequence, d_max and
d_min are, respectively, the maximal and minimal length of a
hairpin loop, k is either the maximum length k_max of a
model, is a fixed length or represents the maximum value of a range of
lengths, e is the maximum number of errors allowed
(substitutions, deletions and insertions) and |Sigma| is the
size of the alphabet of nucleotides.
key words: nucleic acid sequence, nucleic structural object,
palindrome, mirror repeat, pseudoknot, triple
helix, approximate comparison, model, direct
occurrence, (complementary) inverse occurrence
Paper in postscript format
Back to the Publications page