Pattern inference under many
guises
Marie-France Sagot and Yoshiko Wakabayashi
in Recent
advances in algorithms and combinatorics
Springer Verlag,
2003
The paper surveys some of the main combinatorial methods for inferring
patterns from a string, or a set of strings. The types of problems
that will be addressed are repeat identification and common
pattern inference. The strings that will concern us represent
biological entities, nucleic acid and protein sequences or, in some
cases, structures. As is well-known, exact ("identical") patterns
hardly make sense in biology; we consider here two types of similar
("non-identical") patterns. One comes from looking at what "hides"
behind each letter of the DNA}/RNA or protein alphabet while the other
corresponds to the more familiar notion of "errors". The errors
concern mutational events that may affect a molecule during DNA
replication. Those that will be of interest to us are point mutations,
that is, mutations operating on single letters of a biological
sequence: substitution, insertion or deletion. Various notions of
"similarity" are discussed, as well as other constraints the patterns
may have to satisfy depending on the biological situation one is
facing. Algorithms for the various notions, together with complexity
analyses, are presented in some detail.
key words: pattern, inference, approximate, similarity,
biological sequence
Paper in postscript format
Back to the Publications page