Searching for repeated words in a text allowing for mismatches and gaps

Marie-France Sagot, Vincent Escalier, Alain Viari and Henri Soldano
in Second South American Workshop on String Processing, Viña del Mar, Chile
pages 87-100, Proceedings University of Chile, 1995

We present an algorithm that locates similar words common to a set of strings defined over an alphabet Sigma, where similarity is measured by the Levenshtein edit distance. The words in the strings are compared against a reference object called a model, which is itself a word over Sigma. This allows a multiple comparison of the strings, as opposed to a series of pairwise comparisons. The algorithm is particularly appropriate for the analysis of DNA/RNA sequences.
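
To make the notion of a model concrete, the following is a minimal brute-force sketch in Python, not the algorithm of the paper. It assumes unit costs for substitutions, insertions and deletions, and the names `levenshtein`, `occurrences`, `common_models` and the parameter `max_dist` are hypothetical, chosen only for this illustration. A candidate model is kept when every string contains at least one word within `max_dist` edits of it, so all strings are compared against the same reference word rather than against each other.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost substitution, insertion and deletion."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def occurrences(model: str, text: str, max_dist: int) -> list:
    """Return (start, end) spans of text within max_dist edits of model.

    Brute force: tries every substring whose length differs from the
    model's length by at most max_dist.
    """
    hits = []
    shortest = max(1, len(model) - max_dist)
    longest = len(model) + max_dist
    for start in range(len(text)):
        for length in range(shortest, longest + 1):
            end = start + length
            if end > len(text):
                break
            if levenshtein(model, text[start:end]) <= max_dist:
                hits.append((start, end))
    return hits


def common_models(strings, candidates, max_dist):
    """Keep the candidate models that occur, approximately, in every string."""
    return [
        model for model in candidates
        if all(occurrences(model, s, max_dist) for s in strings)
    ]


if __name__ == "__main__":
    sequences = ["TTAGTCC", "GGACGTAA", "ACCGT"]
    print(common_models(sequences, ["ACGT", "TTTT"], max_dist=1))
```

In this sketch the candidate models are supplied by the caller. Enumerating and pruning models efficiently is the hard part of the problem, and this example does not attempt it.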

Keywords: multiple comparison, Levenshtein edit distance, model

