Orphan gene finding - An exon
assembly approach
Philippe Blayo, Pierre Rouzé and Marie-France Sagot
Theoretical Computer Science, 290:1407-1431, 2003
This paper introduces an algorithm for finding eukaryotic genes. It
particularly addresses the problem of orphan genes, that is of genes
that can not, based on homology alone, be connected to any known gene
family and to which it is therefore not possible to apply traditional
gene finding methods. To the best of our knowledge, this is also the
first algorithm that attempts to compare in an exact way two DNA
sequences that contain both coding (i.e. exonic) and non coding
(i.e. intronic and, possibly, intergenic) parts. The
comparison is performed following an algorithmical model of a gene
that is as close as possible to the biological one (we consider in
this paper the "one ORF, one gene" problem only). A gene is seen as a
set of exons that are pieces of an assembly and are not independent.
The algorithm is efficient enough: although the constants are higher
than for usual sequence comparison, its time complexity is
proportional to the product of the sequences lengths while its space
complexity scales linearly with the length of the smallest sequence.
key words: orphan gene, gene finding, exon assembly, DNA/DNA
and DNA/protein comparison, coding DNA comparison models, dynamic
programming
Paper in postscript format
Back to the Publications page