Objective: identify genes in genomic sequences
We want to identify genes within a chicken genomic sequence and to determine the precise positions of their exons, introns, translation initiation sites and translation termination sites. We will use three different approaches to identify genes. You will have to compare the results of the three methods, try to understand the discrepancies between the different predictions, and decide which one is the most reliable.
First of all, it is necessary to identify and mask repeated sequences.
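To make concrete what masking means, here is a minimal Python sketch. It assumes the repeat positions are already known as 1-based inclusive intervals; the text does not say which tool produces them, and the function name `mask_repeats`, the toy sequence and the toy interval are all hypothetical. Hard masking replaces repeated bases with N, soft masking puts them in lowercase.

```python
def mask_repeats(sequence, repeats, soft=False):
    """Mask 1-based inclusive intervals: 'N' (hard) or lowercase (soft)."""
    bases = list(sequence)
    for start, end in repeats:
        # Convert to 0-based indices and stay inside the sequence.
        for i in range(max(start - 1, 0), min(end, len(bases))):
            bases[i] = bases[i].lower() if soft else "N"
    return "".join(bases)


# Hypothetical toy data, not taken from the chicken sequence.
toy_sequence = "ACGTACGTACGTACGT"
toy_repeats = [(5, 8)]

hard = mask_repeats(toy_sequence, toy_repeats)
soft = mask_repeats(toy_sequence, toy_repeats, soft=True)
print(hard)  # ACGTNNNNACGTACGT
print(soft)  # ACGTacgtACGTACGT
```

Masking in this way keeps the coordinates of the sequence unchanged, so positions reported on the masked sequence can be read directly against the original one.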
Exon 1: 3862..3938
Exon 2: 5197..5410 (start codon: 5225)
Exon 3: 7995..8113
Exon 4: 10358..10612
Exon 5: 13027..14423 (stop codon: 13141); poly A signal (AATAAA): 14402..14407
NB: intron 2 (5411..7994): non-canonical splice donor site; imperfect EST/DNA alignment. Error in the genomic sequence?
NB: the assembly of ESTs gives 2 contigs: one corresponds to the gene, the other corresponds to an antisense transcript. This antisense RNA does not seem to contain any significant ORF and shows no homology to any known protein.
genomic2: One gene, homologous to Chic 1 (cysteine-rich hydrophobic domain 1)
Exon 1: <2829..3016 (start codon: 2829)
Exon 2: 10389..10443
Exon 3: 16885..17040
Exon 4: 19041..19097
Exon 5: 19329..19388
Exon 6: 20129..>22021 (stop codon: 20176)
The signs "<" and ">" indicate that the 5' and 3' ends of the transcription unit have not been identified.
NB: the assembly of ESTs gives 2 contigs corresponding to 2 fragments of the mRNA. One contig overlaps exons 1 to 6 (it is incomplete at the 5' end: it does not include the start codon; it is incomplete at the 3' end: it includes the stop codon but not the poly A tail). The other contig corresponds to a fragment of the 3' UTR (exon 6).
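A simple consistency check on annotations such as the two above is to verify that the coding sequence length is a multiple of three. The Python sketch below does this under stated assumptions: coordinates are 1-based and inclusive, the start codon position is the first base of the start codon, and the stop codon position is the last base of the stop codon. That convention is not stated in the text; it is assumed here because, under this reading, both coding lengths come out as multiples of three, whereas reading the stop position as the first base of the stop codon would add 2 nt to each. The function name `coding_length` is illustrative.

```python
def coding_length(exons, cds_start, cds_end):
    """Count exonic bases between cds_start and cds_end (1-based, inclusive)."""
    total = 0
    for start, end in exons:
        lo = max(start, cds_start)  # clip the exon to the coding region
        hi = min(end, cds_end)
        if lo <= hi:
            total += hi - lo + 1
    return total


# Exon coordinates copied from the two annotations above.
gene1_exons = [(3862, 3938), (5197, 5410), (7995, 8113),
               (10358, 10612), (13027, 14423)]
gene2_exons = [(2829, 3016), (10389, 10443), (16885, 17040),
               (19041, 19097), (19329, 19388), (20129, 22021)]

# Assumption: the quoted stop codon position is the LAST base of the stop codon.
gene1_len = coding_length(gene1_exons, 5225, 13141)
gene2_len = coding_length(gene2_exons, 2829, 20176)

for name, n in (("first gene", gene1_len), ("genomic2 gene", gene2_len)):
    status = "multiple of 3" if n % 3 == 0 else "NOT a multiple of 3"
    print(f"{name}: {n} nt ({status})")
# first gene: 675 nt (multiple of 3)
# genomic2 gene: 564 nt (multiple of 3)
```

A length that is not a multiple of three would point to a misplaced exon boundary, a wrong start or stop position, or a sequencing error, which is worth keeping in mind for the non-canonical donor site noted for intron 2 of the first gene.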