Bioinformatics practical




 

Results of Exercise 4 (Gene prediction)


Objective: identify genes in genomic sequences

We want to identify genes within a chicken genomic sequence and to determine the precise positions of their exons, introns, translation initiation sites and translation termination sites. We will use three different approaches to identify genes. You will have to compare the results of the three methods, try to understand the discrepancies between the different predictions, and decide which one is the most reliable.

First of all, it is necessary to identify and mask repeated sequences.
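
As a minimal sketch of what masking does, assuming the repeat positions are already known (for instance from the output of a repeat-detection program), the snippet below replaces each repeat interval with N. The function name hard_mask, the toy sequence and the toy intervals are illustrative and are not taken from the exercise data. Coordinates are 1-based and inclusive, like the exon positions reported further down.

```python
def hard_mask(seq, repeats, mask_char="N"):
    """Return seq with every 1-based inclusive (start, end) interval masked."""
    masked = list(seq)
    for start, end in repeats:
        # Clamp to the sequence so out-of-range intervals cannot raise.
        lo = max(start, 1) - 1
        hi = min(end, len(seq))
        for i in range(lo, hi):
            masked[i] = mask_char
    return "".join(masked)


# Toy example (hypothetical sequence and repeat positions).
toy_seq = "ACGTACGTACGTACGTACGT"
toy_repeats = [(3, 6), (15, 18)]
masked_seq = hard_mask(toy_seq, toy_repeats)
print(masked_seq)
```

Masking with N keeps the coordinates of the original sequence unchanged, so positions found on the masked sequence can be reported directly.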

1- Ab initio methods

2- Search for transcribed regions (EST, cDNA) in genomic DNA

3- Comparative approach: search for protein-coding regions by similarity
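
Comparing the predictions of the three approaches amounts to comparing lists of exon intervals. Here is a minimal sketch, assuming the output of each method has been reduced to 1-based inclusive (start, end) pairs. The function name compare_exons and the two example lists are hypothetical; they are not results from the exercise.

```python
def compare_exons(reference, predicted):
    """Classify each predicted exon against a reference exon list."""
    report = []
    for p_start, p_end in predicted:
        status = "no overlap"
        for r_start, r_end in reference:
            if (p_start, p_end) == (r_start, r_end):
                status = "exact"
                break
            if p_start <= r_end and r_start <= p_end:
                # Intervals overlap but at least one boundary differs.
                status = "overlap, boundaries differ"
        report.append(((p_start, p_end), status))
    return report


# Hypothetical outputs of two methods on the same sequence.
est_based = [(100, 180), (400, 520), (900, 1010)]
ab_initio = [(100, 180), (395, 520), (700, 760)]
for exon, status in compare_exons(est_based, ab_initio):
    print(exon, status)
```

Exons flagged as differing at a boundary, or as having no counterpart, are the ones worth inspecting by hand when deciding which prediction is the most reliable.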


Now repeat the same exercise with two other chicken genomic sequences:

genomic1:

One gene: homologous to Thioredoxin domain-containing protein 9 (ATP-binding protein associated with cell differentiation)



Exon 1: 3862..3938
Exon 2: 5197..5410   (start codon: 5225)
Exon 3: 7995..8113   
Exon 4: 10358..10612   
Exon 5: 13027..14423  (stop codon: 13141); poly A signal (AATAAA): 14402..14407
NB: intron 2 (5411..7994): non-canonical splice donor site; imperfect EST/DNA alignment. Possible error in the genomic sequence?
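
The intron coordinates follow from the exon table: each intron runs from the end of one exon plus 1 to the start of the next exon minus 1. The sketch below derives them and tests for the canonical GT...AG dinucleotides, assuming the gene lies on the forward strand. The helper names are illustrative, and the splice-site check is shown on a toy sequence because the genomic sequence itself is not reproduced here.

```python
GENOMIC1_EXONS = [(3862, 3938), (5197, 5410), (7995, 8113),
                  (10358, 10612), (13027, 14423)]


def introns_from_exons(exons):
    """Return 1-based inclusive intron intervals between consecutive exons."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def has_canonical_splice_sites(seq, intron):
    """True if the intron starts with GT and ends with AG (forward strand)."""
    start, end = intron
    donor = seq[start - 1:start + 1].upper()
    acceptor = seq[end - 2:end].upper()
    return donor == "GT" and acceptor == "AG"


introns = introns_from_exons(GENOMIC1_EXONS)
print(introns[1])  # intron 2

# Toy sequence (hypothetical): exon, then a 12-nt intron, then exon.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(has_canonical_splice_sites(toy, (7, 18)))
```

Running the same check on the real sequence with the intervals in introns would single out intron 2 if its donor site is indeed non-canonical.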

NB: the assembly of ESTs gives 2 contigs: one corresponds to the gene, the other one corresponds to an antisense transcript. This antisense RNA does not seem to contain any significant ORF and does not show homology to any known protein.
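
One way to judge whether a transcript contains a significant ORF is to scan all six reading frames for the longest ATG-to-stop stretch. Here is a minimal sketch using the standard stop codons. The function names and the toy sequence are hypothetical, and the length above which an ORF counts as significant is left to the reader.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf_one_strand(seq):
    """Longest ATG...stop ORF (stop codon included) in the 3 forward frames."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == "ATG" and start is None:
                start = i
            elif codon in STOPS and start is not None:
                if i + 3 - start > len(best):
                    best = seq[start:i + 3]
                start = None
    return best


def longest_orf(seq):
    """Longest ORF over both strands (six reading frames)."""
    seq = seq.upper()
    return max(longest_orf_one_strand(seq),
               longest_orf_one_strand(reverse_complement(seq)), key=len)


# Toy transcript (hypothetical), not the antisense contig itself.
toy_transcript = "CCATGAAATTTGGGTAACC"
print(longest_orf(toy_transcript))
```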

genomic2:

One gene: homologous to Chic 1 (cysteine-rich hydrophobic domain 1)



  Exon 1   <2829..3016    (start codon: 2829)
  Exon 2   10389..10443 
  Exon 3   16885..17040 
  Exon 4   19041..19097 
  Exon 5   19329..19388 
  Exon 6   20129..>22021 (stop codon: 20176)
The "<" and ">" signs indicate that the 5' and 3' ends of the transcription unit have not been identified.
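
The "<" and ">" notation can be read programmatically. Here is a sketch of a parser for the start..end strings used in the table above, assuming only this simple form (an optional "<" before the start, an optional ">" before the end). The function name parse_location is illustrative.

```python
import re

LOCATION_RE = re.compile(r"^(<?)(\d+)\.\.(>?)(\d+)$")


def parse_location(text):
    """Parse 'start..end' with optional '<' and '>' partial-end markers."""
    match = LOCATION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised location: {text!r}")
    open_5p, start, open_3p, end = match.groups()
    return {
        "start": int(start),
        "end": int(end),
        "partial_5prime": open_5p == "<",
        "partial_3prime": open_3p == ">",
    }


first_exon = parse_location("<2829..3016")
last_exon = parse_location("20129..>22021")
print(first_exon)
print(last_exon)
```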

NB: the assembly of ESTs gives 2 contigs corresponding to 2 fragments of the mRNA. One contig spans exons 1 to 6 but is incomplete at both ends: at the 5' end it does not include the start codon, and at the 3' end it includes the stop codon but not the poly(A) tail. The other contig corresponds to a fragment of the 3' UTR (exon 6).
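
A quick consistency check on both gene models is to sum the coding part of each exon and confirm that the total is a multiple of three. The sketch below assumes that the reported start position is the first base of the start codon and that the reported stop position is the last base of the stop codon. This reading is an assumption and is not stated in the results above. The function name cds_length is illustrative.

```python
def cds_length(exons, cds_start, cds_end):
    """Sum exon bases lying within cds_start..cds_end (1-based, inclusive)."""
    total = 0
    for start, end in exons:
        lo = max(start, cds_start)
        hi = min(end, cds_end)
        if lo <= hi:
            total += hi - lo + 1
    return total


# Exon coordinates copied from the two tables above.
genomic1_exons = [(3862, 3938), (5197, 5410), (7995, 8113),
                  (10358, 10612), (13027, 14423)]
genomic2_exons = [(2829, 3016), (10389, 10443), (16885, 17040),
                  (19041, 19097), (19329, 19388), (20129, 22021)]

# Assumption: start = first base of ATG, stop = last base of the stop codon.
len1 = cds_length(genomic1_exons, 5225, 13141)
len2 = cds_length(genomic2_exons, 2829, 20176)
print(len1, len1 % 3)
print(len2, len2 % 3)
```

Under this assumption the coding lengths are 675 and 564 nucleotides, both divisible by three (225 and 188 codons, stop codon included), which is consistent with the exon boundaries given for the two genes.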