Bioinformatics practical
Exercise 4 : Gene prediction
Objective: identify genes in genomic sequences
We will use three different approaches to identify genes within a chicken
genomic sequence, then compare the results of each method.
First of all, it is necessary to identify and mask repeated sequences:
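Masking replaces repeated regions with N so that later similarity searches ignore them. The sketch below illustrates only this replacement step, assuming the repeat coordinates are already known as 1-based inclusive intervals. The function name `hard_mask` and the example sequence are hypothetical, and repeat detection itself is not shown.

```python
def hard_mask(sequence, repeats):
    """Replace each 1-based inclusive (start, end) interval with N."""
    bases = list(sequence)
    for start, end in repeats:
        # Clamp the interval to the sequence so out-of-range coordinates are ignored.
        for i in range(max(start, 1) - 1, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


# Invented toy sequence with one repeat at positions 3-6.
masked = hard_mask("ACGTACGTACGT", [(3, 6)])
print(masked)
```

The masked sequence keeps its original length, so coordinates reported by later steps still refer to the unmasked genomic sequence.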
1- Ab initio methods
2- Search for transcribed regions (EST, cDNA) in genomic DNA
- Search for ESTs matching the genomic sequence (masked): megablast (NCBI) (use
database est_others) ( *** )
- Select all matching ESTs (at least 95-98% identity, > 70 bp)
from chicken (Gallus gallus), and retrieve sequences from NCBI BLAST output
( *** )
- Assemble ESTs with CAP3 ( *** *** ) (save Contigs in a text file)
- Align each cDNA to the genomic DNA with SIM4 ( *** *** *** )
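The EST selection step (at least 95% identity, more than 70 bp) can be sketched as a small filter. This assumes the BLAST hits have been saved as a tab-separated hit table with the standard 12 columns (query, subject, % identity, alignment length, ...). The function name `select_est_hits` and the sample rows are invented for illustration. The species filter (Gallus gallus) is not handled here, since the hit table does not carry the organism name.

```python
def select_est_hits(lines, min_identity=95.0, min_length=70):
    """Return subject IDs of hits with identity >= min_identity and length > min_length."""
    selected = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject_id = fields[1]
        identity = float(fields[2])   # column 3: percent identity
        length = int(fields[3])       # column 4: alignment length
        if identity >= min_identity and length > min_length:
            if subject_id not in selected:  # keep each EST only once
                selected.append(subject_id)
    return selected


# Invented example rows (hypothetical EST identifiers).
sample_hits = [
    "genomic\tEST_A\t98.50\t320\t5\t0\t1000\t1319\t1\t320\t1e-150\t540",
    "genomic\tEST_B\t91.00\t400\t36\t0\t2000\t2399\t1\t400\t1e-100\t380",
    "genomic\tEST_C\t99.00\t60\t1\t0\t3000\t3059\t1\t60\t1e-20\t100",
    "genomic\tEST_A\t97.00\t150\t4\t0\t1500\t1649\t330\t479\t1e-60\t250",
]
print(select_est_hits(sample_hits))
```

Raising `min_identity` to 98.0 applies the stricter end of the 95-98% range given above.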
3- Comparative approach: search protein coding regions by similarity
- Search for proteins matching the genomic sequence (masked) with
BLASTX (translated BLAST searches): BLAST (NCBI) ( *** )
- Retrieve the closest homologue ( *** )
- Align this protein to the genomic DNA with GENEWISE (Sanger), or GENEWISE (Pasteur)
(parameter: genewise protein to genomic DNA) ( *** )
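Picking the closest homologue from the BLASTX results can be sketched in the same way. This assumes a tab-separated hit table with the standard 12 columns, where the last column is the bit score, and takes "closest" to mean the hit with the highest bit score. Both are assumptions; the exercise does not define the criterion. The function name `closest_homologue` and the sample rows are invented.

```python
def closest_homologue(lines):
    """Return the subject ID of the hit with the highest bit score, or None if no hits."""
    best_id = None
    best_score = float("-inf")
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        score = float(fields[11])  # column 12: bit score
        if score > best_score:
            best_id, best_score = fields[1], score
    return best_id


# Invented example rows (hypothetical protein identifiers).
sample_protein_hits = [
    "genomic\tPROT_Y\t60.0\t180\t72\t0\t100\t639\t5\t184\t1e-40\t160",
    "genomic\tPROT_X\t85.0\t200\t30\t0\t100\t699\t1\t200\t1e-90\t350",
]
print(closest_homologue(sample_protein_hits))
```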
Now repeat the same exercise with two other chicken genomic sequences:
genomic1 genomic2
(see results)
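One simple way to compare the predictions of two methods is to count how many bases their exons share. The sketch below assumes each prediction has been reduced by hand to a list of 1-based inclusive exon coordinates on the genomic sequence. The function name `shared_exon_bases` and the coordinates are invented for illustration.

```python
def shared_exon_bases(exons_a, exons_b):
    """Count genomic positions covered by exons in both predictions."""
    covered_a = set()
    for start, end in exons_a:
        covered_a.update(range(start, end + 1))
    covered_b = set()
    for start, end in exons_b:
        covered_b.update(range(start, end + 1))
    return len(covered_a & covered_b)


# Invented exon coordinates for two methods.
ab_initio_exons = [(100, 200), (300, 400)]
est_exons = [(150, 200), (300, 350), (500, 600)]
print(shared_exon_bases(ab_initio_exons, est_exons))
```

A low count relative to the total exon length of either method points to exons that only one approach detected, which are worth inspecting by hand.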