Bioinformatics: Practicals
Exercise 4: Sequence alignment for gene prediction
Objective: identify genes in genomic sequences
We want to identify genes within a chicken genomic sequence
and to determine the precise positions of their exons, introns, translation
initiation sites and translation termination sites. We will use two different
approaches, both based on sequence comparison, to identify genes. You will have
to compare the results of the two methods and try to understand the discrepancies
between their predictions.
1- First of all, it is necessary to identify and mask repeated sequences
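To make the idea of masking concrete, here is a minimal Python sketch. It assumes the repeat-detection tool has already reported repeats as 1-based, inclusive (start, end) coordinates; the function name `mask_intervals` and the example coordinates are hypothetical and only serve as an illustration. Each repeated region is replaced by N so that it cannot produce spurious matches in the later searches.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Replace every 1-based inclusive (start, end) interval with mask_char."""
    masked = list(sequence)
    for start, end in intervals:
        # Convert to 0-based indices and clip to the sequence boundaries.
        for i in range(max(start, 1) - 1, min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    genomic = "ACGTACGTACGTACGTACGT"
    repeats = [(5, 8), (15, 18)]  # hypothetical repeat coordinates
    print(mask_intervals(genomic, repeats))  # ACGTNNNNACGTACNNNNGT
```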
2- Search for transcribed regions (EST, cDNA) in genomic DNA
- Search for ESTs matching the genomic sequence (masked): MEGABLAST (NCBI) (use
database est_others) ( *** )
- Select all matching ESTs (at least 98% identity, > 70 bp) from
chicken (Gallus gallus), and retrieve sequences from NCBI BLAST output
( *** )
- Assemble ESTs with CAP3 ( *** *** ) (save the contigs in a text file)
- Align each cDNA to the genomic DNA with SIM4 ( *** *** *** )
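The EST selection step (at least 98% identity, more than 70 bp) can also be done with a short script instead of by hand. The sketch below is only an illustration: it assumes the BLAST hits were saved as a tab-separated hit table with the standard 12 columns (query id, subject id, % identity, alignment length, ...), and the function name `select_est_hits` and the EST identifiers in the example are hypothetical. The species filter (Gallus gallus) is not covered, because the standard hit table does not contain the organism name.

```python
def select_est_hits(tabular_text, min_identity=98.0, min_length=70):
    """Return subject ids of hits with identity >= min_identity and length > min_length."""
    selected = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines of the hit table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        identity = float(fields[2])  # column 3: percent identity
        length = int(fields[3])      # column 4: alignment length
        if identity >= min_identity and length > min_length:
            if subject_id not in selected:  # keep each EST only once
                selected.append(subject_id)
    return selected


if __name__ == "__main__":
    # Hypothetical hit table rows (identifiers and values are made up).
    hits = "\n".join([
        "# hypothetical hit table",
        "genomic\tEST_A\t99.50\t420\t2\t0\t100\t519\t1\t420\t0.0\t760",
        "genomic\tEST_B\t95.00\t300\t15\t0\t600\t899\t1\t300\t1e-120\t430",
        "genomic\tEST_C\t100.00\t60\t0\t0\t950\t1009\t1\t60\t1e-25\t111",
        "genomic\tEST_A\t98.20\t110\t2\t0\t700\t809\t421\t530\t1e-50\t190",
    ])
    print(select_est_hits(hits))  # ['EST_A']
```

EST_B is rejected because of its identity (95%), and EST_C because its alignment is too short (60 bp).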
3- Comparative approach: search protein coding regions by similarity
- Search for proteins matching the genomic sequence (masked) with BLASTX
(translated BLAST searches): BLAST (NCBI)( *** )
- Retrieve the closest homologue ( *** )
- Align this protein to the genomic DNA with GENEWISE (Sanger) or GENEWISE (Pasteur)
(parameter: genewise protein to genomic DNA)( *** )
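Once both approaches have produced exon coordinates, the comparison requested in the objective can be sketched as follows. This is a minimal illustration under stated assumptions: exons from each method have been written down by hand as (start, end) positions on the genomic sequence, and the function name `compare_exons` and the coordinates in the example are hypothetical, not actual results.

```python
def compare_exons(est_exons, protein_exons):
    """Classify exons as identical, overlapping with different boundaries, or method-specific."""
    est_set, prot_set = set(est_exons), set(protein_exons)
    identical = sorted(est_set & prot_set)

    # Pair up non-identical exons that overlap on the genomic sequence.
    differing = []
    for a in sorted(est_set - prot_set):
        for b in sorted(prot_set - est_set):
            if a[0] <= b[1] and b[0] <= a[1]:
                differing.append((a, b))

    matched_est = {a for a, _ in differing}
    matched_prot = {b for _, b in differing}
    return {
        "identical": identical,
        "different_boundaries": differing,
        "only_est": sorted(est_set - prot_set - matched_est),
        "only_protein": sorted(prot_set - est_set - matched_prot),
    }


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    est_based = [(100, 250), (400, 520), (700, 900)]
    protein_based = [(130, 250), (400, 520)]
    for category, exons in compare_exons(est_based, protein_based).items():
        print(category, exons)
```

One source of discrepancy to keep in mind: cDNA/EST alignments include untranslated regions, whereas a protein alignment can only cover coding sequence, so the first and last exons often extend further in the EST-based prediction.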
Now repeat the same exercise with two other chicken genomic sequences:
genomic1 genomic2
(see results)