Bioinformatics Workshop (Taller de Bioinformática) - Practical sessions
16-19 October 2001, Santiago, Chile
Exercise 4: Gene prediction
Objective: identify genes in human genomic sequences
We will use three different approaches to identify genes within a human
genomic sequence. Compare the results of the three methods.
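One simple way to compare the methods is to measure how much their predicted exons overlap at the nucleotide level, using the sensitivity and specificity measures common in gene-finding evaluations (sensitivity: fraction of reference exon nucleotides that were predicted; specificity: fraction of predicted nucleotides that fall in reference exons). The sketch below assumes each method's exons have been copied by hand as 1-based, inclusive (start, end) genomic coordinates; the coordinates and the function names are hypothetical.

```python
def covered(exons):
    """Return the set of positions covered by 1-based, inclusive (start, end) exons."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def compare(reference, predicted):
    """Nucleotide-level (sensitivity, specificity) of `predicted` against `reference`."""
    ref, pred = covered(reference), covered(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical exons: EST/SIM4-based structure vs. an ab initio prediction.
est_exons = [(101, 200), (401, 500)]
abinitio_exons = [(151, 200), (401, 600)]
print(compare(est_exons, abinitio_exons))
```

Here half of the first EST-supported exon is missed and the second predicted exon runs 100 nt too far, so both measures fall below 1.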
First of all, it is necessary to identify and mask repeated sequences.
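Repeat-masking tools such as RepeatMasker typically hard-mask repeats by replacing them with N, so that BLAST does not report hits caused only by repeated elements (soft-masking in lowercase is another option). A minimal sketch of hard masking, assuming the repeat intervals are already known as 1-based, inclusive coordinates; the function name and the example sequence are hypothetical:

```python
def hard_mask(sequence, repeats, mask_char="N"):
    """Return sequence with each 1-based, inclusive (start, end) repeat replaced by mask_char."""
    masked = list(sequence)
    for start, end in repeats:
        # Convert 1-based inclusive coordinates to 0-based Python indices.
        for i in range(start - 1, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical example: a poly-A stretch at positions 9-18.
genomic = "ACGTACGT" + "A" * 10 + "GGCC"
print(hard_mask(genomic, [(9, 18)]))
```

The masked sequence keeps its original length, so coordinates reported by later searches still refer to the unmasked genomic sequence.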
1- Ab initio methods
2- Search for transcribed regions (EST, cDNA) in genomic DNA
- Search for ESTs matching the masked genomic sequence with BLAST (NCBI; database: est_human) ( *** )
- Take the first 20 EST hits from the NCBI BLAST output and retrieve their sequences at PBIL ( *** )
- Assemble the ESTs with CAP3 ( *** *** ) and save the assembly in a text file
- Align the cDNAs with the genomic DNA using SIM4 ( *** *** *** )
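Once SIM4 has aligned a cDNA (for example, a CAP3 contig) to the genomic DNA, the matching blocks on the genomic sequence are the exons and the gaps between them are the introns. Most introns start with GT and end with AG, so checking these dinucleotides is a quick sanity test of the alignment. The sketch assumes the exon blocks have been copied from the SIM4 output as 1-based, inclusive genomic coordinates on the plus strand; the function names and the toy sequence are hypothetical.

```python
def introns_from_exons(exons):
    """Return the intron intervals lying between consecutive exon blocks."""
    exons = sorted(exons)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def splice_sites(genomic, introns):
    """Return the (donor, acceptor) dinucleotides of each intron, e.g. ('GT', 'AG')."""
    sites = []
    for start, end in introns:
        donor = genomic[start - 1:start + 1]   # first two intron bases
        acceptor = genomic[end - 2:end]        # last two intron bases
        sites.append((donor.upper(), acceptor.upper()))
    return sites


# Hypothetical toy example: two exons separated by a 12-nt intron.
genomic = "ATGAAA" + "GTAAGTCCCCAG" + "TTTTGA"
exons = [(1, 6), (19, 24)]
introns = introns_from_exons(exons)
for intron, (donor, acceptor) in zip(introns, splice_sites(genomic, introns)):
    canonical = donor == "GT" and acceptor == "AG"
    print(intron, donor, acceptor, "canonical" if canonical else "check")
```

If the cDNA aligns to the reverse strand, a canonical intron appears as CT...AC on the plus-strand sequence, so reverse-complement the region first or test for that pair instead.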
3- Comparative approach: search for protein-coding regions by similarity
- Search for proteins matching the masked genomic sequence with BLASTX (translated BLAST search) at NCBI (database: swissprot) ( *** )
- Retrieve the closest homologue ( *** )
- Align this protein to the genomic DNA with GENEWISE (Pasteur; option: genewise protein to genomic DNA) ( *** )
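BLASTX translates the genomic query in all six reading frames (three per strand) and compares each translation with the protein database. The same translation code can check a GENEWISE result: splicing the predicted exons together and translating them should give a protein close to the homologue. A minimal sketch, assuming the standard genetic code, plus-strand exons and 1-based, inclusive coordinates; all function names and the toy sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO = ("FFLLSSSSYY**CC*W"   # first base T
         "LLLLPPPPHHQQRRRR"   # first base C
         "IIIMTTTTNNKKSSRR"   # first base A
         "VVVVAAAADDEEGGGG")  # first base G
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(seq):
    """Translate complete codons; codons containing N or other symbols become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def splice(genomic, exons):
    """Concatenate 1-based, inclusive (start, end) plus-strand exons in genomic order."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


print(six_frames("ATGGCCTAA"))
genomic = "ATGAAA" + "GTAAGTCCCCAG" + "TTTTGA"
print(translate(splice(genomic, [(1, 6), (19, 24)])))  # MKF*
```

Masked positions (N) translate to X, which is why masking removes repeat-driven hits without shifting the reading frames.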
5- Now repeat the same exercise with two other human genomic sequences:
genomic1 and genomic2.