INSA - Bioinformatics practical
Exercise 4: Gene prediction
Objective: identify genes in human genomic sequences
We will use three different approaches to identify genes within a human
genomic sequence. Compare the results of each method.
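One possible way to compare the methods is to write down the exon
coordinates reported by each tool and count how many genomic positions
they share. The sketch below assumes that each prediction has been
reduced by hand to a list of (start, end) pairs, 1-based and inclusive;
the function names and the example coordinates are hypothetical and are
not produced by any of the tools used in this exercise.

```python
from typing import Dict, Iterable, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based, inclusive (assumed convention)


def covered_positions(exons: Iterable[Exon]) -> Set[int]:
    """Return the set of genomic positions covered by a list of exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def overlap_summary(pred_a: Iterable[Exon], pred_b: Iterable[Exon]) -> Dict[str, float]:
    """Count positions shared by two predictions and specific to each one."""
    a = covered_positions(pred_a)
    b = covered_positions(pred_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared positions divided by all predicted positions.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    ab_initio = [(100, 250), (400, 520)]
    est_based = [(120, 250), (400, 500)]
    print(overlap_summary(ab_initio, est_based))
```

A Jaccard index close to 1 means that the two methods predict nearly the
same coding positions; a low value points to exons found by only one of
them, which are worth inspecting by eye.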
First of all, it is necessary to identify and mask repeated sequences.
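To make the idea of masking concrete, here is a minimal sketch. It
assumes that the masking tool returned a "soft-masked" sequence, in which
repeats are written in lowercase, and converts it to a "hard-masked"
sequence, in which repeats are replaced by N. The function names are
hypothetical; the tool used in the practical may already return N
directly, in which case only the second function is of interest.

```python
def hard_mask(sequence: str, mask_char: str = "N") -> str:
    """Replace every lowercase (soft-masked) base with mask_char."""
    return "".join(mask_char if base.islower() else base for base in sequence)


def masked_fraction(sequence: str) -> float:
    """Fraction of the sequence that is masked (lowercase or N)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower() or base == "N")
    return masked / len(sequence)


if __name__ == "__main__":
    # Hypothetical soft-masked fragment, for illustration only.
    fragment = "ACGTTGCAacacacacacGGATCC"
    print(hard_mask(fragment))
    print(f"{masked_fraction(fragment):.0%} of the fragment is masked")
```

Masked positions cannot seed a BLAST hit, which is why the searches in
the following steps are run on the masked sequence.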
1- Ab initio methods
2- Search for transcribed regions (EST, cDNA) in genomic DNA
- Search for ESTs matching the masked genomic sequence: BLAST
  (NCBI) (use database est_human) ( *** )
- Copy the first 20 ESTs from the NCBI BLAST output, and retrieve
  their sequences at PBIL ( *** )
- Assemble the ESTs with CAP3 ( *** *** )
  (save the contigs in a text file)
- Align the cDNAs and the genomic DNA with SIM4 ( *** *** *** )
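The EST selection step above is done by hand on the BLAST web page. If
the hit table is instead saved as tab-separated text (assumed here: one
alignment per line, best hits first, with the EST identifier in the
second column, as in the standard BLAST tabular layout), the first 20
distinct ESTs could be listed as in the sketch below. The function name
and the sample identifiers are hypothetical.

```python
from typing import List


def top_subject_ids(tabular_text: str, limit: int = 20) -> List[str]:
    """Return the first `limit` distinct subject identifiers of a hit table.

    An EST can appear on several lines (one per aligned segment), so
    duplicates are skipped while the original ranking is preserved.
    """
    selected: List[str] = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # not a hit line
        subject_id = fields[1]
        if subject_id not in selected:
            selected.append(subject_id)
            if len(selected) >= limit:
                break
    return selected


if __name__ == "__main__":
    # Hypothetical hit table, for illustration only.
    sample = (
        "genomic\tEST_A\t99.1\n"
        "genomic\tEST_A\t97.5\n"
        "genomic\tEST_B\t98.0\n"
    )
    print(top_subject_ids(sample))
```

The resulting list of identifiers is what would be pasted into the
sequence retrieval form before the assembly step.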
3- Comparative approach: search for protein-coding regions by similarity
- Search for proteins matching the masked genomic sequence with BLASTX
  (translated BLAST search): BLAST (NCBI) (use database swissprot)
  ( *** )
- Retrieve the closest homologue ( *** )
- Align this protein to the genomic DNA with GENEWISE (Sanger), or
  GENEWISE (Pasteur) (parameter: genewise protein to genomic DNA)
  ( *** )
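A translated search such as BLASTX compares the protein database with
the six conceptual translations of the DNA query (three reading frames
on each strand). The sketch below illustrates that translation step
only, not the search itself; it assumes the standard genetic code, and
the function names are hypothetical. Codons containing a masked base (N)
are translated as X.

```python
from typing import Dict

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons give X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> Dict[str, str]:
    """Translate the three frames of each strand, keyed as +1..+3 and -1..-3."""
    frames: Dict[str, str] = {}
    for strand, sequence in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(sequence[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical fragment, for illustration only.
    for frame, protein in six_frame_translation("ATGGCCTAAGGN").items():
        print(frame, protein)
```

Because the exons of one gene can lie in different reading frames,
similarity hits for a single protein may be spread over several frames
of the same strand; the spliced alignment step is what joins them into a
gene structure.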
Now repeat the same exercise with two other human genomic sequences:
genomic1 genomic2
( results )