INSA - Bioinformatics practical session (TP)


Exercise 1: Retrieve information from sequence databases

Step 1:

Objective: retrieve information on human insulin.

Search for the human insulin gene in EMBL with WWW-Query at PBIL. Use the following criteria:


Retrieve the corresponding protein sequence in FASTA format.
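To check what you downloaded, a FASTA file can also be read programmatically. The short Python sketch below is one possible way to do it, not a tool provided by PBIL: it parses FASTA text into a dictionary mapping each header line to its sequence, joining wrapped sequence lines. The function name `read_fasta` and the record used in the example are hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header = line[1:]  # header without the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical record, for illustration only
example = """>hypothetical_protein example record
MKTAYIAK
QRQISFVK
"""
for name, seq in read_fasta(example).items():
    print(name, len(seq))  # prints: hypothetical_protein example record 16
```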

Go back two pages and click on the link J00265.INS (note that this CDS results from the splicing of the insulin gene), then on J00265 (the complete insulin gene).
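The CDS feature of an EMBL entry describes splicing with a location such as `join(a..b,c..d)`: the listed exon segments of the gene sequence are concatenated in order. As an illustration only, the Python sketch below applies such a location to a sequence. It assumes 1-based inclusive coordinates on the forward strand and does not handle `complement(...)`, references to other entries, or partial markers (`<`, `>`). The function name `splice`, the toy sequence and the location are hypothetical and not taken from J00265.

```python
import re


def splice(genomic: str, location: str) -> str:
    """Concatenate the segments listed in a simple EMBL/GenBank location.

    Supports 'a..b' and 'join(a..b,c..d,...)' on the forward strand only;
    coordinates are 1-based and inclusive, as in the feature table.
    """
    if "complement" in location:
        raise ValueError("reverse-strand locations are not handled here")
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    if not spans:
        raise ValueError(f"unsupported location: {location}")
    # Convert each 1-based inclusive range to a Python slice and join them
    return "".join(genomic[int(start) - 1:int(end)] for start, end in spans)


# Toy gene: two exons (upper case) around an intron (lower case)
gene = "ccATGCCCgtaagtTTTTAGcc"
print(splice(gene, "join(3..8,15..20)"))  # prints: ATGCCCTTTTAG
```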

Click on the link 'PUBMED; 503234' to get the bibliographic reference in PubMed.

Click on the link /protein_id="AAA59172.1" to retrieve the corresponding entry in SwissProt. Note the locations of the insulin B and A chains and of the C peptide.
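Once you have noted the chain boundaries, they can be used to cut the precursor sequence into its parts. The Python sketch below assumes the 1-based inclusive coordinates we recall for the human preproinsulin entry (signal peptide 1-24, B chain 25-54, C peptide 57-87, A chain 90-110); treat them as an assumption and compare them with the feature table of the entry you opened. The names `INSULIN_FEATURES` and `extract_features` are hypothetical.

```python
# Coordinates (1-based, inclusive) assumed from the SwissProt entry for
# human insulin; check them against the entry you retrieved.
INSULIN_FEATURES = {
    "signal peptide": (1, 24),
    "B chain": (25, 54),
    "C peptide": (57, 87),
    "A chain": (90, 110),
}


def extract_features(sequence: str, features: dict) -> dict:
    """Return {feature name: subsequence} for 1-based inclusive ranges."""
    if max(end for _, end in features.values()) > len(sequence):
        raise ValueError("a feature ends beyond the sequence")
    return {name: sequence[start - 1:end] for name, (start, end) in features.items()}


# Show the expected length of each part under these coordinates
for name, (start, end) in INSULIN_FEATURES.items():
    print(f"{name}: residues {start}-{end} ({end - start + 1} aa)")
```

To apply it, pass the protein sequence from the FASTA record, e.g. `extract_features(sequence, INSULIN_FEATURES)`. Residues 55-56 and 88-89 are not assigned to any part in this sketch.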
 

Click on the link 'InterPro; IPR004825' to retrieve the corresponding entry in InterPro.
 

Click on the link 'PROSITE; PS00262' to retrieve information on the insulin protein signature in PROSITE.
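PROSITE signatures are written in a small pattern language: elements are separated by `-`, `x` means any residue, `[..]` lists allowed residues, `{..}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchors the pattern to the N- or C-terminus. A minimal Python sketch translating that syntax into a regular expression is shown below. It is an illustration, not the official ScanProsite tool, and it does not cover every corner of the syntax (for instance `>` inside brackets). The function name `prosite_to_regex` is hypothetical, and the demonstration uses a generic example pattern in the style of the PROSITE documentation rather than PS00262; copy the actual pattern from the entry to scan the insulin sequence.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    regex = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        regex += "^"
        pattern = pattern[1:]
    end_anchor = pattern.endswith(">")  # anchored at the C-terminus
    if end_anchor:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        # Separate the residue part from an optional repeat such as (3) or (2,4)
        residue, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if residue == "x":
            piece = "."  # any residue
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            piece = residue  # a single residue or a [..] class, same as in regex
        if high:
            piece += "{" + low + "," + high + "}"
        elif low:
            piece += "{" + low + "}"
        regex += piece
    if end_anchor:
        regex += "$"
    return regex


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex)  # prints: [AC].V.{4}[^ED]
print(bool(re.search(regex, "AGVKKKKA")))  # prints: True
```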

Step 2:

Objective: retrieve all mammalian insulin proteins.

Use WWW-Query at PBIL to search UniProt/SwissProt for all mammalian insulin proteins, then extract their sequences in FASTA format.

NB: you can use the wildcard "*" to stand for any series of characters and so catch several keywords at once. For example, "*insulin" matches "insulin", "proinsulin", "preproinsulin", etc.
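The behaviour of a leading "*" can be illustrated with Python's `fnmatch` module, which implements shell-style wildcards. This is only an analogy: it assumes that WWW-Query treats "*" the same way, which you should confirm on the query page itself. The keyword list is hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical keywords, for illustration only
keywords = ["insulin", "proinsulin", "preproinsulin", "insulin receptor",
            "insulin-like growth factor", "glucagon"]

pattern = "*insulin"  # '*' stands for any series of characters, possibly none
matches = [k for k in keywords if fnmatchcase(k.lower(), pattern)]
print(matches)  # prints: ['insulin', 'proinsulin', 'preproinsulin']
```

Because the pattern must end with "insulin", it leaves out "insulin receptor"; a pattern with a trailing wildcard such as "*insulin*" would catch it.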

NB: take care not to include "insulin receptor", "insulin-like growth factor", etc. in your selection.
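If the FASTA file you extracted still contains unwanted entries, they can also be removed afterwards by filtering on the header lines. The Python sketch below keeps records whose header mentions insulin and drops those containing an excluded term. The exclusion list, the function names `keep_header` and `filter_fasta`, and the example records are hypothetical; real UniProt headers may use other wording, so adapt the terms to your file.

```python
# Terms that mark entries to exclude (hypothetical list; adapt to your file)
EXCLUDE = ("receptor", "insulin-like", "insulin like")


def keep_header(header: str) -> bool:
    """True if the header mentions insulin but none of the excluded terms."""
    h = header.lower()
    return "insulin" in h and not any(term in h for term in EXCLUDE)


def filter_fasta(text: str) -> str:
    """Return only the FASTA records whose header passes keep_header()."""
    kept, keeping = [], False
    for line in text.splitlines():
        if line.startswith(">"):
            keeping = keep_header(line[1:])  # decide once per record
        if keeping:
            kept.append(line)  # the header and its sequence lines
    return "\n".join(kept) + ("\n" if kept else "")


# Hypothetical records, for illustration only
example = """>P1 Insulin
AAAA
>P2 Insulin receptor
CCCC
>P3 Insulin-like growth factor I
GGGG
>P4 Proinsulin
TTTT
"""
print(filter_fasta(example), end="")
```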