Select Search for sequences and Nucleotide databank, then select EMBL in the scrolling list. To compose your query, use the following criteria:

Criteria
DEFAULT   Keyword   insulin
AND       Species   vertebrata
AND       Type      CDS
AND NOT   Keyword   partial
List name: ins

The criterion AND Type = CDS selects only the protein-coding regions, and the criterion AND NOT Keyword = partial excludes incomplete sequences.
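
To make the logic of these criteria concrete, here is a minimal Python sketch that applies the same boolean combination to a few toy records. The `Record` class, the record names and their annotations are all invented for illustration, and species matching is reduced to a plain string comparison, whereas the real databank follows the taxonomic tree.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Toy stand-in for a databank entry (hypothetical, not the real schema)."""
    name: str
    species: str
    seq_type: str
    keywords: set = field(default_factory=set)


def matches(rec):
    """Keyword insulin AND Species vertebrata AND Type CDS AND NOT Keyword partial."""
    return (
        "insulin" in rec.keywords
        and rec.species == "vertebrata"   # simplification: exact label match
        and rec.seq_type == "CDS"
        and "partial" not in rec.keywords
    )


# Invented records: only SEQ_A satisfies every criterion.
records = [
    Record("SEQ_A", "vertebrata", "CDS", {"insulin"}),
    Record("SEQ_B", "vertebrata", "CDS", {"insulin", "partial"}),
    Record("SEQ_C", "vertebrata", "rRNA", {"insulin"}),
    Record("SEQ_D", "insecta", "CDS", {"insulin"}),
]

ins = [rec.name for rec in records if matches(rec)]
print(ins)
```

Each additional AND or AND NOT criterion can only shrink the list, which is why the order of the criteria does not change the result here.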
Note the number of sequences you retrieved, then go back to the WWW-Query page and perform the following query:

DEFAULT   Keyword   @insulin@
AND       Species   vertebrata
AND       Type      CDS
AND NOT   Keyword   partial
List name: ins2

What do you notice? Are all the sequences coding for insulin? What are your conclusions about the use of keywords in general sequence databases?
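
The difference between the keyword insulin and the keyword @insulin@ can be sketched in Python, assuming that @ stands for any string of characters (possibly empty) and that matching ignores case. The keyword list below is a small hand-made sample, not the content of the databank.

```python
import re


def keyword_matches(pattern, keyword):
    """Match a keyword against a pattern in which '@' is a wildcard (assumed)."""
    # Escape each literal piece, then join the pieces with '.*' for the wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("@"))
    return re.fullmatch(regex, keyword, flags=re.IGNORECASE) is not None


# Hand-made sample of keywords, for illustration only.
keywords = [
    "insulin",
    "insulin-like growth factor",
    "insulin receptor",
    "proinsulin",
    "glucagon",
]

exact = [k for k in keywords if keyword_matches("insulin", k)]
wild = [k for k in keywords if keyword_matches("@insulin@", k)]
print(exact)
print(wild)
```

The wildcard query also picks up every keyword that merely contains the word, so the second list is larger than the first without being more specific.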
Now perform the following query:

DEFAULT   Name   HSINS01
AND       Type   CDS
List name: ins3

Click on the HTML link that gives access to the coding sequence of the gene (***). Looking at the features (line CDS_pept), what can you conclude about the structure of the insulin gene?
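
When a coding sequence is described by a location of the form join(a..b,c..d), the segments and the gaps between them can be read off mechanically. The sketch below is a minimal parser for that simple form only (no complement, no partial markers, no references to other entries); the location string is hypothetical and is not the one found in HSINS01.

```python
import re


def parse_location(location):
    """Return 1-based inclusive (start, end) segments from a simple location."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def gaps_between(segments):
    """Return the regions lying between consecutive segments."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
    ]


# Hypothetical location, invented for illustration.
location = "join(101..250,1001..1150)"

segments = parse_location(location)
coding_length = sum(end - start + 1 for start, end in segments)
gaps = gaps_between(segments)

print(segments)
print(coding_length)
print(gaps)
```

More than one segment in the location means that the coding region is interrupted on the mother sequence.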
Click on the name of the mother sequence (HSINS01) to access it (***). Then click on the link MEDLINE; 82221404 to get the corresponding bibliographic reference in Medline (***).
Go back to the mother sequence and click on the link /db_xref="SWISS-PROT:P01308" to retrieve the corresponding entry in SWISS-PROT (***).
In the SWISS-PROT annotations, find the features table and note the locations of the insulin B and A chains and of the C peptide.
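
Feature positions count residues from 1 and include both ends, so a small conversion is needed before slicing a Python string. The sketch below shows that conversion on a toy sequence; the feature names and coordinates are invented placeholders, to be replaced by the positions you noted in the features table.

```python
def extract_feature(sequence, start, end):
    """Return the residues from start to end (1-based, both ends included)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("feature lies outside the sequence")
    return sequence[start - 1:end]


# Toy sequence and made-up coordinates, for illustration only.
toy_sequence = "ABCDEFGHIJ"
features = {
    "chain_one": (2, 4),
    "linker": (5, 7),
    "chain_two": (8, 10),
}

parts = {
    name: extract_feature(toy_sequence, start, end)
    for name, (start, end) in features.items()
}
print(parts)
```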
Click on the link PRODOM; P01308 to retrieve information on the domain structure of insulin in PRODOM. Once the graphical page is loaded, click on the blue box containing a red dot at the left of the page: you will access the list of all proteins sharing at least one homologous domain with human insulin. Are all these proteins insulins? Note that if you click on the box containing the picture corresponding to the insulin domain, you will access the alignment of this domain.
Go back to the SWISS-PROT entry and click on the link PROSITE; PS00262 to retrieve information on the insulin protein signature in PROSITE (***). Click on the link PDOC00235 to see the textual description of the insulin signature.
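
A PROSITE signature written as a pattern can be translated into a regular expression and tried on a sequence. The sketch below handles the basic syntax only (single residues, x for any residue, [..] for allowed residues, {..} for forbidden residues, repetitions in parentheses, and the < and > anchors outside brackets). The pattern used is hypothetical and is not the PS00262 pattern; copy the real one from the PROSITE entry if you want to test it.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, then optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")
        at_end = element.endswith(">")
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"
        else:
            token = residue
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + token + ("$" if at_end else ""))
    return "".join(parts)


# Hypothetical pattern and sequence, invented for illustration.
regex = prosite_to_regex("C-x(2)-{P}-[ST]-x(1,3)-C.")
hit = re.search(regex, "AACGGASKKCA")
print(regex)
print(hit.group() if hit else None)
```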
HSINS01.INS
sequence, click on the Retrieve
button.
The page displayed lets you retrieve all the sequences stored in a list. We need the insulin protein sequence in Fasta format, so make the following selections:

Sequence: proteins
Format: Fasta
Mode: direct sending
List name: ins3

Once the sequence appears in the window of your Web browser (***), save it on your computer in text format.
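
Once the file is saved, it can be checked with a few lines of Python. A Fasta file consists of a header line starting with > followed by one or more lines of sequence. The sketch below reads such text into a dictionary; the example record is a placeholder, not the insulin sequence, and in practice you would pass the content of your saved file instead.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Placeholder record, invented for illustration.
example = ">EXAMPLE hypothetical record\nMKT\nAYI\n"
parsed = parse_fasta(example)
print(parsed)
```

To read your own file, replace the example text with something like open("insulin.fasta").read(), where the file name is whatever you chose when saving.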