Objective: retrieve homologues of human insulin
in the genome of the nematode C. elegans, using profiles.
Profile searches are based on 3 steps:
1- Compare your query sequence to databases using a basic similarity search
tool (e.g. BLAST), to identify a first set of homologues
2- Align these homologues (e.g. with clustal), and build a profile
3- Compare this profile to databases to identify more distantly related
homologues
Steps 2 and 3 can be iterated as long as new homologues are identified.
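The stopping rule for this iteration can be sketched in shell: after each round, compare the identifiers already in hand with those returned by the new search, and stop when nothing new appears. The function name new_homologues and the one-identifier-per-line file layout are assumptions made for illustration; they are not part of any tool used in this practical.

```shell
# new_homologues PREVIOUS CURRENT
# Print the identifiers found in CURRENT but not in PREVIOUS.
# Both files are assumed to hold one sequence identifier per line.
new_homologues() {
    # comm needs sorted input, so work on sorted, de-duplicated copies
    sort -u "$1" > "$1.sorted"
    sort -u "$2" > "$2.sorted"
    # -1 hides lines unique to PREVIOUS, -3 hides lines common to both
    comm -13 "$1.sorted" "$2.sorted"
    rm -f "$1.sorted" "$2.sorted"
}
```

If new_homologues prints nothing for two successive rounds, the profile has probably converged and steps 2 and 3 need not be repeated.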
Step 1: Retrieve insulin homologues with BLASTP. This time we will use another web server: BLAST at PBIL. Parameters: Description: 10000 Alignment: 10000
Step 2: To exclude partial and artificial insulin sequences, filter BLAST output with the following parameters (button "Filter by taxon/keyword/date") (***):
Step 3: Select sequences to be included in the profile. Given the
size of the insulin family, we will only select a subset of about 20-30
insulin homologues. It is important to include in the profile homologues
that are distantly related (i.e. to maximise
the sequence diversity in the profile).
However, take care to only select significant matches (E-value < 0.05)
(***)
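The E-value criterion can also be applied outside the web page. The sketch below assumes that the hits have been saved by hand as a two-column text file (identifier, then E-value); this layout and the function name significant_hits are assumptions, not an export format of the BLAST server.

```shell
# significant_hits FILE [CUTOFF]
# Print the identifiers whose E-value (second field) is below CUTOFF.
# CUTOFF defaults to 0.05, the threshold used in this step.
significant_hits() {
    # "+ 0" forces a numeric comparison, so values such as 2e-30 are handled
    awk -v cutoff="${2:-0.05}" '$2 + 0 < cutoff + 0 { print $1 }' "$1"
}
```

The selection of 20-30 diverse sequences still has to be made by eye among the identifiers that pass this filter.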
Step 4: Align sequences on your local computer with clustalx
(default parameters) (***)
Step 5: Look at the alignment with seaview. Select the
conserved regions (menu "Sites" in seaview), and save these regions
in MSF format. -> insfam.msf (***)
Step 6: Compute the profile (***)
pfmake -1 insfam.msf $BLOSUM62 > insfam.prf
Step 7 (optional): Retrieve all nematode proteins in SwissProt-TrEMBL
using WWW-Query at PBIL, and then extract the sequences in FASTA format.
NB: this file has already been prepared for you: $CELPEP
Step 8: Compare the profile to the database of C. elegans proteins (CELPEP):
pfsearch -af insfam.prf $CELPEP | sort -nr > insfam.pfs1
Step 9: Assess the statistical significance:
A set of random sequences, with the same length and the same amino-acid and
dipeptide composition as the CELPEP sequences, was generated with the program
shuffle (from the HMMER package):
shuffle -d CELPEP > CELRAND
Repeat the search against the shuffled database:
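The repeat search is presumably the Step 8 command pointed at the shuffled file; the output name insfam.pfs2 used below is an assumption. Once both result files exist, the best score obtained on shuffled sequences can serve as an empirical threshold: real hits scoring above it are unlikely to be chance matches. The sketch assumes that the score is the first whitespace-separated field of each result line, which is what the sort -nr in Step 8 relies on; the function name hits_above_random is hypothetical.

```shell
# Assumed repeat of Step 8 on the shuffled database (output name is hypothetical):
#   pfsearch -af insfam.prf CELRAND | sort -nr > insfam.pfs2

# hits_above_random REAL RANDOM
# Print the lines of REAL whose score (first field) is higher than
# the best score found in RANDOM.
hits_above_random() {
    # Highest score among the shuffled sequences
    max=$(sort -nr "$2" | head -n 1 | awk '{ print $1 }')
    # Keep only the real hits that beat it ("+ 0" forces numeric comparison)
    awk -v max="$max" '$1 + 0 > max + 0' "$1"
}
```

With the file names above, hits_above_random insfam.pfs1 insfam.pfs2 would list the C. elegans proteins that score better than any shuffled sequence. This is a conservative rule of thumb, not a formal E-value.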
NB: if you want to install this software on your own computer, see the ISREC profile page, the HMMER package, and http://pbil.univ-lyon1.fr/alignment.html.