Bioinformatics practical session


The workshop will mainly use two programs:
  • seaview, a multiple sequence alignment editor;
  • phylo_win, a phylogeny program used for the parsimony and neighbor-joining (NJ) methods.
    These programs are accessible on your workstation.


    You will have to save several example data files to your disk using the "Save as..." option of your browser. Preferably use the .mase extension for sequence alignments.
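If you want to check a saved alignment outside seaview, a MASE file can be read with a few lines of Python. This is a minimal sketch that assumes the MASE layout as we understand it: lines starting with ";;" are file-level header comments, each entry starts with one or more ";" comment lines, the next line holds the sequence name, and the sequence itself follows on one or more lines. Site-set definitions stored in the header are kept as raw text and not interpreted. The function name read_mase is hypothetical; it is not part of seaview or phylo_win.

```python
from typing import Dict, List, Tuple


def read_mase(text: str) -> Tuple[List[str], Dict[str, str]]:
    """Parse MASE-formatted text into (header_lines, {name: sequence}).

    Assumed layout: ';;' header lines, then for each entry one or more
    ';' comment lines, a name line, and the sequence lines.
    """
    header: List[str] = []
    seqs: Dict[str, str] = {}
    name = None           # name of the entry currently being read
    expect_name = False   # True right after a ';' comment line
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";;"):
            header.append(line[2:].strip())   # file-level comment
        elif line.startswith(";"):
            expect_name = True                # per-sequence comment: a name follows
        elif expect_name:
            name = line
            seqs[name] = ""
            expect_name = False
        elif name is not None:
            seqs[name] += line.replace(" ", "")  # sequence data, possibly on several lines
    return header, seqs


if __name__ == "__main__":
    # Made-up toy file, not one of the workshop data files.
    example = ";; toy alignment\n;\nseqA\nACGT-A\n;\nseqB\nACGTTA\n"
    header, seqs = read_mase(example)
    for seq_name, seq in seqs.items():
        print(seq_name, len(seq))
```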


    A) Insulin phylogeny

    Open the multiple alignment that you obtained previously (see Exercise 3) with seaview, and save it in MASE format (Save as...). Open this MASE alignment with phylo_win and compute the molecular phylogeny of the vertebrate insulin proteins using the Poisson distance. Identify the gene duplication(s) in the tree. Does the tree correspond to the expected phylogeny (the species/sequence correspondence is available here)?
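The Poisson distance corrects the observed proportion p of differing amino acids for multiple substitutions at the same site: d = -ln(1 - p). The sketch below computes it for one pair of aligned protein sequences. It assumes that sites with a gap in either sequence are simply skipped (pairwise deletion); phylo_win may instead restrict the computation to a selected site set, so treat this as an illustration of the formula, not of the program's exact behavior. The function names are hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping sites with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        return math.inf  # every compared site differs: the correction is undefined
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # Made-up toy fragments, NOT real insulin sequences.
    toy = {"seqA": "ACDEFGHIKL", "seqB": "ACDEYGHIKL", "seqC": "ACNEFGH-KM"}
    for (n1, s1), (n2, s2) in combinations(toy.items(), 2):
        print(f"{n1} vs {n2}: d = {poisson_distance(s1, s2):.3f}")
```

Because -ln(1 - p) grows faster than p, the correction matters most for the most divergent pairs, which is where uncorrected distances would shorten the deepest branches.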

    B) Universal phylogeny

    The file 28sfrags.mase contains a set of pre-aligned rRNA sequences from the large (LSU) and small (SSU) ribosomal subunits.
    Visualize the set of "reliably" aligned sites called "all sequences".
    Build the universal phylogeny.
    Assess the reliability of its branches with a bootstrap analysis.
    Try using the "transversion-only" evolutionary distance.
    Is the position of the Euglena chloroplast sequence the expected one?
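The bootstrap and the transversion-only distance used in this exercise can both be sketched in a few lines. As an assumption, the transversion-only distance is taken here as the transversion component of Kimura's two-parameter distance, d = -(1/2) ln(1 - 2Q), where Q is the proportion of compared sites showing a transversion (purine vs pyrimidine); phylo_win's exact estimator may differ. The usual motivation for ignoring transitions is that they accumulate faster and saturate sooner. A bootstrap replicate is an alignment of the same length whose columns are drawn with replacement from the original; in a full analysis a tree is built for each replicate and a branch's support is the percentage of replicate trees that contain it. The function names below are hypothetical.

```python
import math
import random
from typing import Dict

PURINES = set("AG")
PYRIMIDINES = set("CTU")  # U counted as a pyrimidine so rRNA sequences work as-is
NUCLEOTIDES = PURINES | PYRIMIDINES


def transversion_distance(a: str, b: str) -> float:
    """Assumed transversion-only distance: -(1/2) ln(1 - 2Q)."""
    sites = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in NUCLEOTIDES or y not in NUCLEOTIDES:
            continue  # skip gaps and ambiguity codes
        sites += 1
        if (x in PURINES) != (y in PURINES):
            transversions += 1  # one purine, one pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    q = transversions / sites
    if 2 * q >= 1.0:
        return math.inf  # saturated: distance not estimable
    return -0.5 * math.log(1.0 - 2.0 * q)


def bootstrap_alignment(aln: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One bootstrap pseudo-replicate: resample alignment columns with replacement."""
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in aln.items()}


if __name__ == "__main__":
    # Made-up toy rRNA fragments, NOT taken from 28sfrags.mase.
    aln = {"s1": "ACGUACGUAAGC", "s2": "ACGUCCGUAGGC", "s3": "AUGCACGAAAGU"}
    rng = random.Random(42)
    print("observed:", round(transversion_distance(aln["s1"], aln["s2"]), 3))
    reps = [bootstrap_alignment(aln, rng) for _ in range(100)]
    vals = [transversion_distance(r["s1"], r["s2"]) for r in reps]
    print("replicates with a finite distance:", sum(math.isfinite(v) for v in vals))
```

Resampling whole columns keeps every site's pattern across all sequences intact, which is why the same replicate can be fed to any distance or tree method.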


    C) A 250-million-year-old bacterium: is it possible?

    Vreeland et al. published the isolation of a 250-million-year-old bacterium from a salt crystal.
    Their data are reproduced in permians.mase, a file of aligned bacterial 16S rRNA sequences.
    Compare the results of the parsimony method with those of the distance + NJ method. What very important information is conveyed by the branch lengths in the NJ analysis?
    What do you think of their conclusions ?
    The results of Vreeland et al. have been severely criticized by Graur and Pupko, who concluded that the isolated bacterium is most probably of recent age.
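The branch-length question is easier to discuss with the algorithm in view. Below is a minimal sketch of neighbor-joining (Saitou and Nei, in the Studier and Keppler formulation) that returns every branch together with its estimated length. It is not phylo_win's code; the function name neighbor_joining and the internal node labels node0, node1, ... are hypothetical.

```python
from typing import Dict, List, Tuple


def neighbor_joining(names: List[str], matrix: List[List[float]]) -> List[Tuple[str, str, float]]:
    """Neighbor-joining; returns (node, parent, branch_length) edges of an unrooted tree."""
    dist: Dict[Tuple[str, str], float] = {
        (a, b): matrix[i][j] for i, a in enumerate(names) for j, b in enumerate(names)
    }
    nodes = list(names)
    edges: List[Tuple[str, str, float]] = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(a): sum of distances from a to every active node.
        r = {a: sum(dist[a, b] for b in nodes) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * dist[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{counter}"  # new internal node joining a and b
        counter += 1
        # Branch lengths from a and b to the new node.
        la = 0.5 * dist[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(a, u, la), (b, u, dist[a, b] - la)]
        # Distance from the new node to each remaining node.
        for c in nodes:
            if c not in (a, b):
                dist[u, c] = dist[c, u] = 0.5 * (dist[a, c] + dist[b, c] - dist[a, b])
        dist[u, u] = 0.0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, dist[a, b]))  # last branch joins the two remaining nodes
    return edges


if __name__ == "__main__":
    # Toy additive distances generated from the tree ((A:1,B:2):3,(C:4,D:5)).
    names = ["A", "B", "C", "D"]
    matrix = [[0, 3, 8, 9], [3, 0, 9, 10], [8, 9, 0, 9], [9, 10, 9, 0]]
    for child, parent, length in neighbor_joining(names, matrix):
        print(f"{child} -- {parent}: {length:.2f}")
```

On this additive toy matrix NJ recovers the generating branch lengths exactly; with real, non-additive distances it can return slightly negative lengths, which some programs set to zero. Because NJ estimates the amount of change on each branch, it lets you see how far the Permian sequence sits from its nearest modern relatives, something a parsimony topology alone does not display directly.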


    D) The evolutionary origins of HIV-1 and HIV-2 viruses among primate viruses (SIV)

    Gao et al. (Nature 397:436) published a phylogenetic analysis of the pol gene of the HIV-1 and HIV-2 viruses and of their simian homologs (SIV). The file hivpol.mase contains publicly available protein sequences with which you can attempt to reproduce their results. The sequence FIV/Oma (Feline Immunodeficiency Virus) is used as an outgroup for the analysis. The file hivpol-dna.mase contains the same alignment at the DNA level.
    Which simian species are at the origin of the HIV-1 and HIV-2 viruses?
    When possible, conduct analyses based on Ka (nonsynonymous) and Ks (synonymous) distances.
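To see what Ka and Ks measure, here is a minimal sketch of one classical way to estimate them: Nei and Gojobori's (1986) counting of synonymous and nonsynonymous sites and differences, followed by a Jukes-Cantor correction. phylo_win may well use a different estimator, so do not expect identical numbers. Assumptions, all labeled in the comments: the standard genetic code; codons containing gaps, ambiguity codes or stop codons are skipped; mutations leading to a stop codon are counted as nonsynonymous sites; mutational pathways passing through a stop codon are discarded. One concrete reason an analysis may not be possible is visible in the code: the Jukes-Cantor formula is undefined once a proportion of differences reaches 0.75, which typically happens to Ks between distant sequences. The function names are hypothetical.

```python
import math
from itertools import permutations, product
from typing import Optional, Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AAS)}


def syn_sites(codon: str) -> float:
    """Synonymous sites of a codon: synonymous single-base changes / 3, summed over positions.
    Assumption: changes to stop codons count toward the nonsynonymous sites (3 - s)."""
    s = 0.0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos] and CODE[codon[:pos] + b + codon[pos + 1:]] == CODE[codon]:
                s += 1 / 3
    return s


def codon_diffs(c1: str, c2: str) -> Optional[Tuple[float, float]]:
    """(synonymous, nonsynonymous) differences averaged over pathways avoiding stop codons."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    totals = []
    for order in permutations(positions):
        cur, sd, nd, ok = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                ok = False  # pathway through a stop codon: discarded
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        if ok:
            totals.append((sd, nd))
    if not totals:
        return None
    return (sum(t[0] for t in totals) / len(totals), sum(t[1] for t in totals) / len(totals))


def jukes_cantor(p: float) -> Optional[float]:
    """Jukes-Cantor correction; None when p >= 0.75 (saturated, not estimable)."""
    if p >= 0.75:
        return None
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> Tuple[Optional[float], Optional[float]]:
    """(Ka, Ks) for two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gaps, ambiguity codes and stop codons
        d = codon_diffs(c1, c2)
        if d is None:
            continue
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # sites averaged over both codons
        S, N = S + s, N + 3 - s
        Sd, Nd = Sd + d[0], Nd + d[1]
    if S == 0 or N == 0:
        return None, None
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


if __name__ == "__main__":
    # Made-up toy coding sequences, NOT taken from hivpol-dna.mase.
    ka, ks = ka_ks("ATGAAACTTGGGTGGCGT", "ATGAAGCTCGGATACCGT")
    print("Ka =", ka, "Ks =", ks)
```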

    The file hivpol.pdf contains the (nearly) complete article by Gao et al. (note that there is a problem with its first page).