Analysis of various phylogenies
0. Software used
I suggest using seaview to align and translate sequences, and to compute and draw phylogenetic trees.
Seaview drives muscle and clustalw2 for alignment and dnapars, protpars, NJ, BioNJ and PhyML for phylogenetic computations.
Seaview may already be available to you. If not, download it from here and unzip/untar the resulting file.
Alternatively, seaview can be used to convert data files to other file formats needed by other software.
Seaview can open files through its "File/Open" menu. On PC and Mac, one can also drop data files on the program icon.
1. A 250-million-year-old bacterium?
Vreeland et al. (2000) published a paper claiming to have isolated a 250-million-year-old bacterium from a salt crystal. The paper matters because its result was taken to justify the long-term safety of salt mines for nuclear waste storage. The 16S rRNA sequences of this bacterium, unknown-2-9-3, and of a few related extant organisms are available in file permians.nxs.
- Save this as a text file on your computer.
- Open it in seaview, and align its sequences using menu "Align/Align all".
- Save the aligned data set: menu File/Save.
- Look through the alignment by moving the horizontal scrollbar.
- Compute a parsimony tree: menu "Trees/Parsimony".
- Save the resulting tree: menu "File/Save to trees menu" of the tree window, and then menu "File/Save" of the alignment window.
- Compute a distance tree: menu "Trees/Distance methods" and select the K2P distance. Save the resulting tree as explained above.
- Compare the parsimony (without branch lengths) and distance (with branch lengths) trees.
What is the essential information brought by branch lengths about the unknown-2-9-3 bacterium?
- What can you conclude about Vreeland et al.'s results?
You can consult the Graur and Pupko (2001) paper, which shows why this bacterium is probably of much more recent origin than 250 million years.
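As a rough illustration of what the K2P distance selected in the steps above measures, here is a minimal Python sketch of the standard Kimura 2-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of sites showing a transition or a transversion difference. This is not seaview's own code: the function name k2p_distance and the decision to skip gapped or ambiguous sites are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Assumption for this sketch: sites with a gap or an ambiguous base in
    either sequence are ignored, and U is treated as T.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")

    transitions = transversions = compared = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturation)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy usage on made-up sequences (not taken from permians.nxs)
print(k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

A short branch leading to a sequence corresponds to a small value of this distance between it and its closest relatives.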
2. Universal Phylogeny
File 28sfrags.nxs contains an alignment of parts of the concatenated SSU and LSU rRNAs.
- Save this as a text file on your computer.
- Visualize this alignment with seaview. The level of conservation varies widely between regions.
- This alignment contains a set of reliably aligned regions. Display it by menu "Sites/All sequences". Tree operations will apply to selected sites only.
- Build the universal phylogeny with PhyML (menu Trees/PhyML) and its default options.
- How can you explain the location of sequence EuglenaCP in this tree?
3. Evolutionary origin of HIV-1 and HIV-2
Gao et al. (1999) published a phylogenetic analysis of the pol gene in HIV-1 and HIV-2 and in closely related simian viruses.
File hivpol-unal.nxs contains protein-coding sequences with which it is possible to reproduce their results.
- Save this file on your computer.
- Read it in seaview.
- Align these sequences at the protein sequence level and reproduce the alignment at the DNA level:
- call menu "Props/View as proteins"
- call menu "Align/Align all". Click on button "OK" when alignment program muscle completes its work.
- unselect menu "Props/View as proteins"
- Build a distance tree (menu "Trees/Distance Methods") using Kimura's 2-parameter (K2P) model, the Neighbour-Joining (NJ) tree-building method, and 1000 bootstrap replicates.
- Which simian species are at the origin of the human viruses HIV-1 and HIV-2?
- Compute two other bootstrapped NJ trees using the Ka non-synonymous and the Ks synonymous distances.
- If necessary, use sequence FIV/Oma (Feline Immunodeficiency Virus) as outgroup: in the tree window, click on "Re-root", then click on the black square next to FIV/Oma, then click on "Full".
- Compare synonymous, non-synonymous, and K2P branch lengths. Which are largest? Which are smallest? Why?
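The step of aligning at the protein level and then reproducing the alignment at the DNA level can be pictured with a small Python sketch: each residue in the aligned protein is replaced by its codon, and each gap by three gap characters. This only illustrates the idea and is not how seaview is implemented. The function name thread_codons is hypothetical, and the sketch assumes the DNA sequence is in frame from its first base and has no leftover partial codon.

```python
def thread_codons(aligned_protein, dna):
    """Gap an unaligned coding DNA sequence so it follows an aligned protein row.

    Assumptions for this sketch: `dna` starts at codon position 1, each
    non-gap character of `aligned_protein` corresponds to one codon, and
    '-' is the gap character.
    """
    # Cut the DNA into consecutive codons
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("protein row has more residues than the DNA has codons")

    out = []
    next_codon = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")           # one protein gap = three nucleotide gaps
        else:
            out.append(codons[next_codon])
            next_codon += 1
    return "".join(out)


# Toy usage on a made-up sequence (not taken from hivpol-unal.nxs)
print(thread_codons("M-KV", "ATGAAAGTT"))
```

Because gaps are always inserted in multiples of three, the reading frame is preserved, which is what makes the Ka and Ks distances computable on the resulting DNA alignment.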
4. Bacterial phylogeny using the nifH protein
- Download file nifH.fasta and save it on your computer.
- Compute a distance tree using the Poisson distance and the BioNJ tree-building algorithm and bootstrap it with 500 replicates.
- How would you root this tree? Use this table to get information about species in the data set.
- Is there evidence for gene duplications during the early evolutionary history of this gene in the bacterial and archaeal domains?
- Is there evidence for horizontal transfers of nifH genes? You can use menu Edit/Find to locate species in the tree.
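For reference, the usual Poisson-corrected protein distance is d = -ln(1 - p), where p is the proportion of aligned sites at which the two sequences differ. The Python sketch below implements that textbook formula; it is not seaview's own code, and the function name poisson_distance and the handling of gaps and unknown residues (skipped) are assumptions made for this example.

```python
import math


def poisson_distance(seq1, seq2):
    """Poisson-corrected distance between two aligned protein sequences.

    Assumption for this sketch: sites with a gap ('-') or an unknown
    residue ('X' or '?') in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "-X?" or b in "-X?":
            continue
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared
    if p >= 1.0:
        raise ValueError("sequences differ at every site; distance is undefined")
    # The correction accounts for multiple replacements at the same site
    return -math.log(1.0 - p)


# Toy usage on made-up sequences (not taken from nifH.fasta)
print(poisson_distance("MKVLAAGIVG", "MKVLSAGIVG"))
```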
5. Use of the maximum likelihood phylogenetic method PhyML
- Download file c8alphapre.nxs containing several sequences of the precursor of complement component c8 alpha chain.
- Align these protein sequences.
- Compute a first PhyML tree (menu Trees/PhyML) using the WAG amino acid substitution model and without accounting for across-site evolutionary rate variation: select None in section "Across site rate variation". Memorize the tree (menu File/Save to Trees menu).
- Compute a second PhyML tree accounting for across-site evolutionary rate variation: select Optimize in section "Across site rate variation".
- Compare likelihoods and branch lengths of the two trees. In each case, which tree has the larger value? Why?
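One common way to judge whether the difference between two such log-likelihoods is meaningful is a likelihood ratio test between the nested models. The Python sketch below assumes the two models differ by a single parameter (the shape of the across-site rate distribution) and uses the chi-square distribution with 1 degree of freedom as an approximation. The function name likelihood_ratio_test and the log-likelihood values in the usage line are invented for illustration; read the real values from the PhyML output.

```python
import math


def likelihood_ratio_test(lnl_simple, lnl_complex):
    """Likelihood ratio test for two nested models differing by one parameter.

    Returns (statistic, p_value). Assumption for this sketch: the statistic
    is compared to a chi-square distribution with 1 degree of freedom, whose
    upper tail probability equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_complex - lnl_simple)
    if statistic < 0:
        raise ValueError("the more complex model should not have a lower likelihood")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p = likelihood_ratio_test(lnl_simple=-5230.4, lnl_complex=-5101.7)
print(stat, p)
```

A small p-value suggests that the extra parameter improves the fit by more than expected by chance.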
6. PhyML analysis with approximate likelihood ratio test
- Compute a first PhyML tree on the permians dataset with menu "Trees/PhyML" and the default options. When the computation is completed, hit the OK button. Save the resulting tree. Close the tree window.
- Run a second PhyML tree selecting Bootstrap with 100 replicates in the PhyML options window.
- After a while, you'll see that this computation is going to be rather long, so you may interrupt it.
- Go back to the previous PhyML tree by selecting its name in the Trees menu. The Bootstrap checkbox of the tree menu shows the branch support values computed by PhyML. These are "approximate likelihood ratio test" values that are between 0 and 1. Branches with values close to 1 have strong statistical support from the sequence data set.
7. Use of PhyML on highly divergent ribosomal RNA sequences
- Download file LSU.phylip containing aligned conserved regions of eukaryotic and archaeal LSU rRNA sequences.
- Compute a first PhyML tree using the GTR nucleotide substitution model and without accounting for across-site evolutionary rate variation. Memorize the tree: menu File/Save to Trees menu and then File/Save as to nexus format.
- Compute a second PhyML tree accounting for across-site evolutionary rate variation and for invariable sites.
Compare likelihoods and branch lengths of the two trees, particularly for the branch leading to the microsporidian Encephalitozoon cuniculi. Use of a more realistic model (likelihood increases strongly) reveals a large difference in LSU rRNA evolutionary rate variation between microsporidia and other eukaryotes that was concealed in the first analysis.
One should suspect here that microsporidia are badly positioned in this analysis because of long-branch attraction by the archaeal outgroup.
The position of microsporidia deduced from several genes that did not evolve fast in the microsporidian lineage is available here.