PhyML_Multi - Phylogenetic tree reconstruction for alignments that have undergone recombination

PhyML_Multi is a program for computing phylogenetic trees when homologous recombination has affected an alignment. It simultaneously reconstructs phylogenetic trees and locates recombination breakpoints. To use it, first run phyml_multi on an alignment, then analyse its results with the Python scripts provided in the package.

It has been built upon the algorithmic structure of PhyML (Guindon and Gascuel, 2003) (many thanks to Stéphane Guindon for providing PhyML source code).
It is provided as is and should be used with appropriate care.
Please report any bugs to my address: bastien.boussau@univ-lyon1.fr.
It has been tested on Unix and Linux systems.
For more information about my work, see my lab page.

DOWNLOAD

You can download an archive containing a Linux executable, all the source code and the Python scripts. Example files can also be downloaded here.

INSTALLATION

Installation on Linux:

If you work under Linux and do not want to compile the program, you can use the executable file included in the package. First extract the archive:

tar zxvf phyml_multi.tgz

Then go to the directory named "phyml_multi", make sure the file is executable (otherwise type chmod +x phyml_multi) and type:

./phyml_multi

If you want to build the executable file from the source code:

First extract the archive:

tar zxvf phyml_multi.tgz

Then go to the directory named "phyml_multi" and type:

make

This should produce an executable named "phyml_multi".

Installation on Mac OS X (thanks to Cedric Simillion)

After extracting the archive, cd into the phyml_multi directory.
Remove the phyml_multi binary and all .o files that came with the archive.
Open the Makefile in a text editor and remove the -static option from the CFLAGS line.
Typing "make" now produces a working binary for OS X.
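The Makefile change above can also be scripted. The following Python sketch removes a standalone -static token from every line that starts with CFLAGS; it assumes the Makefile sits in the current directory and defines CFLAGS on lines beginning with that word. Both function names are hypothetical and not part of the package.

```python
import re
from pathlib import Path

# A standalone "-static" token: preceded by start of line, whitespace or "=",
# and followed by whitespace or end of line ("-static-libgcc" is left alone).
STATIC_FLAG = re.compile(r"(?<![^\s=])-static(?!\S)")


def strip_static_from_cflags(makefile_text):
    """Return (new_text, n_changed) with -static removed from CFLAGS lines."""
    lines = makefile_text.splitlines(keepends=True)
    changed = 0
    for i, line in enumerate(lines):
        # Only lines defining CFLAGS are touched; LDFLAGS etc. stay as they are.
        if line.lstrip().startswith("CFLAGS"):
            new_line = STATIC_FLAG.sub("", line)
            if new_line != line:
                lines[i] = new_line
                changed += 1
    return "".join(lines), changed


def drop_static_flag(makefile="Makefile"):
    """Edit the Makefile in place; return the number of modified lines."""
    path = Path(makefile)
    new_text, changed = strip_static_from_cflags(path.read_text())
    path.write_text(new_text)
    return changed
```

Calling drop_static_flag() from inside the phyml_multi directory performs the edit. You still need to delete the old binary and .o files, then type make as described above.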

USAGE

phyml_multi needs a sequence file in PHYLIP format.
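If your alignment is not yet in PHYLIP format, the Python sketch below writes one in the sequential layout (option s on the command line): a header line giving the number of sequences and the number of sites, then one line per sequence. It assumes taxon names of at most 10 characters without spaces, padded to the classic 10-column PHYLIP name width and followed by a space; whether phyml_multi accepts longer names is not stated here. The function name is hypothetical.

```python
def to_phylip_sequential(alignment):
    """Format a {name: aligned_sequence} dict as a PHYLIP sequential file."""
    if not alignment:
        raise ValueError("empty alignment")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # Header: number of sequences, then number of sites.
    lines = [f"{len(alignment)} {lengths.pop()}"]
    for name, seq in alignment.items():
        # Assumption: names limited to 10 characters, no internal spaces.
        if len(name) > 10 or any(c.isspace() for c in name):
            raise ValueError(f"invalid taxon name {name!r}")
        lines.append(f"{name:<10} {seq}")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as seqs1 gives an input that can be passed to phyml_multi with s (sequential) instead of i.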

Phylip-like interface

Go to the installation directory and type:

./phyml_multi

You are then presented with a PHYLIP-like (and PhyML-like) interface that asks for self-explanatory settings, such as the number of rate categories of the gamma distribution or whether the transition/transversion ratio should be optimized.

Command line

You can also run phyml_multi directly from the command line:

./phyml_multi seqs1 0 i 2 0 HKY 4.0 e 1 1.0 BIONJ y n n 2

Where:

seqs1 : sequence file in phylip format,

0 | 1 : 0 for nucleotide sequences, 1 for amino-acid sequences

i | s : PHYLIP interleaved format (i) or PHYLIP sequential format (s)

2 : number of datasets to analyse (must be at least 1)

0 : number of bootstrap sets to generate

HKY : name of the substitution model (JC69 | K2P | F81 | HKY | F84 | TN93 for nucleotide sequences, JTT | MtREV | Dayhoff | WAG for amino-acid sequences)

4.0 : transition/transversion ratio; "e" lets the program estimate this ratio

e : proportion of invariable sites; "e" lets the program estimate this proportion

1 : number of substitution rate categories

1.0 : shape parameter (alpha) of the gamma distribution; "e" lets the program estimate this parameter

BIONJ : method used to build the starting trees or, alternatively, the name of a file containing starting trees

y | n : whether to optimize the tree topology

y | n : whether to optimize branch lengths

y | n : whether to use a Hidden Markov Model (y) instead of the Mixture Model (n) on trees

2 : number of trees expected in the alignment

With a single rate category, the gamma shape parameter (alpha) is not used.
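Because all 15 arguments are positional, it can help to assemble them in code. The Python sketch below is a hypothetical helper, not part of the package: it builds the argument list in the order documented above, with defaults that reproduce the example command, and checks two constraints stated on this page (at least one dataset, and a model that matches the data type).

```python
NUCLEOTIDE_MODELS = {"JC69", "K2P", "F81", "HKY", "F84", "TN93"}
AMINO_ACID_MODELS = {"JTT", "MtREV", "Dayhoff", "WAG"}


def _yn(flag):
    """Map a Python bool to the y/n answers phyml_multi expects."""
    return "y" if flag else "n"


def _value_or_e(value):
    """Numbers are passed as-is; the string "e" asks for estimation."""
    return "e" if value == "e" else str(value)


def build_phyml_multi_args(seq_file, amino_acids=False, interleaved=True,
                           n_datasets=2, n_bootstrap=0, model="HKY",
                           ts_tv=4.0, p_invar="e", n_rate_cats=1, alpha=1.0,
                           start_trees="BIONJ", opt_topology=True,
                           opt_branch_lengths=False, use_hmm=False,
                           n_trees=2, executable="./phyml_multi"):
    """Return the argv list for phyml_multi, in the documented order."""
    if n_datasets < 1:
        raise ValueError("the number of datasets cannot be below 1")
    allowed = AMINO_ACID_MODELS if amino_acids else NUCLEOTIDE_MODELS
    if model not in allowed:
        raise ValueError(f"model {model!r} is not available for this data type")
    return [
        executable,
        seq_file,
        "1" if amino_acids else "0",   # 0 = nucleotides, 1 = amino acids
        "i" if interleaved else "s",   # PHYLIP interleaved / sequential
        str(n_datasets),
        str(n_bootstrap),
        model,
        _value_or_e(ts_tv),            # transition/transversion ratio
        _value_or_e(p_invar),          # proportion of invariable sites
        str(n_rate_cats),
        _value_or_e(alpha),            # gamma shape (unused with 1 category)
        start_trees,                   # "BIONJ" or a file of starting trees
        _yn(opt_topology),
        _yn(opt_branch_lengths),
        _yn(use_hmm),                  # y = HMM, n = mixture model
        str(n_trees),
    ]
```

The resulting list can be passed to Python's subprocess module, for example subprocess.run(build_phyml_multi_args("seqs1", use_hmm=True), check=True), from the directory that contains the executable.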

OUTPUT

Several files are produced in the directory containing the sequence file (called SequenceFile below). The most important ones are:

- SequenceFile_phyml.lk contains 2 or 3 lines: the first gives the final likelihood of the output trees, the second (only if the HMM option was chosen) gives the value of the autocorrelation parameter, and the last gives the computation time.

- SequenceFile_phyml_siteLks.txt contains a table of site likelihoods and log-likelihoods under each tree. This file is the input of the Python scripts that produce the segmentation.

- SequenceFile_phyml_tree.txtX contains tree number X.
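As a sketch of how the .lk file can be read, the Python functions below map its lines to the fields described above, using the number of lines to tell whether the HMM autocorrelation line is present. Values are kept as strings because their exact formatting (for instance the units of the time line) is not described here; the function names are hypothetical.

```python
from pathlib import Path


def parse_lk_text(text):
    """Map the lines of a *_phyml.lk file to their documented meaning."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) == 2:    # Mixture Model run: likelihood, time
        keys = ("likelihood", "time")
    elif len(lines) == 3:  # HMM run: likelihood, autocorrelation, time
        keys = ("likelihood", "autocorrelation", "time")
    else:
        raise ValueError(f"expected 2 or 3 non-empty lines, got {len(lines)}")
    return dict(zip(keys, lines))


def read_lk_file(path):
    """Read and parse a *_phyml.lk file from disk."""
    return parse_lk_text(Path(path).read_text())
```

For example, read_lk_file("seqs1_phyml.lk")["likelihood"] returns the first line of that file.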

USING THE PYTHON SCRIPTS

The Python scripts require the SARMENT library (Guéguen, 2005) to be installed on the system. Installation instructions can be found here.
Which Python script to use depends on whether the Hidden Markov Model was used in the phyml_multi analysis.

Hidden Markov Model analysis


Mixture Model analysis

CITATION

An article has been published in Evolutionary Bioinformatics with the following reference:

Boussau B, Guéguen L, Gouy M (2009). "A Mixture Model and a Hidden Markov Model to Simultaneously Detect Recombination Breakpoints and Reconstruct Phylogenies", Evolutionary Bioinformatics. 2009 Jun 25;5:67-79.

REFERENCES

Guindon S, Gascuel O (2003). "A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood", Syst Biol. 2003 Oct;52(5):696-704.

Guéguen L (2005). "Sarment: Python modules for HMM analysis and partitioning of sequences", Bioinformatics. 2005 Aug 15;21(16):3427-3428.

Guéguen L (2001). "Segmentation by maximal predictive partitioning according to composition biases", pages 32-45 in: Gascuel O, Sagot M (eds), Computational Biology. LNCS, vol. 2066. Springer-Verlag.

Bastien Boussau, PhD
bastien.boussau@univ-lyon1.fr
Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558