PhyML_Multi


Bastien Boussau

bastien.boussau@univ-lyon1.fr

Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558



INTRODUCTION
DOWNLOAD
INSTALLATION
USAGE
OUTPUT
USING THE PYTHON SCRIPTS
CITATION
REFERENCES

INTRODUCTION


PhyML_Multi is a program that computes phylogenetic trees from alignments affected by homologous recombination. It simultaneously reconstructs phylogenetic trees and finds recombination breakpoints. Using it takes two steps: first run phyml_multi on an alignment, then analyse its results with the Python scripts provided in the package.

It has been built upon the algorithmic structure of PhyML (Guindon and Gascuel, 2003) (many thanks to Stéphane Guindon for providing PhyML source code).
It is provided as is and should be used with appropriate care.

Please report any bugs to bastien.boussau@univ-lyon1.fr.
It has been tested on Unix and Linux systems.
For more information about my work, see my lab page.



DOWNLOAD

You can download an archive containing an executable file for Linux, all the source code and the python scripts. Example files can also be downloaded here.



INSTALLATION


Installation on Linux:
To use the executable provided in the archive, first extract the archive:
tar zxvf phyml_multi.tgz
Then go to the directory named "phyml_multi", make sure the file phyml_multi is executable (otherwise type chmod +x phyml_multi) and type:
./phyml_multi

To compile the program from the source code instead, first extract the archive:
tar zxvf phyml_multi.tgz
Then go to the directory named "phyml_multi" and type:
make
This should produce an executable named "phyml_multi".

Installation on Mac OS X (thanks to Cedric Simillion):
  1. cd into the phyml_multi directory after extracting the archive
  2. remove the phyml_multi binary and all .o files that came with the archive
  3. open the Makefile in a text editor and remove the -static option from the CFLAGS line
  4. type "make"; this should now produce a working binary for OS X.



USAGE

phyml_multi needs a sequence file in phylip format.

  • Phylip-like interface

Go to the installation directory and type:

./phyml_multi

You are then presented with a phylip-like (and PhyML-like) interface that asks for self-explanatory information, such as the number of rate categories for the gamma distribution or whether the transition/transversion ratio should be optimized.

  • Command line

You can also run phyml_multi directly from the command line, for example:

./phyml_multi seqs1 0 i 2 0 HKY 4.0 e 1 1.0 BIONJ y n n 2

where:
seqs1 : sequence file in phylip format

0 | 1 : 0 for nucleotide sequences, 1 for amino-acid sequences

i : phylip interleaved format (use s for phylip sequential format)

2 : number of datasets to analyse (cannot be below 1!)

0 : number of bootstrap sets to generate

HKY : name of the model to be used (JC69 | K2P | F81 | HKY | F84 | TN93 for nucleotide sequences, JTT | MtREV | Dayhoff | WAG for amino-acid sequences)

4.0 : transition/transversion ratio; putting "e" lets the program estimate this ratio

e : proportion of invariable sites; putting "e" lets the program evaluate this proportion

1 : number of substitution rate categories

1.0 : shape parameter of the gamma distribution; putting "e" lets the program evaluate this parameter

BIONJ : technique used to build starting trees, or alternatively, file containing starting trees

y | n : whether to optimize the tree topology

y | n : whether to optimize branch lengths

y | n : whether to use a Hidden Markov Model instead of the Mixture Model on trees

2 : Number of trees expected in the alignment



When there is only one substitution rate category, the shape parameter of the gamma distribution (alpha) is not used.
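
When phyml_multi is launched from a script, the fifteen positional arguments listed above are easy to misorder. The following is a minimal Python sketch of one way to assemble them. The function name build_phyml_multi_command and its keyword names are hypothetical (they are not part of the package), and the default values simply reproduce the example command line given above.

```python
def build_phyml_multi_command(
    sequence_file,
    data_type="0",                 # 0: nucleotides, 1: amino acids
    file_format="i",               # i: interleaved, s: sequential
    n_datasets=2,                  # number of datasets to analyse
    n_bootstrap=0,                 # number of bootstrap sets
    model="HKY",                   # substitution model
    ts_tv_ratio="4.0",             # or "e" to let the program evaluate it
    p_invariable="e",              # proportion of invariable sites, or "e"
    n_rate_categories=1,           # number of substitution rate categories
    alpha="1.0",                   # gamma shape parameter, or "e"
    starting_tree="BIONJ",         # or a file containing starting trees
    optimize_topology="y",
    optimize_branch_lengths="n",
    use_hmm="n",                   # y: Hidden Markov Model, n: Mixture Model
    n_trees=2,                     # number of trees expected in the alignment
    executable="./phyml_multi",    # assumed location of the binary
):
    """Return the argument list in the order documented above."""
    args = [
        executable, sequence_file, data_type, file_format, n_datasets,
        n_bootstrap, model, ts_tv_ratio, p_invariable, n_rate_categories,
        alpha, starting_tree, optimize_topology, optimize_branch_lengths,
        use_hmm, n_trees,
    ]
    # Every argument is passed to the program as text.
    return [str(arg) for arg in args]
```

The returned list can be handed to subprocess.run(command, check=True) from the directory that contains the executable; switching to the Hidden Markov Model only requires use_hmm="y".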
 




OUTPUT

Several files are produced in the directory containing the sequence file. In the names below, "SequenceFile" stands for the name of that file. The most important ones are as follows:

- SequenceFile_phyml.lk contains 2 or 3 lines: the first one displays the final likelihood of the output trees, the second one (only if the HMM option has been chosen) gives the value of the autocorrelation parameter, and the last line gives the time used for the computation.

- SequenceFile_phyml_siteLks.txt contains a table showing site likelihoods and log-likelihoods for each tree. This file is then used as input for the Python scripts that produce the segmentation.

- SequenceFile_phyml_tree.txtX contains tree number X.
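
Since the autocorrelation parameter has to be copied from SequenceFile_phyml.lk when the HMM option is used, it can be convenient to read that file from a script. The sketch below is only an illustration: the exact text of each line is not documented here, so it assumes that the value of interest is the last number appearing on its line. The helper name read_lk_summary is hypothetical.

```python
import re

# Matches plain or scientific-notation numbers such as -1234.5, 0.85 or 1e-3.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def _last_number(line):
    """Return the last number found on a line (assumed to be the value)."""
    found = _NUMBER.findall(line)
    if not found:
        raise ValueError("no number found in line: %r" % line)
    return float(found[-1])


def read_lk_summary(text):
    """Summarise the content of a SequenceFile_phyml.lk file.

    text is the whole file content. Two non-empty lines are expected
    without the HMM option, three with it.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) not in (2, 3):
        raise ValueError("expected 2 or 3 lines, found %d" % len(lines))
    summary = {
        "likelihood": _last_number(lines[0]),
        "autocorrelation": None,      # stays None for the Mixture Model
        "time_line": lines[-1],       # kept as raw text
    }
    if len(lines) == 3:
        summary["autocorrelation"] = _last_number(lines[1])
    return summary
```

If the real file lays out its lines differently, only _last_number needs to be adapted.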






USING THE PYTHON SCRIPTS


The Python scripts require the SARMENT libraries (Guéguen, 2005) to be installed on the system. Information regarding their installation can be found here.
Which Python script to use depends on whether the Hidden Markov Model was used in the phyml_multi analysis.
  • Hidden Markov Model analysis
    If the Hidden Markov Model has been used in the phyml_multi analysis, use the script "PartitioningHMM.py" as follows:
    python PartitioningHMM.py SequenceFile_phyml_siteLks.txt autocorrelation_parameter
    where "SequenceFile_phyml_siteLks.txt" is an output file from phyml_multi, and autocorrelation_parameter is the value of the autocorrelation parameter as found on the second line of the file "SequenceFile_phyml.lk" (see above).
    The main output files are:
    - SequenceFile_vit.ps: drawing showing the segmentation found by the Viterbi algorithm. The precise number and positions of the breakpoints can be found in the file SequenceFile_vit.part.
    - SequenceFile_fb.ps: drawing showing the segmentation found by the forward-backward algorithm. The precise number and positions of the breakpoints can be found in the file SequenceFile_fb.part.
    Results obtained using the Viterbi algorithm or the forward-backward algorithm are expected to be very similar.
  • Mixture Model analysis
    If instead the Mixture Model has been used, use the script "PartitioningMM.py" (no "H"!) as follows:
    python PartitioningMM.py SequenceFile_phyml_siteLks.txt
    where "SequenceFile_phyml_siteLks.txt" is an output file from phyml_multi.
    The main output file is:
    - SequenceFile_Partitioned.ps: drawing showing the segmentations found by the MPP (maximal predictive partitioning) algorithm (Guéguen, 2001). The precise number and positions of the breakpoints for the best partitioning can be found in the file SequenceFile_PartitionBestNumber.
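
The choice between the two scripts can be sketched in Python as follows. This is only an illustration: the helper name partitioning_command is hypothetical, and it assumes that both scripts are in the current directory and that "python" starts the interpreter for which SARMENT is installed.

```python
def partitioning_command(site_lks_file, autocorrelation=None, python="python"):
    """Return the command line for the partitioning script.

    Pass the autocorrelation parameter (second line of the .lk file)
    when the Hidden Markov Model was used; leave it as None when the
    Mixture Model was used.
    """
    if autocorrelation is None:
        # Mixture Model: the script takes the site likelihood file only.
        return [python, "PartitioningMM.py", site_lks_file]
    # Hidden Markov Model: the autocorrelation parameter is also required.
    return [python, "PartitioningHMM.py", site_lks_file, str(autocorrelation)]
```

For example, partitioning_command("seqs1_phyml_siteLks.txt", 0.85) builds the HMM command, where the file name and the value 0.85 are made up for illustration. The resulting list can be passed to subprocess.run.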



CITATION


An article describing PhyML_Multi has been published in Evolutionary Bioinformatics:
Boussau B, Guéguen L, Gouy M (2009). "A Mixture Model and a Hidden Markov Model to Simultaneously Detect Recombination Breakpoints and Reconstruct Phylogenies", Evolutionary Bioinformatics. 2009 Jun 25;5:67-79.



REFERENCES

Guindon S, Gascuel O (2003). "A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood", Syst Biol. 2003 Oct;52(5):696-704.

Guéguen L (2005). "Sarment: Python modules for HMM analysis and partitioning of sequences", Bioinformatics. 2005 Aug 15;21(16):3427-3428.

Guéguen L (2001). "Segmentation by maximal predictive partitioning according to composition biases", pages 32-45 of: Gascuel O, Sagot M (eds), Computational Biology. LNCS, vol. 2066. Springer-Verlag.