Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests
Abby S.S., Tannier E., Gouy M. and Daubin V. 2010. BMC Bioinformatics.
December 2010: Prunier 2.0. New features implemented in the Fast version of Prunier allow even more accurate inference of lateral gene transfers.
Prunier is a greedy algorithm that reconciles a gene tree with a species tree under a subtree pruning model, which is considered a good model for lateral gene transfer (LGT). Its philosophy is to remove, one after the other, the parts of the tree (subtrees) that are responsible for the highest conflict. In complex LGT scenarios, however, several independent parts of the tree sometimes have to be removed to eliminate a single conflicting branch. In such cases, the first implementation of Prunier could fail to find a good LGT scenario and typically inferred very high numbers of LGTs, most of them explaining only statistically unsupported conflict. An example of such a case is shown here.
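"Conflict" can be made concrete with bipartitions. Every branch of a tree splits its leaves into two sets, and a gene-tree branch conflicts with a bifurcating species tree when its bipartition is absent from the species tree. The following is a minimal Python sketch, not Prunier's implementation. It assumes both trees have the same leaf set (Prunier itself also accepts gene trees covering only a subset of the species) and uses nested tuples as a hypothetical tree representation. Support values, which Prunier uses to decide whether a conflict is significant, are ignored here.

```python
# Minimal sketch: find gene-tree bipartitions that conflict with a species tree.
# Trees are nested tuples (a hypothetical representation): a leaf is a string,
# an internal node is a tuple of child subtrees. Both trees must share one leaf set.


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree, ignoring its root.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split gets the same key in any rooting.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Skip trivial splits (one leaf versus all the others) and the root.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def conflicting_bipartitions(gene_tree, species_tree):
    """Gene-tree bipartitions absent from a bifurcating species tree."""
    return bipartitions(gene_tree) - bipartitions(species_tree)


species = ((("A", "B"), "C"), ("D", "E"))
gene = ((("A", "C"), "B"), ("D", "E"))
print(conflicting_bipartitions(gene, species))  # the split {A, C} | {B, D, E}
```

Storing each split by the side without a reference leaf makes the comparison independent of where either tree is rooted.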
Two major modifications were introduced in Prunier's algorithm to avoid inferring statistically unsupported transfers:
A subtree cannot be pruned if it is not connected to a region of significant conflict. A subtree is connected to a region of significant conflict if there is no common bipartition (i.e. a bipartition shared with the species tree) on the path between this subtree and a significantly conflicting branch in the gene tree.
When no significantly supported conflict can be removed with a single LGT event, Prunier "Fast-FWD" looks one (or n) step(s) ahead to see whether, after an LGT is inferred, the next LGT in the list removes significant conflict. The parameter n is called "depth" and can be set by the user. Note that Prunier "Fast-FWD" runs slower as n increases; however, if you run it with a depth n1 and a better solution exists at a depth n2 < n1, Prunier "Fast-FWD" returns that solution.
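The connectivity rule of the first modification can be read as a search over gene-tree branches. Start from the branch above a candidate subtree, walk along adjacent branches, and stop at any branch whose bipartition is shared with the species tree. The subtree may be pruned only if a significantly conflicting branch is reached. Because the path between two branches of a tree is unique, blocking the walk at common branches is exactly the "no common bipartition on the path" condition. This is a minimal sketch, not Prunier's code. It assumes every branch has already been labelled "common", "significant" or "other" (hypothetical labels; in the Fast version, significance would come from comparing support values with the threshold).

```python
from collections import defaultdict, deque

# Hypothetical branch labels (not Prunier's internal names):
#   "common"      - bipartition shared with the species tree
#   "significant" - significantly supported conflicting bipartition
#   "other"       - anything else (e.g. weakly supported conflict)


def connected_to_conflict(branches, labels, start):
    """True if a "significant" branch can be reached from `start` without
    crossing a "common" branch. Branches are (node, node) tuples of a tree."""
    if labels[start] == "significant":
        return True
    branches_at = defaultdict(list)  # node -> branches touching it
    for branch in branches:
        for node in branch:
            branches_at[node].append(branch)
    seen = {start}
    queue = deque([start])
    while queue:
        branch = queue.popleft()
        for node in branch:
            for nxt in branches_at[node]:
                if nxt in seen:
                    continue
                seen.add(nxt)
                if labels[nxt] == "significant":
                    return True
                if labels[nxt] != "common":  # common bipartitions block the path
                    queue.append(nxt)
    return False


branches = [("n0", "n1"), ("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n1", "n5")]
labels = {
    ("n0", "n1"): "other",
    ("n1", "n2"): "significant",
    ("n2", "n3"): "common",
    ("n3", "n4"): "other",
    ("n1", "n5"): "other",
}
print(connected_to_conflict(branches, labels, ("n0", "n1")))  # True
print(connected_to_conflict(branches, labels, ("n3", "n4")))  # False: blocked by a common branch
```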
Note that the "FWD" function works only in "Fast" mode. However, simulations suggest that the Slow mode has no significant advantage over the Fast mode (see ).
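The Fast-FWD look-ahead can be pictured as a depth-limited search over an ordered list of candidate transfers. The sketch below is not Prunier's code. It uses iterative deepening, so a solution at a smaller depth is always returned before any deeper one. This matches the behaviour described for n2 < n1 only if "better" is taken to mean "shallower", which is an assumption. The functions candidates, apply and removes_significant are hypothetical placeholders for Prunier's ordered list of candidate LGTs, the pruning step and the significance test.

```python
def find_lgt_sequence(state, candidates, apply, removes_significant, max_depth):
    """Iterative-deepening sketch of a depth-limited look-ahead.

    Returns the shortest tuple of transfers (at most max_depth + 1 long) whose
    last transfer removes significant conflict, or None if there is none.
    """

    def search(current, remaining):
        # Depth-first search allowing `remaining` non-significant steps first.
        for lgt in candidates(current):
            if removes_significant(current, lgt):
                return (lgt,)
            if remaining > 0:
                tail = search(apply(current, lgt), remaining - 1)
                if tail is not None:
                    return (lgt,) + tail
        return None

    # Try depth 0, then 1, ...: a shallower solution always wins.
    for depth in range(max_depth + 1):
        found = search(state, depth)
        if found is not None:
            return found
    return None


# Toy example: "t3" removes significant conflict only once "t2" has been pruned.
def toy_candidates(state):
    return [t for t in ("t1", "t2", "t3") if t not in state]


def toy_apply(state, lgt):
    return state | {lgt}


def toy_significant(state, lgt):
    return lgt == "t3" and "t2" in state


print(find_lgt_sequence(frozenset(), toy_candidates, toy_apply, toy_significant, 0))  # None
print(find_lgt_sequence(frozenset(), toy_candidates, toy_apply, toy_significant, 1))  # ('t2', 't3')
```

Each extra unit of depth multiplies the number of explored sequences by roughly the number of candidates, which is consistent with the note that the search becomes slower as n grows.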
Prunier aims to detect lateral gene transfers in a gene tree given a species tree.
- Input:
- a bifurcating species tree (Newick format) rooted or not,
- a gene alignment (PHYLIP interleaved or FASTA format),
- an optional bifurcating gene tree with support values (Newick format). To be consistent with Prunier, the gene tree must be estimated with the same parameters as those used for branch length estimation of the reference species tree (i.e. WAG+G8+I for protein alignments and GTR+G8+I for nucleotide alignments).
- Output: a scenario of lateral gene transfers per possible rooting of the species tree.
- Prunier builds a gene tree (if not provided by the user) and estimates the branch lengths of the species tree using Treefinder.
- Two versions of Prunier's algorithm can be run:
- A “Fast” version based on support values (LR-ELW values computed by Treefinder if the gene tree is not user-defined).
- A “Slow” version that uses the ELW (expected likelihood weights) test.
- Prunier is implemented in C++ and uses the Bio++ library.
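When the species tree is unrooted, "per possible rooting" means one scenario for each branch on which the root can be placed. An unrooted bifurcating tree with n leaves has 2n - 3 branches, hence 2n - 3 rootings. The sketch below enumerates them as Newick strings. It assumes a hypothetical adjacency-dictionary representation of the unrooted tree, which is not Prunier's input format, and it does not claim to reproduce how Prunier enumerates roots internally.

```python
def newick_from(adj, node, parent):
    """Newick string of the subtree hanging from `node`, away from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:  # a leaf only touches its parent
        return node
    return "(" + ",".join(newick_from(adj, child, node) for child in children) + ")"


def all_rootings(adj):
    """One rooted Newick string per branch of an unrooted tree."""
    branches = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    return [f"({newick_from(adj, u, v)},{newick_from(adj, v, u)});" for u, v in branches]


# Unrooted 4-leaf tree ((A,B),(C,D)) with internal nodes x and y.
adj = {
    "A": ["x"], "B": ["x"], "C": ["y"], "D": ["y"],
    "x": ["A", "B", "y"], "y": ["x", "C", "D"],
}
rootings = all_rootings(adj)
print(len(rootings))  # 5 == 2 * 4 - 3
print(rootings[0])    # (A,(B,(C,D)));
```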
Prunier calls the Treefinder program to perform ML tests (Slow version), to build ML gene trees (if not provided by the user) and to estimate the branch lengths of the reference tree given the gene alignment (when needed for computations).
Before running Prunier, install the Treefinder program in a directory listed in your PATH environment variable so that Prunier can run it.
Then download the appropriate Prunier binary below and make it executable.
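Both installation steps can be checked from Python. The sketch below adds execute permission to a downloaded file (the equivalent of chmod a+x) and verifies that a program can be found through PATH, which is how Prunier locates Treefinder. This page does not give the name of the Treefinder executable, so none is assumed. Pass the file names that match your own installation.

```python
import os
import shutil
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def check_on_path(program):
    """Return the full path of `program` if some PATH directory holds it."""
    found = shutil.which(program)
    if found is None:
        raise FileNotFoundError(f"{program!r} was not found in any PATH directory")
    return found
```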
Run Prunier from the command line. Typing the name of the binary displays Prunier's usage.
Prunier's mandatory arguments are:
- your reference (species) tree in Newick format.
- your alignment file in PHYLIP or FASTA format (see parameter aln.type to specify the format).
- molecule type. Possible values: dna or protein.
Prunier's optional arguments, with their default values where given, are:
- a bootstrapped gene tree file. Prunier can deal with gene trees containing only a subset of the species found in the reference tree.
- format of the provided alignment. Default is PHYLIP interleaved. Other possible value: FASTA.
- version of Prunier to run. Default is Fast. Other possible value: Slow.
- support value threshold for topological conflict.
- p-value threshold for significant conflict with the ELW test (Slow version only).
- maximum possible support value in the gene tree file. Default is 100.
- maximal depth at which Prunier looks forward to find a significant HGT when the current HGT is not significant (Fast version only).
- indicates whether Prunier should compute an HGT scenario per possible root. If set to false, the input reference tree must be rooted. Default is true. Other possible value: false.
Examples of Prunier's command line (example files are available here):
Prunier input.tree.file=alpha_concat_root_reduced.tree aln.file=seqdata1.fas.phy >& prunier_result_simul1
Prunier input.tree.file=alpha_concat_root_reduced.tree aln.file=seqdata110.fas.phy genetree.file=seqdata110.fas.phy_tf.tree method.name=Slow >& prunier_result_simul110
NB: in these examples, standard output and standard error are redirected into the file given after >& (prunier_result_simul1 and prunier_result_simul110). Prunier should soon be able to write its results into a file provided by the user.
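Runs like the command-line examples on this page can also be scripted. In the sketch below, the parameter names (input.tree.file, aln.file, genetree.file, method.name) are taken from those examples. The binary name "Prunier" is an assumption: replace it with the path of the binary you downloaded if it is not on your PATH. Because >& is shell syntax, the sketch asks subprocess.run to send standard output and standard error to the same log file instead.

```python
import subprocess

# Parameter names below are the ones used in the examples on this page.
EXAMPLE_SIMUL110 = {
    "input.tree.file": "alpha_concat_root_reduced.tree",
    "aln.file": "seqdata110.fas.phy",
    "genetree.file": "seqdata110.fas.phy_tf.tree",
    "method.name": "Slow",
}


def prunier_command(params, binary="Prunier"):
    """Build the argument list: the binary followed by name=value pairs."""
    return [binary] + [f"{name}={value}" for name, value in params.items()]


def run_prunier(params, log_path, binary="Prunier"):
    """Run Prunier with stdout and stderr both written to log_path
    (what `>& log_path` does in the shell) and return its exit code."""
    with open(log_path, "w") as log:
        completed = subprocess.run(
            prunier_command(params, binary), stdout=log, stderr=subprocess.STDOUT
        )
    return completed.returncode


# Equivalent of the second example (requires Prunier and Treefinder installed):
# run_prunier(EXAMPLE_SIMUL110, "prunier_result_simul110")
```

Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.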
-  Abby S.S., Tannier E., Gouy M. and Daubin V. Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests. 2010. BMC Bioinformatics.
-  Jobb G., von Haeseler A. and Strimmer K. TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. 2004. BMC Evolutionary Biology.
-  Strimmer K. and Rambaut A. Inferring confidence sets of possibly misspecified gene trees. 2002. Proc Biol Sci.
-  Dutheil J., Gaillard S., Bazin E., Glémin S., Ranwez V., Galtier N. and Belkhir K. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics. 2006. BMC Bioinformatics.
-  Beiko R.G. and Hamilton N. Phylogenetic identification of lateral genetic transfer events. 2006. BMC Evolutionary Biology.
-  Than C. and Nakhleh L. SPR-based Tree Reconciliation: Non-binary Trees and Multiple Solutions. 2008. APBC.
-  Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. 2006. Bioinformatics.