Prunier 2.1

Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests

Abby S.S., Tannier E., Gouy M. and Daubin V. 2010. BMC Bioinformatics.

Prunier's former version 1.0 (as published in BMC Bioinformatics)

Prunier's former version 2.0 "FAST-FWD" (runs with Treefinder)


April 2013 -- Prunier 2.1 : Several bugs were fixed.

April 2012 -- Prunier 2.1 : Now Prunier uses RAxML instead of Treefinder for Maximum Likelihood estimates. The "Slow" version is not supported anymore.

Installation

Prunier requires the program RAxML [7] to build ML gene trees (if not provided by the user) and to estimate branch lengths of the reference tree given the gene alignment (if needed for computations). RAxML can be downloaded here. Here is the RAxML homepage. Please download and install RAxML before running Prunier. The Pthreads version of RAxML can be used when multiple CPUs are available for computation.
Then download below the appropriate binary of Prunier, and make it executable.
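The two installation steps above (making the Prunier binary executable and checking that RAxML can be run) could be scripted roughly as follows. This is only a sketch: the helper names make_executable and find_executable are hypothetical and are not part of Prunier or RAxML.

```python
import os
import shutil
import stat


def make_executable(path):
    """Add the execute permission for user, group and others (like chmod +x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def find_executable(name_or_path):
    """Return the resolved executable, or None if it cannot be run.

    A bare name is searched in the PATH environment variable; a value
    containing a directory is checked directly.
    """
    return shutil.which(name_or_path)
```

For instance, calling make_executable on the downloaded Prunier binary and then find_executable on the value you intend to pass as raxml.path would tell you, before any run, whether both programs can be started.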

Download Prunier

Mac OS X on Intel


64-bit Linux on x86

Documentation

Run Prunier from the command line. Typing the name of the binary without any argument prints Prunier's usage.

Prunier's mandatory arguments are:
input.tree.file
your reference (species) tree in newick format.
aln.file
your alignment file in PHYLIP (interleaved or sequential) or FASTA format (see parameter aln.type to specify the format).
sequence.type
molecule type. Possible values: dna or protein.
raxml.path
the path to your installed RAxML executable.
Prunier's optional arguments, and corresponding [default values] are:
genetree.file[=none]
a bootstrapped gene tree file. Prunier can deal with gene trees containing only a subset of the species found in the reference tree.
aln.type[=PHYLIP_INTER]
format of the provided alignment. Default is PHYLIP interleaved ("PHYLIP_INTER"). Other possible values: "PHYLIP_SEQ" (sequential PHYLIP format) or "FASTA".
boot.thresh.conflict[=90]
support value threshold for topological conflict.
max.bp[=100]
maximum possible support value in the gene tree file. Default is 100.
fwd.depth[=0]
maximal depth at which Prunier looks forward to find a significant HGT when the current HGT is not significant.
multi_root.bool[=true]
indicates whether Prunier should compute a HGT scenario per possible root. If set to false, the input reference tree must be rooted. Default is "true". Other possible value : "false".


model.type[=LG/GTR]
Phylogeny inference parameter: substitution matrix. Default is the LG matrix for amino acid datasets, or GTR for DNA datasets. The matrices available depend on the RAxML version provided by the user.
invariants.cat.bool[=false]
Phylogeny inference parameter: specifies whether an invariant sites category is modeled. Default is "false". Other value: "true".
empirical.freq.bool[=true]
Phylogeny inference parameter: specifies whether empirical frequencies of residues are used. Default is "true". Other value: "false".


raxml.nb_proc[=1]
Phylogeny inference parameter: specifies the number of CPUs to be used by RAxML. This parameter has to be set to a value above 1 when using the RAxML Pthreads version. Default is 1 (a non-Pthreads version of RAxML is used).
raxml.nb_bp[=100]
Phylogeny inference parameter: specify the number of bootstrap replicates performed by RAxML for the gene tree inference (in the case that no gene tree was specified). Default is 100.
raxml.bp_type[=rapid]
Phylogeny inference parameter: specify the kind of bootstrap replicates performed by RAxML for the gene tree inference (in the case that no gene tree was specified). Default is set to "rapid" bootstraps (see RAxML documentation). Other possible value: "real" (for classical bootstrap replicates).
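The value constraints listed above can be checked before launching a run. The sketch below is a hypothetical pre-flight check written for illustration; check_options and ALLOWED_VALUES are assumed names, Prunier performs its own argument handling, and only the values documented above are encoded here.

```python
# Allowed values as documented in the argument list above.
ALLOWED_VALUES = {
    "sequence.type": {"dna", "protein"},
    "aln.type": {"PHYLIP_INTER", "PHYLIP_SEQ", "FASTA"},
    "multi_root.bool": {"true", "false"},
    "invariants.cat.bool": {"true", "false"},
    "empirical.freq.bool": {"true", "false"},
    "raxml.bp_type": {"rapid", "real"},
}


def check_options(options):
    """Return a list of human-readable problems found in a dict of options."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        if name in options and options[name] not in allowed:
            problems.append(
                f"{name}={options[name]} is not one of {sorted(allowed)}"
            )
    # raxml.nb_proc is a CPU count, so it should be a positive integer.
    if "raxml.nb_proc" in options:
        value = str(options["raxml.nb_proc"])
        if not value.isdigit() or int(value) < 1:
            problems.append(f"raxml.nb_proc={value} is not a positive integer")
    return problems
```

An empty list means that none of the documented constraints is violated; it does not guarantee that the installed RAxML version supports the chosen model.type.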


Examples of Prunier 2.1 command lines (example files are available here along with expected output files):

Prunier_2.1 input.tree.file=alpha_concat_root_reduced.tree aln.file=seqdata2.fas genetree.file=seqdata2.tree aln.type=FASTA sequence.type=protein raxml.path=./raxml &> output_prunier_1

Prunier_2.1 input.tree.file=alpha_concat_root_reduced.tree aln.file=seqdata1.fas aln.type=FASTA sequence.type=protein model.type=WAG multi_root.bool=false raxml.path=./raxml_pthreads raxml.nb_proc=12 &> output_prunier_2

Prunier_2.1 input.tree.file=concatenat.tree aln.file=LACTOBACILLALES.16s-aln_profile.fasta-gb.phy aln.type=PHYLIP_INTER sequence.type=dna raxml.path=./raxml_pthreads raxml.nb_proc=12 raxml.nb_bp=1000 &> output_prunier_3

NB: the standard output and standard error have to be redirected into a file by the user.
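When Prunier is driven from a script rather than from a shell, the same key=value command line and the same redirection can be reproduced as sketched below. The helper names build_prunier_command and run_prunier are hypothetical; the argument names and the binary name Prunier_2.1 are taken from the examples above.

```python
import subprocess

# Mandatory arguments of Prunier 2.1, as listed in the documentation above.
MANDATORY = ("input.tree.file", "aln.file", "sequence.type", "raxml.path")


def build_prunier_command(binary, options):
    """Build the argument list [binary, 'key=value', ...] from a dict."""
    missing = [name for name in MANDATORY if name not in options]
    if missing:
        raise ValueError("missing mandatory argument(s): " + ", ".join(missing))
    return [binary] + [f"{name}={value}" for name, value in options.items()]


def run_prunier(binary, options, log_path):
    """Run Prunier and write both stdout and stderr to log_path."""
    command = build_prunier_command(binary, options)
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges both streams, like `&>` in a shell.
        return subprocess.run(command, stdout=log, stderr=subprocess.STDOUT).returncode


# Rebuild the first example command (printed only, not executed here).
example = build_prunier_command("Prunier_2.1", {
    "input.tree.file": "alpha_concat_root_reduced.tree",
    "aln.file": "seqdata2.fas",
    "genetree.file": "seqdata2.tree",
    "aln.type": "FASTA",
    "sequence.type": "protein",
    "raxml.path": "./raxml",
})
print(" ".join(example))
```

Passing the arguments as a list avoids shell quoting issues with file names, and run_prunier("Prunier_2.1", options, "output_prunier_1") would then play the role of the `&> output_prunier_1` redirection.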

December 2010 -- Prunier 2.0 : new features implemented for the FAST version of Prunier : even more accurate inference of lateral gene transfers !!!

Prunier [1] is a greedy algorithm to reconcile a gene tree and a species tree under a Subtree Pruning model, which is considered a good model for lateral gene transfer (LGT). Its philosophy is to remove, one after the other, those parts of the tree (subtrees) that are responsible for the highest conflict. Sometimes, however, in complex LGT scenarios, it is necessary to remove several independent parts of the tree to eliminate a single conflicting branch. In such cases, the first implementation of Prunier could fail to find a good LGT scenario, and typically inferred very high numbers of LGTs, most of them explaining only statistically unsupported conflict. An example of such a case is shown here.

Two major modifications were introduced in Prunier's algorithm in order to avoid the inference of statistically unsupported transfers:

A subtree cannot be pruned if it is not connected to a region of significant conflict. A subtree is connected to a region of significant conflict if there is no common bipartition (i.e. a bipartition shared with the species tree) on the path between this subtree and a significantly conflicting branch in the gene tree.
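The connectivity rule above can be illustrated on a toy representation of trees. In this sketch, a bipartition is given as the set of leaves on one side of a branch, and the path between a subtree and a conflicting branch is given as the list of bipartitions of the branches crossed. The function names are hypothetical, both trees are assumed to have the same leaf set, and this is not Prunier's actual implementation.

```python
def canonical(side, all_leaves):
    """Represent a bipartition by the side that lacks the smallest leaf name,
    so that the two possible descriptions of one branch compare equal."""
    side = frozenset(side)
    anchor = min(all_leaves)
    return frozenset(all_leaves) - side if anchor in side else side


def is_connected_to_conflict(path_bipartitions, species_bipartitions, all_leaves):
    """True if no branch on the path is a bipartition shared with the species tree."""
    species = {canonical(b, all_leaves) for b in species_bipartitions}
    return all(canonical(b, all_leaves) not in species for b in path_bipartitions)


leaves = {"A", "B", "C", "D", "E"}
species_tree = [{"A", "B"}, {"D", "E"}]  # internal branches of the species tree

# The path crosses a branch that is absent from the species tree: connected.
print(is_connected_to_conflict([{"A", "C"}], species_tree, leaves))
# The path crosses {C, D, E} | {A, B}, a common bipartition: not connected.
print(is_connected_to_conflict([{"C", "D", "E"}], species_tree, leaves))
```

Under this rule, a subtree separated from every significantly conflicting branch by at least one common bipartition is never a pruning candidate.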

When no significantly supported conflict can be removed with a single LGT event, Prunier "Fast-FWD" looks one (or n) step(s) ahead to see whether, after inferring an LGT, the next LGT in the list will remove significant conflict. The parameter n is called "depth" and can be set by the user. Note that although Prunier "Fast-FWD" runs slower with increasing n, if you run it with a depth of n1 and a better solution exists at a depth n2 < n1, Prunier "Fast-FWD" will return that solution.
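One way to obtain the property described above (a solution at a smaller depth is preferred even when a larger depth is allowed) is to try depths 0, 1, ..., n in turn. The sketch below shows this on a toy model in which candidates(state) returns a ranked list of (move, next_state, is_significant) tuples. All names and the toy data are hypothetical, and Prunier's real search may differ.

```python
def search(state, depth, candidates):
    """Return a list of moves ending with a significant one, preceded by at
    most `depth` non-significant moves, or None if there is none."""
    for move, _next_state, significant in candidates(state):
        if significant:
            return [move]
    if depth == 0:
        return None
    for move, next_state, _significant in candidates(state):
        tail = search(next_state, depth - 1, candidates)
        if tail is not None:
            return [move] + tail
    return None


def fast_forward(state, max_depth, candidates):
    """Try increasing depths so that the shallowest solution is returned."""
    for depth in range(max_depth + 1):
        moves = search(state, depth, candidates)
        if moves is not None:
            return depth, moves
    return None


# Toy search space: t1 leads to a solution at depth 2, t2 at depth 1.
TOY = {
    "start": [("t1", "s1", False), ("t2", "s2", False)],
    "s1": [("t3", "s3", False)],
    "s2": [("t4", "s4", True)],
    "s3": [("t5", "s5", True)],
}


def toy_candidates(state):
    return TOY.get(state, [])


print(fast_forward("start", 0, toy_candidates))  # None
print(fast_forward("start", 3, toy_candidates))  # (1, ['t2', 't4'])
```

With a maximal depth of 3, the toy search still returns the depth-1 solution, at the cost of repeating the shallower searches first, which is consistent with the slower running time noted above for larger depths.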

Note that the "FWD" function works only in "Fast" mode. However, based on simulations, the slow mode appears to have no significant advantage over the fast mode (see [1]).


Prunier aims at detecting lateral gene transfers in a gene tree given a species tree.

  • Input:
    • a bifurcating species tree (Newick format) rooted or not,
    • a gene alignment (PHYLIP interleaved or FASTA format),
    • an optional bifurcating gene tree with support values (Newick format). Note that to be consistent with Prunier, the gene tree has to be estimated with the same parameters as those used for branch length estimation of the reference species tree (i.e. WAG+G8+I for protein alignments, and GTR+G8+I for nucleotide alignments).
  • Output: a scenario of lateral gene transfers
    • per possible rootings of the species tree.
  • Prunier builds a gene tree (if not provided by the user) and estimates branch lengths of the species tree using Treefinder [2].
  • Two versions of Prunier's algorithm can be run.
    • A “Fast” version that relies on support values (LR-ELW values provided by Treefinder if the gene tree is not user-defined).
    • A “Slow” version that uses the ELW test [3].
  • Prunier is implemented in C++ and uses the Bio++ library [4].


Installation

Prunier drives the Treefinder program to do ML tests (Slow version), to build ML gene trees (if not provided by the user) and to estimate branch lengths of the reference tree given the gene alignment (if needed for computations).
Before running Prunier, please install the Treefinder program in a directory listed in your PATH environment variable so that Prunier can run it. Then download below the appropriate binary of Prunier, and make it executable.

Download Prunier

MacOS X


64-bit Linux on x86

Documentation

Run Prunier from the command line. Typing the name of the binary prints Prunier's usage.

Prunier's mandatory arguments are:
input.tree.file
your reference (species) tree in newick format.
aln.file
your alignment file in PHYLIP or FASTA format (see parameter aln.type to specify the format).
sequence.type
molecule type. Possible values: dna or protein.
Prunier's optional arguments, and corresponding [default values] are:
genetree.file[=none]
a bootstrapped gene tree file. Prunier can deal with gene trees containing only a subset of the species found in the reference tree.
aln.type[=PHYLIP]
format of the provided alignment. Default is Phylip interleaved. Other possible value: FASTA.
method.name[=Fast]
version of Prunier to run. Other possible value: Slow.
boot.thresh.conflict[=90]
support value threshold for topological conflict.
test.thresh[=0.05]
p-value threshold for significant conflict with the ELW test (Slow version only).
max.bp[=100]
maximum possible support value in the gene tree file. Default is 100.
fwd.depth[=0]
maximal depth at which Prunier looks forward to find a significant HGT when the current HGT is not significant.
multi_root.bool[=true]
indicates whether Prunier should compute a HGT scenario per possible root. If set to false, the input reference tree must be rooted. Default is true. Other possible value : false.
Examples of Prunier command lines (example files are available here):

Prunier input.tree.file=alpha_concat_root_reduced.tree aln.file=seqdata1.fas.phy >& prunier_result_simul1

Prunier input.tree.file=alpha_concat_root_reduced.tree aln.file=seqdata110.fas.phy genetree.file=seqdata110.fas.phy_tf.tree method.name=Slow >& prunier_result_simul110

NB: the standard output and standard error have been redirected into the file prunier_result_simul1. Please note that Prunier will soon be able to write its results into a file specified by the user.

Bibliography


If you have problems or comments...
