Tree data storage and manipulation
- The Bio++ Phylogenetics Library (bpp-phyl) provides classes and methods for phylogenetics and molecular evolution. The bpp::Tree interface provides general methods to store and manipulate phylogenetic trees. Several utility methods can also be found in the bpp::TreeTools static class. For now, the only implementation of the bpp::Tree interface is the bpp::TreeTemplate class. It stores the tree recursively, as a hierarchy of bpp::Node objects (or objects of any class inheriting from bpp::Node). A bpp::Node object provides methods to access its father and son nodes in the hierarchy, and holds several fields such as a name, an id and the length of the branch connecting it to its father. It also supports node and branch properties (such as bootstrap values) that can be attached to the tree and manipulated together with it. The bpp::NodeTemplate class can be used to extend the tree structure and attach more complex data to the tree. The corresponding bpp::TreeTemplateTools class provides specific methods that are, in most cases, more efficient than their equivalents in bpp::TreeTools.
- Trees can also be read from and written to files in the Newick format, using the bpp::Newick class.
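The recursive storage described above can be illustrated with a minimal, self-contained sketch. It does not use any Bio++ class: SketchNode, addSon, countLeaves and toNewick are hypothetical names chosen for this example, and the real bpp::Node and bpp::Newick interfaces differ. It only shows the idea of a node that knows its father, owns its sons, carries a branch length, and can be written out recursively as a Newick string.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree node; this is NOT the bpp::Node API.
struct SketchNode {
    std::string name;              // label, may be empty for inner nodes
    double branchLength = 0.0;     // length of the branch to the father
    SketchNode* father = nullptr;  // non-owning pointer to the father
    std::vector<std::unique_ptr<SketchNode>> sons;  // owned son nodes

    // Create a son, attach it to this node and return a pointer to it.
    SketchNode* addSon(const std::string& sonName, double length) {
        std::unique_ptr<SketchNode> son(new SketchNode());
        son->name = sonName;
        son->branchLength = length;
        son->father = this;
        sons.push_back(std::move(son));
        return sons.back().get();
    }
};

// Count the leaves of the subtree rooted at 'node' by recursion on the sons.
std::size_t countLeaves(const SketchNode& node) {
    if (node.sons.empty()) return 1;
    std::size_t total = 0;
    for (const auto& son : node.sons) total += countLeaves(*son);
    return total;
}

// Write one subtree in Newick notation: (son1,son2,...)name:length
void writeSubtree(const SketchNode& node, std::ostream& out) {
    if (!node.sons.empty()) {
        out << '(';
        for (std::size_t i = 0; i < node.sons.size(); ++i) {
            if (i > 0) out << ',';
            writeSubtree(*node.sons[i], out);
        }
        out << ')';
    }
    out << node.name;
    // The root has no father, hence no branch length to print.
    if (node.father != nullptr) out << ':' << node.branchLength;
}

// Serialize the whole tree, terminated by the Newick ';'.
std::string toNewick(const SketchNode& root) {
    std::ostringstream out;
    writeSubtree(root, out);
    out << ';';
    return out.str();
}
```

In this sketch the sons are owned through std::unique_ptr while the father pointer is non-owning, so deleting the root releases the whole tree. This is a design choice of the example, not a statement about how Bio++ manages node memory.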
Phylogenetic reconstruction methods
- PhylLib provides tools to reconstruct phylogenies from sequence data, using maximum parsimony, distance-based methods and maximum likelihood, all of them implemented in an object-oriented way, and hence involving several classes.
- Maximum parsimony
- See bpp::TreeParsimonyScore for parsimony score computation. For now, Nearest Neighbor Interchange (NNI) is the only algorithm provided for topology estimation: see bpp::NNISearchable and bpp::NNITopologySearch, or bpp::OptimizationTools for more user-friendly methods.
- Distance methods
- The bpp::DistanceEstimation class allows you to compute pairwise distances under any of a large set of models (see next section), and to store them as a bpp::DistanceMatrix. This matrix is the input of every distance-based method. The (U/W)PGMA (bpp::PGMA), neighbor-joining (bpp::NeighborJoining) and BioNJ (bpp::BioNJ) methods are implemented.
- Maximum likelihood methods
- These methods use a model, chosen among the many available (see next section), to describe the evolutionary process. Support for homogeneous (reversible or not) and non-homogeneous models is provided. Several likelihood computation algorithms are provided, depending on the intended usage. All the corresponding classes implement the bpp::TreeLikelihood interface. The bpp::DiscreteRatesAcrossSitesTreeLikelihood interface adds support for rate heterogeneity across sites.
- The bpp::TreeLikelihood interface inherits from the bpp::Function interface, which means that any optimization method from the NumCalc library can be used to estimate numerical parameters. The bpp::OptimizationTools static class provides general methods with predefined options, including methods for topology estimation.
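The design in which a likelihood is exposed as a generic function, so that any numerical optimizer can be applied to it, can be sketched on the smallest possible case: the maximum likelihood distance between two aligned sequences under the Jukes-Cantor model. The code below is a self-contained illustration and does not use Bio++: ScalarFunction, Jc69PairNegLogLik and goldenSectionMinimize are hypothetical names, and golden-section search is used here only as a simple stand-in for a real optimizer. The closed-form Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3) with p the observed proportion of differing sites, gives a value to compare the numerical estimate against.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical one-parameter function interface; NOT the bpp::Function API.
class ScalarFunction {
public:
    virtual ~ScalarFunction() = default;
    virtual double value(double x) const = 0;
};

// Negative log-likelihood of the distance d between two aligned sequences
// under the Jukes-Cantor model (equal base frequencies, equal rates).
class Jc69PairNegLogLik : public ScalarFunction {
public:
    Jc69PairNegLogLik(const std::string& a, const std::string& b) {
        if (a.empty() || a.size() != b.size())
            throw std::invalid_argument("sequences must be aligned and non-empty");
        sites_ = a.size();
        for (std::size_t i = 0; i < a.size(); ++i)
            if (a[i] != b[i]) ++diffs_;
    }

    double value(double d) const override {
        const double e = std::exp(-4.0 * d / 3.0);
        const double pSame = 0.25 * (0.25 + 0.75 * e);  // identical site pattern
        const double pDiff = 0.25 * (0.25 - 0.25 * e);  // one specific mismatch
        double logLik = static_cast<double>(sites_ - diffs_) * std::log(pSame);
        if (diffs_ > 0) logLik += static_cast<double>(diffs_) * std::log(pDiff);
        return -logLik;
    }

    // Observed proportion of differing sites.
    double observedDifference() const {
        return static_cast<double>(diffs_) / static_cast<double>(sites_);
    }

private:
    std::size_t sites_ = 0;
    std::size_t diffs_ = 0;
};

// Generic golden-section search: works on any ScalarFunction that is
// unimodal on [lo, hi], and knows nothing about likelihoods.
double goldenSectionMinimize(const ScalarFunction& f, double lo, double hi,
                             double tol = 1e-8) {
    const double invPhi = (std::sqrt(5.0) - 1.0) / 2.0;
    double x1 = hi - invPhi * (hi - lo);
    double x2 = lo + invPhi * (hi - lo);
    double f1 = f.value(x1);
    double f2 = f.value(x2);
    while (hi - lo > tol) {
        if (f1 < f2) {  // the minimum lies in [lo, x2]
            hi = x2; x2 = x1; f2 = f1;
            x1 = hi - invPhi * (hi - lo);
            f1 = f.value(x1);
        } else {        // the minimum lies in [x1, hi]
            lo = x1; x1 = x2; f1 = f2;
            x2 = lo + invPhi * (hi - lo);
            f2 = f.value(x2);
        }
    }
    return 0.5 * (lo + hi);
}

// Closed-form Jukes-Cantor distance, valid for p < 3/4.
double jc69ClosedForm(double p) {
    return -0.75 * std::log(1.0 - 4.0 * p / 3.0);
}
```

Because the optimizer only sees the ScalarFunction interface, the same search routine could be reused for any other one-parameter criterion. This separation between the function being optimized and the optimization method is the point of the sketch.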
Evolutionary models
- The Bio++ phylogenetic library provides different kinds of models. Substitution models are provided via the bpp::SubstitutionModel interface. All commonly used models for nucleotides and proteins are provided (see for instance bpp::JCnuc, bpp::K80, bpp::GTR, bpp::JTT92, etc.). You can add your own model by implementing the bpp::SubstitutionModel interface. Rate across sites (RAS) models are integrated through the bpp::DiscreteDistribution interface, which provides support for the gamma (bpp::GammaDiscreteDistribution) and gamma+invariant (bpp::InvariantMixedDiscreteDistribution) rate distributions. Here again, support for new rate distributions can easily be added by implementing the corresponding interface.
- Markov-modulated Markov models (of which the covarion model is a particular case) are included via the bpp::MarkovModulatedSubstitutionModel interface and its implementations bpp::G2001 and bpp::TS98.
- Finally, from version 1.5 on, it is possible to build virtually any kind of non-homogeneous model thanks to the bpp::SubstitutionModelSet class.
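The idea of adding a model by implementing an interface, and of combining it with a discrete distribution of rates across sites, can be sketched as follows. The code is self-contained and does not use Bio++: ModelSketch, Jc69Sketch, K80Sketch, RateCategory and pijAcrossRates are hypothetical names, and the real bpp::SubstitutionModel and bpp::DiscreteDistribution interfaces are considerably richer. Two assumptions of this example: kappa denotes the ratio of the transition rate to the rate of each transversion, and both models are scaled to one expected substitution per unit of time. The rate categories are supplied by hand rather than derived from a gamma distribution.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical model interface; NOT the bpp::SubstitutionModel API.
// Nucleotide states are coded 0=A, 1=C, 2=G, 3=T.
class ModelSketch {
public:
    virtual ~ModelSketch() = default;
    // Probability of observing state j after time t, starting from state i.
    virtual double pij(int i, int j, double t) const = 0;
};

// Jukes-Cantor: all substitutions occur at the same rate.
class Jc69Sketch : public ModelSketch {
public:
    double pij(int i, int j, double t) const override {
        const double e = std::exp(-4.0 * t / 3.0);
        return i == j ? 0.25 + 0.75 * e : 0.25 - 0.25 * e;
    }
};

// Kimura two-parameter model: transitions (A<->G, C<->T) occur at rate alpha,
// each transversion at rate beta, with alpha + 2 * beta = 1.
class K80Sketch : public ModelSketch {
public:
    explicit K80Sketch(double kappa)
        : alpha_(kappa / (kappa + 2.0)), beta_(1.0 / (kappa + 2.0)) {}

    double pij(int i, int j, double t) const override {
        const double e1 = std::exp(-4.0 * beta_ * t);
        const double e2 = std::exp(-2.0 * (alpha_ + beta_) * t);
        if (i == j) return 0.25 + 0.25 * e1 + 0.5 * e2;
        // With the coding above, transitions pair states two codes apart.
        if (std::abs(i - j) == 2) return 0.25 + 0.25 * e1 - 0.5 * e2;
        return 0.25 - 0.25 * e1;
    }

private:
    double alpha_;
    double beta_;
};

// One class of a discrete rate distribution: a relative rate and its weight.
struct RateCategory {
    double rate;
    double weight;
};

// Transition probability averaged over rate categories: each category
// rescales time by its rate, and the results are mixed by the weights.
double pijAcrossRates(const ModelSketch& model,
                      const std::vector<RateCategory>& categories,
                      int i, int j, double t) {
    double p = 0.0;
    for (const RateCategory& c : categories)
        p += c.weight * model.pij(i, j, c.rate * t);
    return p;
}
```

A new model only has to provide pij to be usable by pijAcrossRates, which mirrors, in a much simplified form, the separation between substitution models and rate distributions described above.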
And more...
- PhylLib also allows you to perform many other analyses, such as evolutionary rate estimation, tree consensus, etc.