About

Version : 1.0
Date : 16 June 2004
By : Timothée SILVESTRE and Estelle NUGUES at the Biometry and Evolution Laboratory
Claude Bernard University-Lyon1
Pôle Bioinformatique Lyonnais (PBIL)
Projet HELIX - INRIA
43 bd du 11 Novembre 1918
69622 Villeurbanne cedex - FRANCE
Web : http://pbil.univ-lyon1.fr
Contacts : {silvestr,duret,mgouy}@biomserv.univ-lyon1.fr

General

Phylojava is a program (written in Java) dedicated to phylogenetic tree reconstruction. It supports phylogenetic tree inference with the most common methods (distance methods, maximum parsimony, maximum likelihood).
Phylojava is a client/server software: phylogenetic trees are computed on a remote server, and are sent via internet to a graphical interface (the client) that allows the user to handle alignments and phylogenetic trees.
The user therefore only has to install the graphical interface on his computer, and can submit tree reconstruction jobs on a remote server.
Phylojava works with DNA or Protein alignment files in MASE format. Taxonomic species groups and sets of conserved regions can be defined by clicking in the alignment and stored into the sequence file, thus avoiding multiple data files. It is possible to modify trees (select the root, swap nodes, add comments, etc.) and to save them within the sequence file or as separate files.
After a method and a set of sites and species have been selected, jobs are sent to the server. Short jobs are processed automatically and the result is immediately displayed by the client. Longer jobs are put in a batch queue, and the user is informed by e-mail when the job is finished. It is also possible to check the status of a job at any time and to kill it if necessary. Once a job is finished, results can be retrieved through the phylojava client graphical interface. Results are stored on the server for one month.
A set of alignment files is provided in various formats in the phylojava directory as example_files.

Methods

Distances
Three algorithms have been chosen:
- BIONJ (Gascuel, O. (1997). BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution 14(7):685-695.)
- Neighbor (Phylip package v3.6)
- Fastme (Desper, R., Gascuel, O. (2002). Fast and Accurate Phylogeny Reconstruction Algorithms Based on the Minimum-Evolution Principle. Journal of Computational Biology 9(5):687-705.)
Protdist and dnadist (Phylip package v3.6) compute the distance matrix, and gamma rates are inferred with the puzzle algorithm (see below).
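
To illustrate what a distance matrix contains, the sketch below computes pairwise Jukes-Cantor (JC69) distances for aligned DNA sequences. It is not the dnadist code and does not reproduce its models or options. The class and method names are hypothetical, and skipping sites where either sequence carries a gap or a non-ACGT character is an assumption made only for this example.

```java
import java.util.Locale;

final class JukesCantorSketch {

    /** Proportion of differing sites among sites where both sequences carry A, C, G or T. */
    static double pDistance(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must be aligned to the same length");
        }
        int compared = 0;
        int different = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = Character.toUpperCase(a.charAt(i));
            char y = Character.toUpperCase(b.charAt(i));
            if ("ACGT".indexOf(x) < 0 || "ACGT".indexOf(y) < 0) {
                continue; // assumption of this sketch: skip gaps and ambiguous characters
            }
            compared++;
            if (x != y) {
                different++;
            }
        }
        if (compared == 0) {
            throw new IllegalArgumentException("no comparable sites");
        }
        return (double) different / compared;
    }

    /** JC69 correction: d = -(3/4) ln(1 - (4/3) p); infinite when p >= 0.75. */
    static double jc69(double p) {
        if (p <= 0.0) {
            return 0.0;
        }
        if (p >= 0.75) {
            return Double.POSITIVE_INFINITY;
        }
        return -0.75 * Math.log(1.0 - 4.0 * p / 3.0);
    }

    /** Symmetric matrix of JC69 distances with a zero diagonal. */
    static double[][] distanceMatrix(String[] seqs) {
        int n = seqs.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[i][j] = jc69(pDistance(seqs[i], seqs[j]));
                d[j][i] = d[i][j];
            }
        }
        return d;
    }

    public static void main(String[] args) {
        String[] seqs = {"ACGTACGTAC", "ACGTACGTAT", "ACG-ACGAAT"};
        for (double[] row : distanceMatrix(seqs)) {
            StringBuilder line = new StringBuilder();
            for (double v : row) {
                line.append(String.format(Locale.ROOT, "%8.4f", v));
            }
            System.out.println(line);
        }
    }
}
```

A matrix of this kind is the input that tree-building algorithms such as BIONJ, Neighbor and Fastme work from.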

Parsimony
Dnapars and Protpars are used for parsimony calculations (Phylip v3.6)

Maximum likelihood
- Phyml. This software implements a new method for building phylogenies from DNA and protein sequences using maximum likelihood. Data sets can be analyzed under several models of evolution (JC69, K2P, F81, F84, HKY, TN93 and GTR for nucleotides; Dayhoff, JTT and mtREV for amino acids). A discrete-gamma model (Yang, 1994) is implemented to accommodate rate variation among sites. Invariable sites can also be taken into account. PHYML has been compared to several other programs using extensive simulations. The results indicate that its topological accuracy is at least as high as that of fastDNAml, while being much faster.

- Proml (Phylip v3.6). This program implements the maximum likelihood method for protein amino acid sequences. It uses the Dayhoff probability model of change between amino acids. The assumptions of the present model are:
  1. Each position in the sequence evolves independently.
  2. Different lineages evolve independently.
  3. Each position undergoes substitution at an expected rate which is chosen from a series of rates (each with a probability of occurrence) which we specify.
  4. All relevant positions are included in the sequence, not just those that have changed or those that are phylogenetically informative.
  5. The probabilities of change between amino acids are given by the model of Dayhoff (Dayhoff and Eck, 1968; Dayhoff et al., 1979).

- Protml (Phylip package v3.6). Maximum Likelihood Inference of Protein Phylogeny. Copyright (C) 1992-1996 J. Adachi & M. Hasegawa.

- Puzzle (tree-puzzle-5.0). TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. Branch lengths can be calculated under the clock-assumption. In addition, TREE-PUZZLE offers a novel method, likelihood mapping, to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number of statistical tests on the data set (chi-square test for homogeneity of base composition, likelihood ratio clock test, Kishino-Hasegawa test). The models of substitution provided by TREE-PUZZLE are TN, HKY, F84, SH for nucleotides, Dayhoff, JTT, mtREV24, VT, WAG, BLOSUM 62 for amino acids, and F81 for two-state data. Rate heterogeneity is modeled by a discrete Gamma distribution and by allowing invariable sites. The corresponding parameters can be inferred from the data set.

Tree Bootstrap calculations


Alignments are bootstrapped with the seqboot program (Phylip). The selected method is applied to each generated alignment, and a consensus tree is built with the consense program. Bootstrap and distance values are then calculated and added to the final tree.
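
The resampling step can be pictured with a minimal sketch: a bootstrap pseudo-replicate is built by drawing alignment columns at random, with replacement, until the replicate has as many sites as the original. This is only an illustration of the principle, not the seqboot implementation. The class and method names are hypothetical, and the fixed random seed is used only to make the example reproducible.

```java
import java.util.Random;

final class BootstrapSketch {

    /** Builds one pseudo-replicate by drawing alignment columns with replacement. */
    static String[] resampleColumns(String[] alignment, Random rng) {
        if (alignment.length == 0) {
            throw new IllegalArgumentException("empty alignment");
        }
        int nSites = alignment[0].length();
        for (String s : alignment) {
            if (s.length() != nSites) {
                throw new IllegalArgumentException("sequences must have the same length");
            }
        }
        StringBuilder[] out = new StringBuilder[alignment.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = new StringBuilder(nSites);
        }
        for (int k = 0; k < nSites; k++) {
            int col = rng.nextInt(nSites); // the same column may be drawn several times
            for (int i = 0; i < alignment.length; i++) {
                out[i].append(alignment[i].charAt(col));
            }
        }
        String[] replicate = new String[alignment.length];
        for (int i = 0; i < replicate.length; i++) {
            replicate[i] = out[i].toString();
        }
        return replicate;
    }

    public static void main(String[] args) {
        String[] alignment = {"ACGTAC", "ACGTTC", "AAGTAC"};
        Random rng = new Random(42L); // fixed seed, for a reproducible example only
        for (int r = 0; r < 3; r++) {
            System.out.println("replicate " + (r + 1) + ": "
                    + String.join(" ", resampleColumns(alignment, rng)));
        }
    }
}
```

Each replicate would then be analysed with the selected method, and the support of a branch is the share of replicate trees in which it appears.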

ATV use

The ATV (A Tree Viewer) code has been adapted and integrated into the phylojava application. We thank Christian Zmasek for allowing us to use ATV (Email: zmasek@genetics.wustl.edu, WWW: http://www.genetics.wustl.edu/eddy/people/zmasek/).
Reference: Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.

General Use

* Supported alignment formats are: MASE, NEXUS, PHYLIP, MSF, CLUSTAL and FASTA.

* Input sequences must be aligned, with gaps written as '-', so that all sequences have the same length.

* The Phylojava client does not handle gaps pairwise as phylo_win does.

* Sequences and sites are deselected with a left mouse click. Sets of sites or groups of sequences can be saved and retrieved when the file is reopened. A right click on the alignment displays the position of the mouse.

* Warning messages are displayed when ambiguous characters are detected in the sequences. These characters may cause problems for some methods.

* You can visualize sequences as DNA or PROTEIN.

* Methods are specific to the type of sequence and are available as soon as a file is opened. Once an alignment is displayed, a method can be selected by clicking on the "methods" menu.

* Depending on the chosen method, the size of the sequences and various parameters, a job is either processed immediately, in which case the tree appears in the "edit tree" panel, or stored on the server. In the latter case, the user is informed by e-mail when the job is finished. It is also possible to check the status of a job at any time, and to kill it if necessary, with the "results" menu.

* Newick format trees can be imported in the tree panel and displayed with ATV (A Tree Viewer). ATV can display branch lengths and bootstrap values. Sequence names are editable, trees can be re-rooted and branches swapped. Any changes can be saved as a single tree file or inside the alignment file. The "reload" button undoes modifications and retrieves the original tree.

* The preferences menu lets the user set an e-mail address and a default server. This server is connected automatically when phylojava starts (the PBIL server is selected by default).

* The server menu allows a new server to be selected by entering its address. The list of available public servers can be obtained, and their addresses simply need to be copied.

* Available methods can be searched for within the "search" menu. Type the name of a method, and the addresses of the servers that provide it will be displayed.

* A forum has been set up for bug reports and comments concerning the application.

* The mail menu is used to send an e-mail to the administrator of a phylojava server, to report a problem or make suggestions concerning a specific server.
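
Two of the requirements listed above, equal sequence lengths and the detection of ambiguous characters, can be sketched as a simple check run before a job is submitted. The class below is hypothetical and is not the Phylojava client code. Treating every character other than A, C, G, T, U and '-' as ambiguous is an assumption that applies to DNA only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AlignmentCheckSketch {

    /** True when every sequence has the same length, gaps ('-') included. */
    static boolean sameLength(List<String> seqs) {
        if (seqs.isEmpty()) {
            return true;
        }
        int n = seqs.get(0).length();
        for (String s : seqs) {
            if (s.length() != n) {
                return false;
            }
        }
        return true;
    }

    /** Returns the 0-based positions whose character is not A, C, G, T, U or '-'. */
    static List<Integer> ambiguousPositions(String dna) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < dna.length(); i++) {
            char c = Character.toUpperCase(dna.charAt(i));
            if ("ACGTU-".indexOf(c) < 0) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ACG-TACNGT", "ACGGTAC-GT", "ACGRTACAGT");
        System.out.println("same length: " + sameLength(seqs));
        for (String s : seqs) {
            System.out.println(s + " ambiguous at " + ambiguousPositions(s));
        }
    }
}
```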

Troubleshooting & Remarks

* Sequence names should not begin with a number.

* On MacOsX and Windows systems, the tree window needs to be resized in order to display the phylogeny properly.

* On MacOsX, the print option does not work, because the java implementation differs from that of other systems.

* Accented characters are not allowed in the names and paths of alignment files.
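
The naming rules above can be checked before a file is opened. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and an accent is detected by decomposing the text (Unicode NFD) and looking for combining marks, which is one possible reading of the rule rather than the test Phylojava itself applies.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class NameCheckSketch {

    // Combining marks appear once accented letters are decomposed (NFD).
    private static final Pattern COMBINING_MARK = Pattern.compile("\\p{M}");

    /** True when the sequence name begins with a digit. */
    static boolean startsWithDigit(String sequenceName) {
        return !sequenceName.isEmpty() && Character.isDigit(sequenceName.charAt(0));
    }

    /** True when the file name or path contains at least one accented character. */
    static boolean hasAccent(String path) {
        String decomposed = Normalizer.normalize(path, Normalizer.Form.NFD);
        return COMBINING_MARK.matcher(decomposed).find();
    }

    public static void main(String[] args) {
        String[] names = {"16S_ECOLI", "ECOLI_16S"};
        for (String name : names) {
            System.out.println(name + " starts with a digit: " + startsWithDigit(name));
        }
        // The accented letter is written as a Unicode escape to keep this source file ASCII-only.
        String[] paths = {"/home/user/s\u00e9quences/test.mase", "/home/user/sequences/test.mase"};
        for (String path : paths) {
            System.out.println(path + " has an accent: " + hasAccent(path));
        }
    }
}
```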