# Simulations computed by Sophie Abby (Aug 2009) with a modified version of the program by Nicolas Galtier described in
# "A model of horizontal gene transfer and the bacterial phylogeny problem." Syst. Biol. 2007

This directory contains:

- the rooted reference species tree, inspired by a biological dataset of 40 alphaproteobacterial species: "alpha_concat_root_reduced.tree".

- the 330 simulated trees, with an increasing number of SPR moves (from 0 to 10). There are approximately 30 simulations per number of transfers. Their names are "tree#".

- the 330 alignments simulated by evolving sequences along the corresponding tree. These alignments are provided in PHYLIP format. They are called "seqdata#.fas.phy", "#" being the number of the corresponding tree.

- the 330 maximum-likelihood trees reconstructed from the alignments: "seqdata#.fas.phy_tree". The substitution model used to build these trees with Treefinder was WAG(+G+I), whereas the model used to generate the sequences was JTT, in order to increase the level of difficulty of the ML-trees dataset.

- SPR moves are described in the "moves" file. The recipients and the donors of simulated LGT are recorded in the fields "Mover" and "Dest", respectively. The nodal distance (the number of crossed nodes) of each SPR move is recorded in the field "Nodal distance". As some moves do not result in a topological change (transfers between sister taxa, for example), some of the SPR moves were counted in "Nb of moves" but not in "Nb of HGT". The clades of movers that are potentially misplaced in the final simulated gene trees are listed in the field "HGT : ".

- A summary of the SPR (subtree prune and regraft) moves used for the simulation of the gene trees is contained in the "spr_moves" file.
  For each artificial "tree #", the moves that resulted in a topological change are described on one line per simulated transfer: "mover => dest", where "mover" is the group moved by SPR (the recipient of the transfer) and "dest" is the new sister group of the mover (the donor of the transfer). If no move is described, no HGT event was introduced in this tree.

- Parameters used for the simulations (for more details, see N. Galtier, Syst. Biol. 2007):

diameter 2. 3.          /* min and max gene tree diameter */
seq_type PROT           /* sequence type (DNA or PROT) */
seq_length 100 400      /* min and max length (uniform distribution) */
total_rho 0             /* average number of global rate changes */
total_tau 10            /* average number of HGT's per gene tree */
total_rho_prime 100     /* average number of rate changes per gene tree */
alpha_l 5               /* Gamma distribution across lineages: shape */
alpha_s 0.5             /* Gamma distribution across sites: shape */
subst_model JTT92       /* available options:
                           DNA: JC, K80, T92, HKY85, TN93, GTR
                           PROT: JCprot, DSO78, JTT92 */
subst_rates 4 0.6       /* JC, JCprot, DSO78, JTT92: not required
                           K80: kappa (ts/tv ratio)
                           T92: kappa theta (=stationary GC-content)
                           HKY85: kappa A C G T
                           TN93: kappa1 kappa2 A C G T
                           GTR: a b c d e A C G T
                           (see BIO++ documentation) */
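
Example: locating the files of one simulation and reading a transfer line.
The file-naming scheme ("tree#", "seqdata#.fas.phy", "seqdata#.fas.phy_tree") and the "mover => dest" lines of the "spr_moves" file can be handled with a short Python sketch such as the one below. It is only an illustration and is not part of the simulation program. The function names are hypothetical, and it assumes that "#" is replaced by the plain tree number (no zero padding) and that a transfer line contains a single "=>" separator.

```python
from pathlib import Path


def files_for_tree(number, directory="."):
    """Return the paths of the three files associated with tree `number`.

    Assumption: "#" in the file names stands for the bare tree number.
    """
    base = Path(directory)
    alignment = f"seqdata{number}.fas.phy"
    return {
        "simulated_tree": base / f"tree{number}",
        "alignment": base / alignment,
        "ml_tree": base / f"{alignment}_tree",
    }


def parse_move(line):
    """Split a 'mover => dest' line into (mover, dest).

    Returns None for lines that do not describe a transfer.
    Assumption: the separator "=>" occurs once per transfer line.
    """
    if "=>" not in line:
        return None
    mover, dest = line.split("=>", 1)
    return mover.strip(), dest.strip()
```

For instance, files_for_tree(12) gives the three file names expected for tree 12, and parse_move returns the mover (recipient) and dest (donor) groups as two strings, which can then be compared with the clades of the reference species tree.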