This is the manual of the TestNH package, version 3.0.0.
Copyright © 2017 Bastien Boussau, Julien Dutheil and Laurent Guéguen
• Introduction
• mapnh
The Detailed Node Listing
Introduction
• Description
• Run
Introduction
TestNH is a package for studying non-homogeneous processes of sequence evolution. It is built on the Bio++ libraries and uses the command-line syntax common to the Bio++ Program Suite (BppSuite). Parts of this manual therefore refer to the corresponding sections of the BppSuite manual where needed, and only describe the options specific to the TestNH package.
Note that several detailed examples are provided with the source code of the program and can serve as good starting points. This manual aims to provide an exhaustive description of the options used in these examples.
Description
The TestNH package contains one program, MapNH, which maps substitutions onto a phylogenetic tree and counts the types of substitutions on each branch of the tree. The resulting counts are used as input to a clustering procedure that outputs groups of branches with similar substitution processes.
Run
All programs in the TestNH package follow the ‘bppSuite’ syntax. They are command-line driven and take as input options of the form ‘name’=‘value’. These options can be gathered into a file and loaded using param=optionfile. Please refer to the Bio++ Program Suite manual for more details, including the use of variables, priority of option values, etc.
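As an illustration of the ‘name’=‘value’ syntax, here is a minimal Python sketch that reads such an option file into a dictionary. It is not the Bio++ parser: it assumes that ‘#’ starts a comment and that a trailing backslash continues a value on the next line (as in the output.counts examples of this manual), and the function name parse_options is hypothetical.

```python
# Hypothetical sketch of reading a bppSuite-style option file; not the Bio++ parser.
# Assumptions: '#' starts a comment and a trailing backslash continues the line.


def parse_options(text: str) -> dict[str, str]:
    """Return a {name: value} dictionary built from name=value lines."""
    options: dict[str, str] = {}
    pending = ""  # accumulates lines that end with a backslash
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing blanks
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        line, pending = pending + line, ""
        if not line.strip():
            continue  # skip empty lines
        # Split on the first '=' only, so values may themselves contain '='
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a name=value line: {line!r}")
        options[name.strip()] = value.strip()
    if pending.strip():
        raise ValueError("text ends with a continuation backslash")
    return options


if __name__ == "__main__":
    example = (
        "test.global = yes\n"
        "nullModelParams = YN98.omega*=1\n"
        "output.counts=PerType(prefix=mapping_per_type),\\\n"
        "    PerBranchPerSite(file=mapping_per_site.txt)\n"
    )
    for name, value in parse_options(example).items():
        print(f"{name} -> {value}")
```

Splitting on the first ‘=’ only is what lets a value such as YN98.omega*=1 keep its own ‘=’ sign.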
mapnh
MapNH takes as input a sequence alignment, as described in the bppSuite manual (Sequences in Bio++ Program Suite manual). It then performs substitution mapping to count substitutions at each site of the alignment and on each branch of a phylogenetic tree, which is also input using the bppSuite syntax (Tree in Bio++ Program Suite manual).
The substitution mapping procedure requires a model of sequence evolution. As the procedure is robust to the choice of model, a Jukes-Cantor model is used by default. It is recommended, however, to use a less coarse model whenever possible (particularly for large alphabets such as codon alphabets). All non-mixed models available in bppSuite are supported (Model in Bio++ Program Suite manual). A homogeneous model (such as GTR for nucleotides, JTT92 for proteins and YN98 for codons) is usually a good start. Non-homogeneous models are also supported, mainly for a posteriori validation of mapping robustness (to be used with PartNH, for instance).
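For reference, the default Jukes-Cantor model has closed-form transition probabilities. The Python sketch below uses the textbook formula generalized to n equiprobable states, with branch length t in expected substitutions per site. It only illustrates the model and is not the code MapNH uses internally; the function name is hypothetical.

```python
import math


def jukes_cantor_matrix(t: float, n_states: int = 4) -> list[list[float]]:
    """Transition probabilities P(t) of the Jukes-Cantor model with n states.

    t is the branch length in expected substitutions per site, so every
    off-diagonal rate is 1/(n-1) and the exponent is -n*t/(n-1)
    (that is, -4t/3 for nucleotides).
    """
    decay = math.exp(-n_states * t / (n_states - 1))
    same = 1.0 / n_states + (n_states - 1) / n_states * decay  # P(i -> i)
    other = 1.0 / n_states - decay / n_states                  # P(i -> j), j != i
    return [[same if i == j else other for j in range(n_states)]
            for i in range(n_states)]


if __name__ == "__main__":
    for row in jukes_cantor_matrix(0.1):
        print(" ".join(f"{p:.4f}" for p in row))
```

With n_states=61 the same formula covers the sense codons, which shows how coarse the default is for codon alphabets: every codon change is equally likely, whatever the number of nucleotide differences.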
MapNH can perform several types of substitution mapping, which determine the types of substitutions that are counted and used to cluster branches. This is specified with the option:
map.type = {register described}
A description of the register to use.
The types of substitutions to map are:
Maps all n(n-1) possible substitutions, where n is the size of the alphabet. This option should only be used with small alphabets such as DNA or RNA, as it uses a large amount of memory and dilutes the information.
Counts the total number of substitutions.
Maps substitutions as defined in a list. This list is built as:
substitution.list = (Ts:A->G, G->A, C->T, T->C) (Tv: A->C, A->T, T->A, C->A, G->C, G->T, C->G, T->G)
Each group of substitutions is delimited by parentheses. An optional group name is given at the start of the group, followed by ":". Substitutions are separated by ",", and each substitution is written with the "->" symbol.
Maps two types of substitutions: ‘AT to GC’ and ‘GC to AT’. With a codon alphabet, only synonymous substitutions are considered (otherwise, see also the SW option). This option takes an optional argument telling whether the counts should be corrected for nonstationarity: with GC(stationarity=no), the counts are normalized by the ancestral frequencies of the corresponding node (the default is stationarity=yes).
Counts transitions (type 1) and transversions (type 2).
Counts substitutions between or within strong and weak nucleotides, i.e. whether the Watson-Crick base pair is strong (GC pair) or weak (AT pair). The type numbers are 1: S->S, 2: S->W, 3: W->S, 4: W->W.
Counts synonymous (type 1) and nonsynonymous (type 2) substitutions.
Counts intra-amino-acid substitutions, with types numbered following the alphabetical order of amino acids.
Counts inter-amino-acid substitutions (in both directions).
Counts conservative (type 1) and non-conservative (type 2) substitutions.
Counts combinations of substitution types.
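To make the nucleotide registers concrete, the following Python sketch classifies substitutions the way the descriptions above define them (type numbers as given in this manual) and parses a substitution.list string. It is an illustration under these assumptions, not the Bio++ register implementation; all function names are hypothetical.

```python
# Illustrative sketch of some nucleotide registers described above.
# Type numbers follow this manual; this is not the Bio++ register code.
from itertools import permutations

NUCLEOTIDES = "ACGT"
PURINES = frozenset("AG")  # the pyrimidines are C and T
STRONG = frozenset("GC")   # S = {G, C}; W = {A, T}


def all_substitutions(alphabet: str = NUCLEOTIDES) -> list[tuple[str, str]]:
    """Every ordered pair x -> y with x != y: n(n-1) substitutions."""
    return list(permutations(alphabet, 2))


def tstv_type(x: str, y: str) -> int:
    """1 for a transition (within purines or within pyrimidines),
    2 for a transversion."""
    return 1 if (x in PURINES) == (y in PURINES) else 2


def sw_type(x: str, y: str) -> int:
    """1: S->S, 2: S->W, 3: W->S, 4: W->W."""
    if x in STRONG:
        return 1 if y in STRONG else 2
    return 3 if y in STRONG else 4


def parse_substitution_list(spec: str) -> dict[str, list[tuple[str, str]]]:
    """Parse groups such as '(Ts:A->G, G->A) (C->T)'.

    A group without a 'name:' prefix is keyed by its 1-based position.
    """
    groups: dict[str, list[tuple[str, str]]] = {}
    for index, chunk in enumerate(spec.split("(")[1:], start=1):
        body = chunk.split(")", 1)[0]
        name, sep, pairs = body.partition(":")
        if not sep:
            name, pairs = str(index), body
        substitutions = []
        for item in pairs.split(","):
            source, target = (s.strip() for s in item.split("->"))
            substitutions.append((source, target))
        groups[name.strip()] = substitutions
    return groups


if __name__ == "__main__":
    print(len(all_substitutions()))  # prints 12 (4 * 3 ordered pairs)
    print(parse_substitution_list("(Ts:A->G, G->A, C->T, T->C)"))
```

For DNA the full register has only 12 types, whereas 61 sense codons would give 61 × 60 = 3660, which is why that option is reserved for small alphabets.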
output.counts={output type}
Describes the type of output to produce. Several types are available; their options are:
The prefix option gives the name prefix for all count tree files. The tree file for counts of type 1 will be named ‘prefix1’, for type 2 ‘prefix2’, and so on.
The file option gives the path where the table should be stored.
The file option gives the path where the table should be stored.
The prefix option gives the name prefix for all table files. The table file for counts of type 1 will be named ‘prefix1’, for type 2 ‘prefix2’, and so on.
The distinct outputs can be combined as a list, for instance:
output.counts=PerType(prefix=mapping_per_type),\
              PerBranchPerSite(file=mapping_per_site.txt)
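The output types are different summaries of the same mapping. As a rough illustration, the Python sketch below assumes a hypothetical in-memory array counts[branch][site][type] and shows how per-branch, per-site and per-type tables relate to it by summing over one dimension, together with the ‘prefix1’, ‘prefix2’, ... file naming described above. MapNH writes these files itself; the layout and function names here are assumptions.

```python
# Hypothetical in-memory layout: counts[branch][site][type].
# MapNH writes its own output files; this only shows how the
# summaries relate to one another.

Counts = list[list[list[float]]]


def per_branch_per_type(counts: Counts) -> list[list[float]]:
    """Sum over sites: one row per branch, one column per type."""
    return [[sum(site[k] for site in branch) for k in range(len(branch[0]))]
            for branch in counts]


def per_branch_per_site(counts: Counts) -> list[list[float]]:
    """Sum over types: one row per branch, one column per site."""
    return [[sum(site) for site in branch] for branch in counts]


def per_site_per_type(counts: Counts) -> list[list[float]]:
    """Sum over branches: one row per site, one column per type."""
    n_sites, n_types = len(counts[0]), len(counts[0][0])
    return [[sum(branch[s][k] for branch in counts) for k in range(n_types)]
            for s in range(n_sites)]


def per_type_file_names(prefix: str, n_types: int) -> list[str]:
    """One file per type: prefix1, prefix2, ..."""
    return [f"{prefix}{k}" for k in range(1, n_types + 1)]


if __name__ == "__main__":
    toy = [[[1, 0], [2, 1]], [[0, 3], [1, 1]]]  # 2 branches, 2 sites, 2 types
    print(per_branch_per_type(toy))  # [[3, 1], [1, 4]]
    print(per_type_file_names("mapping_per_type", 2))
```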
Counts can be normalized by the counts that would be obtained under another model, on the same history as the one described by the main model. For this, use the option:
nullModelParams = {list{<chars>=<values>}}
Assigns a list of parameter values used to derive the normalization model from the main one. The ‘*’ wildcard can be used, as in *theta* for all the parameters whose name contains theta.
For example, to normalize by the counts that would be obtained under a neutral model with YN98 (typically to compute dN/dS):
nullModelParams = YN98.omega*=1
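One way to picture how nullModelParams derives the normalization model is sketched below in Python: every parameter of the main model whose name matches a wildcard pattern is overridden, and the others are copied unchanged. The parameter names and values are illustrative only, and both the comma-separated pattern list and the use of Python's fnmatch for the ‘*’ wildcard are assumptions, not MapNH's code.

```python
from fnmatch import fnmatchcase


def null_model_parameters(main: dict[str, float], spec: str) -> dict[str, float]:
    """Copy the main model's parameters, then override every parameter whose
    name matches a 'pattern=value' entry of spec (entries separated by ',')."""
    null = dict(main)  # the main model itself is left untouched
    for entry in spec.split(","):
        pattern, sep, value = entry.strip().rpartition("=")
        if not sep:
            raise ValueError(f"expected pattern=value, got {entry!r}")
        for name in null:
            if fnmatchcase(name, pattern.strip()):
                null[name] = float(value)
    return null


if __name__ == "__main__":
    main_model = {"YN98.kappa": 2.5, "YN98.omega": 0.2}  # illustrative values
    print(null_model_parameters(main_model, "YN98.omega*=1"))
    # {'YN98.kappa': 2.5, 'YN98.omega': 1.0}
```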
To obtain raw counts and normalizations separately, use the splitNorm=True option in the output.counts options:
output.counts = PerBranchPerType(prefix=$(REP)/$(DATA).counts_,\
                splitNorm=True)
In this case, an additional file with the suffix _norm is output for normalizations, while the regular output contains raw counts.
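When splitNorm=True is combined with the synonymous/nonsynonymous register (type 1 synonymous, type 2 nonsynonymous) and the neutral null model above, a per-branch dN/dS-like ratio can be formed from the raw and _norm outputs. The Python sketch below assumes both files have already been read into dictionaries mapping a branch identifier to its [type 1, type 2] counts; the reading step, the branch identifiers and the function name are hypothetical.

```python
import math


def dn_ds_like_ratio(raw: dict[str, list[float]],
                     norm: dict[str, list[float]]) -> dict[str, float]:
    """Per-branch (raw_N / norm_N) / (raw_S / norm_S).

    Each value is [type 1 (synonymous), type 2 (nonsynonymous)], as in the
    register description above. Returns NaN when a denominator is zero.
    """
    ratios: dict[str, float] = {}
    for branch, (raw_s, raw_n) in raw.items():
        norm_s, norm_n = norm[branch]
        if raw_s == 0 or norm_s == 0 or norm_n == 0:
            ratios[branch] = math.nan
        else:
            ratios[branch] = (raw_n / norm_n) / (raw_s / norm_s)
    return ratios


if __name__ == "__main__":
    raw = {"branch_1": [10.0, 5.0]}    # hypothetical raw counts
    norm = {"branch_1": [20.0, 40.0]}  # hypothetical _norm counts
    print(dn_ds_like_ratio(raw, norm))  # {'branch_1': 0.25}
```

Keeping the two files separate, rather than dividing inside MapNH, lets the normalization be summed over several branches before taking ratios.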
Based on these counts, MapNH can perform a global test to assess whether there is heterogeneity between branches:
test.global = {boolean}
Whether global tests should be performed. If yes, two tests are performed: a chi-square test on the contingency table, and a multinomial test. Note that both tests are indicative only, as the assumptions made when computing the p-values may not hold.
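For intuition about the first global test, the Python sketch below computes Pearson's chi-square statistic and its degrees of freedom for a branch-by-type table of counts. It is the textbook statistic, not necessarily MapNH's exact computation, and the function name is hypothetical.

```python
def chi_square_contingency(table: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a table of
    counts (rows: branches, columns: substitution types)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    if total == 0:
        raise ValueError("empty table")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of branch and type
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_sums) - 1)
    return stat, dof


if __name__ == "__main__":
    print(chi_square_contingency([[10, 20], [20, 10]]))
```

If SciPy is available, a p-value can then be read from the chi-square distribution with scipy.stats.chi2.sf(stat, dof). Because mapped counts are expectations rather than observed integer counts, the usual chi-square approximation may not apply, which is one possible reason why these p-values should be treated as indicative.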
manageUnresolved= {Zero|One|Average}
Describes how unresolved characters are handled in counts: