Bacterial taxa associated with the hematophagous mite, Dermanyssus gallinae detected by 16S rDNA PCR amplification and TTGE fingerprint.
On-line reproduction of the paper by Valiente Moro et al. (2009)*
This web page lets you rerun all the computations and graphical displays
(thanks to the Rweb system).
The full R code (including the 4 code snippets) is available here:
allCode.R.
* Valiente Moro C., Thioulouse J., Chauve C., Normand P., Zenner L. (2009). Bacterial taxa associated with the hematophagous mite, Dermanyssus gallinae detected by 16S rDNA PCR amplification and TTGE fingerprint. Research in Microbiology 160, 63-70.
(full-text pdf file here)
1. Importing data in R
Importing data is done by sourcing the
readdata.R file.
The TTGE presence-absence data for the 73 pools and 55 TTGE bands are read from file
poolsPA.txt,
and stored in the tab1 dataframe. TTGE band codes are read from file
CodeBandes.txt.
Farm codes are read from file farms.txt,
and pool numbers are read from file poolNumbers.txt.
The farms factor is built from these two objects: each farm name is repeated once for each of that farm's pools.
#
# Read the TTGE presence-absence data file:
#
tab1 <- read.table("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/poolsPA.txt", header=TRUE)
#
# Read the farm and band names and pool numbers files:
#
farmNames <- scan("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/farms.txt", what="character")
poolNumbers <- scan("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/poolNumbers.txt")
codeBandes <- scan("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/CodeBandes.txt")
#
# make the farms factor:
#
farms <- as.factor(rep(farmNames, poolNumbers))
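To see how the factor is assembled, here is a minimal sketch with made-up values. The farm names and pool counts below are hypothetical and are not the contents of farms.txt or poolNumbers.txt.

```r
# Hypothetical stand-ins for the contents of farms.txt and poolNumbers.txt
farmNamesToy <- c("A", "B", "C")
poolNumbersToy <- c(2, 3, 1)

# rep() repeats each farm name once per pool, in file order
farmsToy <- as.factor(rep(farmNamesToy, poolNumbersToy))

# One factor value per pool: the length must equal the number of rows
# of the presence-absence table
length(farmsToy)
table(farmsToy)
```

With the real files, the same check would be that length(farms) equals nrow(tab1).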
2. Running PCA, BGA and WGA
PCA, BGA and WGA are done by sourcing the
computations.R file.
The ade4 package is first loaded.
The data table (tab1) contains presence/absence data, so the PCA is done without standardizing the variables
(the scale argument of the dudi.pca
function is set to FALSE).
BGA and WGA are done with the between
and within functions,
using the farms factor. For the three analyses, the scannf argument is set to FALSE and the nf argument sets the number of axes kept (four in the PCA and three in the BGA and WGA).
#
# Computations:
#
library(ade4)
#
# Principal Component Analysis (centred, not standardized)
# Between groups Analysis & Within groups Analysis
# Note: recent ade4 releases name these two functions bca() and wca()
#
pca1 <- dudi.pca(df = tab1, scale = FALSE, scannf = FALSE, nf = 4)
bga1 <- between(dudi = pca1, fac = farms, scannf = FALSE, nf = 3)
wga1 <- within(dudi = pca1, fac = farms, scannf = FALSE, nf = 3)
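The relationship between the three analyses can be made concrete with a small base R sketch. It is not part of computations.R: the table x and the factor fac below are made-up stand-ins for tab1 and farms, and uniform row weights are assumed. After column centring, each row is split into the mean profile of its group (the part analysed by BGA) and the deviation from that mean (the part analysed by WGA), and the total inertia is the sum of the two.

```r
set.seed(1)
# Hypothetical presence/absence table: 12 pools x 5 bands, 3 farms of 4 pools
x <- matrix(rbinom(60, 1, 0.5), nrow = 12)
fac <- factor(rep(c("A", "B", "C"), each = 4))

# Centre the columns without scaling, as with scale = FALSE
xc <- scale(x, center = TRUE, scale = FALSE)

# Between part: each row replaced by the mean profile of its farm
betweenPart <- apply(xc, 2, function(col) ave(col, fac))
# Within part: deviation of each pool from its farm mean
withinPart <- xc - betweenPart

# Inertia with uniform row weights 1/n (an assumption of this sketch)
inertia <- function(m) sum(m^2) / nrow(m)
inertiaTotal <- inertia(xc)
inertiaBetween <- inertia(betweenPart)
inertiaWithin <- inertia(withinPart)

# The two parts add up to the total inertia
all.equal(inertiaTotal, inertiaBetween + inertiaWithin)
# Between/total inertia ratio
inertiaBetween / inertiaTotal
```

The between/total ratio computed this way is the kind of quantity that is tested in the next section.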
3. Testing the statistical significance of the BGA
The null hypothesis of the BGA is that there is no difference between farms.
The test checks whether the observed value of the between/total inertia ratio (0.6744)
is higher than expected under the null hypothesis.
Under the null hypothesis, pools can be permuted randomly without significantly changing
the between/total inertia ratio. So the rows of the dataframe are
permuted at random, and the BGA is computed again. This is performed many
times, to get an idea of the distribution of the between/total inertia ratio.
The figure below is produced with the randtest
function. It shows that the observed value (black diamond, far on the right) is much
higher than all the values obtained after permutation. The p-value is equal to 0.001,
which means that, if there were no difference between farms, the probability of obtaining a
between/total inertia ratio at least as large as 0.6744 would be about 1 in 1000.
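The permutation logic can be spelled out in a few lines of base R. This is only a sketch of the principle, not the ade4 implementation: the data, the helper function inertiaRatio and the choice of 999 permutations are assumptions made for illustration.

```r
set.seed(42)
# Hypothetical data: 12 pools x 5 bands, with a farm effect built in
fac <- factor(rep(c("A", "B", "C"), each = 4))
probs <- c(A = 0.1, B = 0.5, C = 0.9)[as.character(fac)]
x <- matrix(rbinom(60, 1, rep(probs, times = 5)), nrow = 12)

# Between/total inertia ratio for a table and a grouping factor
# (hypothetical helper, uniform row weights)
inertiaRatio <- function(tab, f) {
  tc <- scale(tab, center = TRUE, scale = FALSE)
  groupMeans <- apply(tc, 2, function(col) ave(col, f))
  sum(groupMeans^2) / sum(tc^2)
}

obs <- inertiaRatio(x, fac)

# Permuting the farm labels is equivalent to permuting the rows
nrepet <- 999
sim <- replicate(nrepet, inertiaRatio(x, sample(fac)))

# Monte-Carlo p-value: the observed value counts as one permutation
pvalue <- (sum(sim >= obs) + 1) / (nrepet + 1)
pvalue
```

Because the observed value is counted among the permutations, the smallest p-value attainable with 999 permutations is 1/1000, which is what is obtained when no permuted ratio reaches the observed one.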
4. Plotting BGA graphics
The aim of BGA is to separate the groups (here, the farms). The loadings of the TTGE bands,
which are in the bga1$co dataframe, are plotted with the s.label
function (left panel). To get an idea of the dispersion
of the mite pools within each farm, the projection of each pool is plotted on the factor
map (right panel). The row scores of the pools are in the bga1$ls dataframe, and two graphs are superimposed: the
graph of pool stars (s.class),
and the graph of convex hulls (s.chull)
surrounding the pools belonging to the same farm. As the permutation test indicated,
the farms differ markedly.
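What the superimposed stars and convex hulls represent can be sketched in base R. This is a stand-in for illustration, not the ade4 code of this page: the scores below are random, hypothetical values in place of bga1$ls, and chull, polygon and segments replace s.chull and s.class.

```r
set.seed(3)
# Hypothetical row scores on two axes: 3 farms of 5 pools each
fac <- factor(rep(c("A", "B", "C"), each = 5))
scores <- data.frame(
  Axis1 = rnorm(15, mean = c(-2, 0, 2)[as.integer(fac)]),
  Axis2 = rnorm(15, mean = c(1, -1, 1)[as.integer(fac)])
)

# Empty plot with the same scale on both axes
plot(scores$Axis1, scores$Axis2, type = "n", asp = 1,
     xlab = "Axis 1", ylab = "Axis 2")

# Convex hull vertices of each farm
hulls <- lapply(split(scores, fac), function(g) g[chull(g$Axis1, g$Axis2), ])

for (i in seq_along(hulls)) {
  farm <- names(hulls)[i]
  pools <- scores[fac == farm, ]
  centre <- colMeans(pools)
  # Hull surrounding the pools of this farm
  polygon(hulls[[i]]$Axis1, hulls[[i]]$Axis2, border = i + 1)
  # Star: one segment from the farm centre to each of its pools
  segments(centre[1], centre[2], pools$Axis1, pools$Axis2, col = i + 1)
  text(centre[1], centre[2], farm)
}
```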
5. Plotting WGA graphics
The third code snippet draws the following figure, showing the factor maps of WGA and the dendrogram of the cluster analysis
on TTGE band loadings. The loadings of the 55 TTGE bands are in the wga1$co dataframe, and they are
plotted with the s.label function (top-left panel). The 73 mite pools are grouped by convex hulls, according
to the poultry farm from which they originate (top-right panel). The row scores of the WGA are centered by group,
so the 13 farms are centered on the origin (this corresponds to the fact that the "farm effect" has been removed in
this analysis). The TTGE bands were selected from this figure using cluster analysis (lower panel), with the complete linkage algorithm
and Euclidean distance (the default values of the hclust and dist functions, respectively). Note that these distances are computed on the WGA variable loadings (wga1$co) and not on the raw data.
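The clustering step relies only on the defaults of dist and hclust, which can be checked on a small example. The loadings matrix below is a random, hypothetical stand-in for wga1$co, and cutting the tree into three groups is an arbitrary choice made for illustration.

```r
set.seed(7)
# Hypothetical loadings of 8 bands on 3 axes (stand-in for wga1$co)
loadings <- matrix(rnorm(24), nrow = 8,
                   dimnames = list(paste0("band", 1:8), paste0("Comp", 1:3)))

d <- dist(loadings)    # default distance: "euclidean"
hc <- hclust(d)        # default agglomeration method: "complete"

# Dendrogram of the bands
plot(hc)

# Group membership when the tree is cut into 3 clusters (arbitrary k)
groups <- cutree(hc, k = 3)
table(groups)
```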
6. Using the vegan package
The fourth code snippet repeats the same computations using the rda function of the vegan package. The first panel is the rda display, showing the farm effect, and the second panel is the partial rda display (note that the sign of the first axis is inverted compared to the
ade4 outputs). BGA is done by passing the tab1~farms formula to the rda function, and WGA uses vegan's special function Condition to remove ("partial out") the effect of the farms factor.
If you have any problems or comments, please contact
Jean Thioulouse.