Bacterial taxa associated with the hematophagous mite, Dermanyssus gallinae detected by 16S rDNA PCR amplification and TTGE fingerprint.

On-line reproduction of the paper by Valiente Moro et al. (2009)*
This web page lets you rerun all the computations and redraw the graphical displays, thanks to the Rweb system.
The full R code (including the 4 code snippets) is available here: allCode.R.

* Valiente Moro C., Thioulouse J., Chauve C., Normand P., Zenner L. (2009). Bacterial taxa associated with the hematophagous mite, Dermanyssus gallinae detected by 16S rDNA PCR amplification and TTGE fingerprint. Research in Microbiology 160, 63-70. (full-text pdf file here)

1. Importing data into R

Data are imported by sourcing the readdata.R file. The TTGE presence-absence data for the 73 pools and 55 TTGE bands are read from file poolsPA.txt and stored in the tab1 dataframe. TTGE band codes are read from file CodeBandes.txt. Farm codes are read from file farms.txt, and pool numbers are read from file poolNumbers.txt. The farms factor is built from these last two objects (farm codes and pool numbers).

#
# Read the TTGE presence-absence data file:
#
tab1 <- read.table("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/poolsPA.txt", header=TRUE)
#
# Read the farm and band names and pool numbers files:
#
farmNames <- scan("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/farms.txt", what="character")
poolNumbers <- scan("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/poolNumbers.txt")
codeBandes <- scan("http://pbil.univ-lyon1.fr/members/thioulouse/TTGE/CodeBandes.txt")
#
# Make the farms factor:
#
farms <- as.factor(rep(farmNames, poolNumbers))
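To make the last step concrete, here is a minimal sketch of how rep() turns farm names and pool counts into one label per pool. The names toyFarmNames, toyPoolNumbers and toyFarms, and their values, are hypothetical stand-ins for the contents of farms.txt and poolNumbers.txt, which are not shown on this page.

```r
# Hypothetical stand-ins for the contents of farms.txt and poolNumbers.txt
toyFarmNames   <- c("FarmA", "FarmB", "FarmC")
toyPoolNumbers <- c(2, 3, 1)

# rep() repeats each farm name as many times as that farm has pools,
# giving one label per row of the data table
toyFarms <- as.factor(rep(toyFarmNames, toyPoolNumbers))

print(toyFarms)
print(table(toyFarms))

# The factor must have exactly one entry per pool
stopifnot(length(toyFarms) == sum(toyPoolNumbers))
```

With the real files, the same check would compare length(farms) with nrow(tab1).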

2. Running PCA, BGA and WGA

PCA, BGA and WGA are done by sourcing the computations.R file. The ade4 package is loaded first. The data table (tab1) contains presence/absence data, so the PCA is done without standardizing the variables (the scale argument of the dudi.pca function is set to FALSE). BGA and WGA are done with the between and within functions, using the farms factor. For all three analyses, the scannf argument is set to FALSE and the nf argument sets the number of axes kept (four in the PCA and three in the BGA and WGA).

#
# Computations:
#
library(ade4)
#
# Principal Component Analysis
# Between groups Analysis & Within groups Analysis
# (note: recent ade4 releases provide these two functions as bca() and wca())
#
pca1 <- dudi.pca(df = tab1, scale = FALSE, scannf = FALSE, nf = 4)
bga1 <- between(dudi = pca1, fac = farms, scannf = FALSE, nf = 3)
wga1 <- within(dudi = pca1, fac = farms, scannf = FALSE, nf = 3)
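As a rough illustration of what the between and within analyses work on, the following base R sketch splits the total inertia of a small simulated presence/absence table into a between-farm part and a within-farm part. The objects toy and toyFarms are hypothetical stand-ins for tab1 and farms, and uniform row weights (1/n) are assumed. It does not call ade4 and is not the code of computations.R.

```r
set.seed(1)
# Hypothetical toy table: 12 pools (rows) x 5 bands (columns), 3 farms of 4 pools
toy <- matrix(rbinom(60, 1, 0.5), nrow = 12,
              dimnames = list(NULL, paste0("band", 1:5)))
toyFarms <- factor(rep(c("A", "B", "C"), each = 4))
n <- nrow(toy)

# Centred but not standardized, as in the PCA with scale = FALSE
centred <- scale(toy, center = TRUE, scale = FALSE)

# Between table: each pool replaced by the mean of its farm
betweenTab <- apply(centred, 2, function(x) ave(x, toyFarms))
# Within table: each pool centred on the mean of its farm
withinTab <- centred - betweenTab

totalInertia   <- sum(centred^2) / n
betweenInertia <- sum(betweenTab^2) / n
withinInertia  <- sum(withinTab^2) / n

print(c(total = totalInertia, between = betweenInertia, within = withinInertia))
print(betweenInertia / totalInertia)   # between/total inertia ratio

# The two parts add up to the total inertia
stopifnot(isTRUE(all.equal(totalInertia, betweenInertia + withinInertia)))
```

In this simplified view, a between-groups analysis is a PCA of the between table and a within-groups analysis is a PCA of the within table.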

3. Testing the statistical significance of the BGA

The null hypothesis of the BGA is that there is no difference between farms. The test checks whether the observed value of the between/total inertia ratio (0.6744) is much higher than expected under the null hypothesis. Under the null hypothesis, pools can be permuted randomly without significantly changing the between/total inertia ratio. So the rows of the dataframe are permuted at random, and the BGA is computed again. This is repeated many times to approximate the distribution of the between/total inertia ratio under the null hypothesis. The figure below is produced with the randtest function. It shows that the observed value (black diamond, far on the right) is much higher than all the values obtained after permutation. The p-value is equal to 0.001, which means that, if there were no difference between farms, the probability of obtaining a between/total inertia ratio at least as high as 0.6744 would be about 1 in 1000.
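The permutation principle described above can be sketched in base R on simulated data. This is not the randtest implementation: toy, toyFarms and the helper inertiaRatio are hypothetical, the number of permutations (999) is an assumption, and the farm labels are permuted instead of the rows, which is equivalent here. The p-value follows the usual Monte-Carlo convention of counting the observed value among the simulated ones.

```r
set.seed(42)
# Hypothetical toy data: 12 pools x 5 bands, 3 farms of 4 pools
toy <- matrix(rbinom(60, 1, 0.5), nrow = 12)
toyFarms <- factor(rep(c("A", "B", "C"), each = 4))

# Between/total inertia ratio for a table and a grouping factor
# (uniform row weights, so the 1/n factors cancel out)
inertiaRatio <- function(tab, fac) {
  centred <- scale(tab, center = TRUE, scale = FALSE)
  groupMeans <- apply(centred, 2, function(x) ave(x, fac))
  sum(groupMeans^2) / sum(centred^2)
}

obs <- inertiaRatio(toy, toyFarms)

# Recompute the ratio after randomly reassigning pools to farms
nrepet <- 999   # assumed number of permutations
sim <- replicate(nrepet, inertiaRatio(toy, sample(toyFarms)))

# Proportion of ratios at least as high as the observed one
pValue <- (sum(sim >= obs) + 1) / (nrepet + 1)
print(c(observed = obs, p.value = pValue))
```

Because the toy table is pure noise, the observed ratio should usually fall well inside the permutation distribution, unlike the value of 0.6744 reported for the real data.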


4. Plotting BGA graphics

The aim of BGA is to separate the groups (here, the farms). The loadings of the TTGE bands, which are in the bga1$co dataframe, are plotted with the s.label function (left panel). To get an idea of the dispersion of the mite pools within each farm (about six pools per farm), we plot the projection of each pool on the factor map (right panel). The row scores of the pools are in the bga1$ls dataframe, and we superimpose two graphs: the graph of pool stars (s.class), and the graph of convex hulls (s.chull) surrounding the pools belonging to the same farm. We can see that, as the permutation test showed, the farms are very different.
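For readers unfamiliar with these displays, the base R sketch below mimics the idea of the right panel on simulated scores: a "star" joins each pool to the centroid of its farm, and a convex hull surrounds the pools of each farm. It uses base graphics (chull, segments, polygon) in place of the ade4 functions, and toyScores and toyFarms are hypothetical stand-ins for bga1$ls and farms.

```r
set.seed(7)
# Hypothetical row scores on two axes for 3 farms of 5 pools each
toyFarms  <- factor(rep(c("A", "B", "C"), each = 5))
centres   <- cbind(c(-2, 0, 2), c(0, 2, 0))[as.integer(toyFarms), ]
toyScores <- centres + matrix(rnorm(30, sd = 0.5), ncol = 2)

plot(toyScores, type = "n", asp = 1, xlab = "Axis 1", ylab = "Axis 2")
hulls <- list()
for (f in levels(toyFarms)) {
  pts <- toyScores[toyFarms == f, , drop = FALSE]
  centroid <- colMeans(pts)
  # Star: one segment from the farm centroid to each of its pools
  segments(centroid[1], centroid[2], pts[, 1], pts[, 2], col = "grey40")
  # Convex hull: chull() returns the row indices of the hull vertices
  hulls[[f]] <- chull(pts)
  polygon(pts[hulls[[f]], , drop = FALSE], border = "black", lty = 2)
  text(centroid[1], centroid[2], labels = f, font = 2)
}
```

Farms whose hulls do not overlap on the factor map are the ones the analysis separates well.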


5. Plotting WGA graphics

The third code snippet draws the following figure, showing the factor maps of the WGA and the dendrogram of the cluster analysis on TTGE band loadings. The loadings of the 55 TTGE bands are in the wga1$co dataframe, and they are plotted with the s.label function (top-left panel). The 73 mite pools are grouped by convex hulls, according to the poultry farm from which they originate (top-right panel). The row scores of the WGA are centered by group, so the 13 farms are all centered on the origin (this reflects the fact that the "farm effect" has been removed in this analysis). The TTGE bands were selected on this figure, using cluster analysis (lower panel) with the complete linkage algorithm and Euclidean distance (the default values of the hclust and dist functions, respectively). Note that these distances are computed on the WGA variable loadings (wga1$co) and not on the raw data.
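The clustering step can be sketched as follows on simulated loadings. The matrix toyLoadings is a hypothetical stand-in for wga1$co, and cutting the tree into three groups with cutree is only one possible way to read the dendrogram. It is an assumption of this sketch and is not stated in the text above.

```r
set.seed(3)
# Hypothetical loadings: 10 bands on 3 axes, standing in for wga1$co
toyLoadings <- matrix(rnorm(30), nrow = 10,
                      dimnames = list(paste0("band", 1:10), paste0("Axis", 1:3)))

bandDist <- dist(toyLoadings)   # default method: "euclidean"
bandTree <- hclust(bandDist)    # default method: "complete"
plot(bandTree, main = "Cluster analysis of band loadings (toy data)",
     xlab = "", sub = "")

# Cutting the tree into k groups gives sets of bands with similar loadings;
# k = 3 is an arbitrary choice for this sketch
bandGroups <- cutree(bandTree, k = 3)
print(table(bandGroups))
```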

6. Using the vegan package

The fourth code snippet redoes the same computations using the rda function of the vegan package. The first panel is the rda display, showing the farm effect, and the second panel is the partial rda display (note that the sign of the first axis is inverted compared to the ade4 outputs). BGA is done by passing the "tab1~farms" formula to the rda function, and WGA uses vegan's special function "Condition" to remove ("partial out") the effect of the farms factor.
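One way to see why both packages lead to the same analyses: with a single factor as explanatory variable, a redundancy analysis amounts to a PCA of the fitted values of a multivariate linear model (here, the farm means), and the partial analysis to a PCA of its residuals. The base R sketch below illustrates this on simulated data without calling vegan. The objects toy and toyFarms are hypothetical, and dividing by n - 1 is an assumption meant to mirror vegan's variance convention, whereas dudi.pca uses uniform 1/n row weights by default, so eigenvalues from the two packages would be expected to differ by a constant factor.

```r
set.seed(11)
# Hypothetical toy data: 12 pools x 5 bands, 3 farms of 4 pools
toy <- matrix(rbinom(60, 1, 0.5), nrow = 12,
              dimnames = list(NULL, paste0("band", 1:5)))
toyFarms <- factor(rep(c("A", "B", "C"), each = 4))
n <- nrow(toy)

# Multivariate linear model: each band explained by the farm factor
fit <- lm(toy ~ toyFarms)

# Constrained part: centred fitted values (the farm means)
fittedCentred <- scale(fitted(fit), center = TRUE, scale = FALSE)
# Partial part: residuals, i.e. pools centred on their farm mean
partialled <- residuals(fit)

# Eigenvalues of the two PCAs, obtained from singular values
eigConstrained <- svd(fittedCentred)$d^2 / (n - 1)
eigPartial     <- svd(partialled)$d^2 / (n - 1)

totalVariance <- sum(apply(toy, 2, var))
print(c(constrained = sum(eigConstrained),
        partial = sum(eigPartial),
        total = totalVariance))

# The farm effect and the residual part add up to the total variance
stopifnot(isTRUE(all.equal(sum(eigConstrained) + sum(eigPartial), totalVariance)))
```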


If you have any problems or comments, please contact Jean Thioulouse.