Community ecology in the age of multivariate multiscale spatial analysis

On-line reproduction of the paper by
S. Dray, R. Pelissier, P. Couteron, M. Fortin, P. Legendre, P.R. Peres-Neto, E. Bellier, R. Bivand, F.G. Blanchet,
M. De Caceres, A. Dufour, E. Heegaard, T. Jombart, F. Munoz, J. Oksanen, J. Thioulouse, and H.H. Wagner. (PDF file)*.

This web page allows to redo all the computations and graphical displays thanks to the Rweb system.
You can play with the R code and click the "Do it again !" buttons.
The full R code is available here: allCode.R.
The pcw R dataset is available here: pcw.RData.
*S. Dray, R. Pelissier, P. Couteron, M. Fortin, P. Legendre, P.R. Peres-Neto, E. Bellier, R. Bivand, F.G. Blanchet, M. De Caceres, A. Dufour, E. Heegaard, T. Jombart, F. Munoz, J. Oksanen, J. Thioulouse, and H.H. Wagner (2012). Community ecology in the age of multivariate multiscale spatial analysis. Ecological Monographs, (accepted). (PDF file).

Abstract

Species spatial distributions are the result of population demography, behavioral traits, and species interactions in spatially heterogeneous environmental conditions. Hence the composition of species assemblages is an integrative response variable and its variability can be explained by the complex interplay among several structuring factors. The thorough analysis of spatial variation in species assemblages may help infer processes shaping ecological communities. We suggest that ecological studies would benefit from the combined use of the classical statistical models of community composition data, such as constrained or unconstrained multivariate analyses of site-by-species abundance tables, with rapidly emerging and diversifying methods of spatial pattern analysis. Doing so allows one to deal with spatially explicit ecological models of beta diversity in a biogeographic context through the multiscale analysis of spatial patterns in original species data tables, including spatial characterization of fitted or residual variation from environmental models. We summarize here the recent progress for specifying spatial features through spatial weighting matrices and spatial eigenfunctions in order to define spatially constrained or scale-explicit multivariate analyses. Through a worked example on tropical tree communities, we also show the potential of the overall approach to identify significant residual spatial patterns that could arise from the omission of important unmeasured explanatory variables or processes.

Key words: ecological community; multivariate spatial data; ordination; spatial autocorrelation; spatial connectivity; spatial eigenfunction; spatial structure; spatial weight

1. Introduction

The paper by Dray et al. (2012) uses a data set from Pyke et al. 2001 on the floristic dissimilarity between forests plots along the Panama Canal. Condit et al. (2002) used a model of distance decay in similarity to illustrate the fact that the floristic dissimilarity decreases with distance as a result of limited dispersion and habitat effects.

This reproducibility page allows to redo the computations and graphical displays of the corresponding sections in the paper by Dray et al (2012) using the R software. Computations include classical multivariate analysis methods: Principal Component Analysis (PCA) of species abundances, Redundancy Analysis (RDA) and partial RDA with environmental variables. Spatial methods include the computation of Moran Eigenvector Maps (MEM) of sampling sites, a forward selection step of these MEMs and a variation partitioning procedure to analyse the respective parts of variation due to environmental and spatial fine and broad scales. The reproducibility of graphs include figures 4 (geographical maps of multivariate analyses site scores and scalograms) and 5 (variation partitioning diagram).

2. Worked example: floristic composition along the Panama canal

Figure 1 (reprinted from Figure 4 in Dray et al. 2012) shows the maps along the Panama Canal of the site scores on the first and second axes of three analyses of the original table, with superimposed scalograms. This figure can be redone in three steps: i) Maps of points and of the neighbouring graph, ii) Computation of MEMs and of constrained analyses, iii) Plot of geographical maps with analyses scores and smoothed scalograms.

Figure 1: Maps along the Panama Canal of the site scores on the first and second axes of the analysis of the original table (principal component analysis of Y), the approximated table F (redundancy analysis with E as predictors) and the residual table R (partial principal component analysis with E as covariables). For each score, a smoothed scalogram (the 49 Moran's eigenvector maps [MEMs] are assembled in seven groups) indicates the portion of variance (R2) explained by each spatial scale. For each scalogram, the scale corresponding to the highest R2 (in dark gray) is tested using 99 permutations of the observed values (P values are given). The 95% confidence limit is also represented by the line of plus signs. (Reprinted from Dray et al. 2012, Figure 4.)

2.1 Maps of points and neighbouring graph

The first R code excerpt loads the Panama Canal dataset in R format (pcw.RData), and defines three utility functions plotmap, orthosmooth.randtest, and plotsmooth (source code available here: functions.R).

The sp package and the plotmap function are used to draw the geographical map of the Panama Canal region (Figure 2). The spdep package is used to compute the neighbouring graph (nb1) and the corresponding listw structure, starting from the (xy) coordinates of sites and using the functions graph2nb and nb2listw. The final plot with the neighbouring graph superimposed over the geographical map is drawn with the plot.listw function of the spdep package.

Figure 2: The neigbouring graph of sampling points on the Panama Canal map.

2.2 Compute the MEMs and constrained analyses

The second R code excerpt computes the MEMs with the spacemakeR package and computes the spatially constrained analyses. Note that this package is not available on CRAN but on R-Forge. It must then be installed using the following R command:

install.packages("spacemakeR", repos="http://R-Forge.R-project.org")

The initial species abundance table (Y) is stored in the pcw$spe dataframe, its approximation by environmental variables is analysed by a redundancy analysis (object rda.F), and its residual counterpart when environmental variables are factored out is analysed with an orthogonally constrained rda (pcaivortho function, resulting object rda.R).

In the first step, MEMs are computed from the lw1 object with the scores.listw function of the spacemakeR package.

The table of species pcw$spe (Y) is then standardized (chi.square method) using the decostand function of the vegan package.

A centered PCA is done on the standardized species table using the dudi.pca function of the ade4 package.

A redundancy analysis (principal component analysis on instrumental variables, pcaiv function) is used to couple the species PCA with the environmental variables stored in the pce$env dataframe (E). A permutation test is used to test the portion of community data explained by environmental variables (randtest function).

The analysis of the residuals after partialling out the environmental variables from the species table (pra.R) is computed with the pcaivortho function of the ade4 package.

This code piece starts by sourcing the start.R file, that reloads the data set and redoes the first computation steps. (start.R file available here: start.R).

## redo first step computations
source("/panhome/jthioulo/HomePage/EcolMonogr/start.R")

## compute the MEMs
library(spacemakeR, lib.loc="/panhome/jthioulo/Rlib") ## installed using install.packages("spacemakeR", repos="http://R-Forge.R-project.org")
U <- scores.listw(lw1)

## Create original, fitted and residual tables
library(vegan, lib.loc="/panhome/jthioulo/Rlib")  ## installed using install.packages("vegan")
library(ade4, lib.loc="/panhome/jthioulo/Rlib") ## installed using install.packages("ade4")
tab <- decostand(pcw$spe, method="chi.square")

## Analysis of Y
pca.Y <- dudi.pca(tab, scale = F, scannf = F)
pca.Y$eig[1:2] / sum(pca.Y$eig) * 100 ## percentage of variations explained by the first two axes

## RDA (Analysis of F)
rda.F <- pcaiv(pca.Y, pcw$env, scannf = F)
testR2.rda.F <- randtest(rda.F, nrepet = 99)
testR2.rda.F # test of the R2 (portion of community data explained by environmental variables)
rda.F$eig[1:2] / sum(rda.F$eig) * 100 ## percentage of variations explained by the first two axes
rda.F$cor ## correlation with environmental variables

## PRA (Analysis of R)
pra.R <- pcaivortho(pca.Y, pcw$env, scannf = F)
pra.R$eig[1:2] / sum(pra.R$eig) * 100 ## percentage of variations explained by the first two axes

(Takes about 20 s)

The same computations can also be done using vegan rda function instead of ade4 pcaiv and pcaivortho functions. The following code piece allows to compare the outputs of the two packages.

## redo first step computations
source("/panhome/jthioulo/HomePage/EcolMonogr/start.R")
library(spacemakeR, lib.loc="/panhome/jthioulo/Rlib") ## installed using install.packages("spacemakeR", repos="http://R-Forge.R-project.org")
U <- scores.listw(lw1)
library(vegan, lib.loc="/panhome/jthioulo/Rlib")  ## installed using install.packages("vegan")
library(ade4, lib.loc="/panhome/jthioulo/Rlib") ## installed using install.packages("ade4")
tab <- decostand(pcw$spe, method="chi.square")
pca.Y <- dudi.pca(tab, scale = F, scannf = F)
rda.F <- pcaiv(pca.Y, pcw$env, scannf = F)
testR2.rda.F <- randtest(rda.F, nrepet = 99)
testR2.rda.F # test of the R2 (portion of community data explained by environmental variables)
rda.F$cor ## correlation with environmental variables

### >>> vegan piece 1 <<<

### The same in vegan can be achieved with function rda(). This finds
### directly both the constrained result (similar to rda.F above) and
### the residual ordination (similar to pra.R of pcaivortho()) above),
### and works on raw data so that previous PCA is unnecessary. So we
### proceed with 'tab'
rda.F.R.veg <- rda(tab ~ ., pcw$env)
### The eigenvalues are different because vegan uses unbiased
### variances and ade4 biased variances, but the proportions match
### (vegan::eigenvals() also knows ade4)
summary(eigenvals(rda.F))
summary(eigenvals(rda.F.R.veg, constrained = TRUE))
## Identical eigenvalues are obtained after bias correction in ade4
## (or de-correction in vegan):
eigenvals(rda.F) * NROW(tab)/(NROW(tab) - 1)
### Permutation tests in vegan give similar results as in ade4
anova(rda.F.R.veg, step = 1000)
## Similar items are stored in vegan and ade4 functions, but in vegan
## we prefer to have accessor functions instead of directly accessing
## lists with $-signs. The following finds the correlations with
## environmental variables (that we call biplot scores), or rda.F$cor
## of ade4
scores(rda.F.R.veg, display="bp")

### >>> end vegan piece 1 <<<

(Takes about 40 s)

2.3 Plot the geographical maps with analyses scores and smoothed scalograms

The third R code excerpt starts by sourcing the start2.R file that redoes the first computation steps. (start2.R file available here: start2.R). After this, it draws the maps along the Panama Canal of the site scores on the first (left) and second (right) axes of the three analyses: analysis of the original table (top row, principal component analysis of Y), the approximated table F (middle row, redundancy analysis with E as predictors) and the residual table R (bottom row, partial principal component analysis with E as covariables).

The par(mfrow=c(3,2)) line is used to draw groups of 6 graphs, 3 vertically and 2 horizontally. For each of the 6 elementary graphs, the plotmap function is used to draw the background geographical map, and the s.value function of the ade4 package is used to plot squares proportional to the site scores over the geographical map (black squares correspond to positive scores, while white squares correspond to negative scores).

The orthosmooth.randtest is used to compute a permutation test of the R2 coefficient between the anlyses site scores and the MEM vectors, and the plotsmooth function is used to draw the smoothed scalograms. For each score, the smoothed scalogram (49 MEMs assembled in seven groups) indicates the portion of variance (R2) explained by each spatial scale. For each scalogram, the scale corresponding to the highest R2 (in dark gray) is tested using 99 permutations of the observed values (P values are given). The 95% confidence limit is also represented by the line of plus signs.

Figure 3: Maps of the first two site scores of the three analyses (left) and the corresponding scalograms (right). The sign of some of the axes have been inverted to get the same figure as in the paper of Dray et al. 2012 (components sign is arbitrary).

(Takes about 50 s)

3. Variation partitioning

Figure 2 is reprinted from Figure 5 of Dray et al. 2012. It shows the results of the variation partitioning procedure. Authors explain that "a forward selection procedure was applied to the MEMs spatial predictors (those associated with non-significant Moran's indices were removed a priori), and 13 MEMs explaining 42.9 % of the total variation of Y were selected. These MEMs were then divided in two groups corresponding to broad scales (9 MEMs associated with a positive Moran's statistic) and fine scales (4 MEMs associated with a negative autocorrelation). Variation partitioning identified a significant pure broad-scale spatial fraction (adjusted R2 = 0.11, P = 0.005), a significant pure fine-scale spatial fraction (adjusted R2 = 0.10, P = 0.005), a slightly significant pure environmental fraction (adjusted R2 = 0.06, P = 0.049) and a fraction corresponding to broad-scale structured environment (adjusted R2 = 0.06). These results indicate prominent effects of largescale environmental drivers on the spatial structure of communities."

Figure 2: The variation partitioning diagram. (Reprinted from Dray et al. 2012, Figure 5.)

The fourth R code excerpt starts by sourcing the start2.R file, that redoes the first computation steps. (start2.R file available here: start2.R).

The test.scores function of the spacemakeR package is used to test the significance of MEMs Moran's index (with the moran.mc function of the spdep package). The MEMs with significant p-values (p<0.05) are stored in U.sel$vectors.
RDA (rda function of the vegan package) is applied on standardized species abundances and the selected MEMs to compute the threshold R2 that is used in the forward selection.
The forward.sel function of the packfor package is used to apply a forward selection procedure to the significant MEMs. Vectors corresponding to broad scale spatial structures (positive eigenvalues) and to fine scale spatial structures (negative eigenvalues) are selected and stored in broad and fine dataframes.
As environmental variables are both quantitative and qualitative, a Hill and Smith analysis is done on the environmental data table, using the dudi.hillsmith function of the ade4 package, and the variation partitioning is done using the varpart function of the vegan package. Note that you can handle mixed quantitative and categorical (factor) variables in vegan when the terms are given in formula. This gives the same result as with Hill-Smith scaling. The calculation of hs1 is unnecessary and the results below (pc.varpart.vegan) are similar to pc.varpart with Hill-Smith contrasts.
Lastly, pure effects are tested with the anova generic function on the corresponding rda objects (i.e., the internal anova.cca function of vegan).

## redo first and second steps computations
source("/panhome/jthioulo/HomePage/EcolMonogr/start2.R")

## test the significance of Moran's I for MEMs
mctests <- test.scores(U, lw1, 99)
colnames(U$vectors) <- paste("MEM", 1:ncol(U$vectors), sep="")
U.sel <- U
U.sel$vectors <- U.sel$vectors[, mctests[,2]<0.05] ## keep only MEMS with signif values

## forward selection
library(packfor, lib.loc="/panhome/jthioulo/Rlib") ## install.packages("packfor", repos="http://R-Forge.R-project.org")

rda1 <- rda(tab, U.sel$vectors)
rda1.R2a <- RsquareAdj(rda1)$adj.r.squared
rda1.fw <- forward.sel(tab,U.sel$vectors, adjR2thresh = rda1.R2a)

### varpart
U.sel.fw <- colnames(U$vectors)%in%rda1.fw$variables
val.sel <- U.sel$values[rda1.fw$order]
vec.sel <- U.sel$vectors[,rda1.fw$order]
broad <- vec.sel[,val.sel>0]
fine <- vec.sel[,val.sel<0]
colnames(broad)
# [1] "MEM 1"  "MEM 10" "MEM 2"  "MEM 4"  "MEM 5"  "MEM 7"  "MEM 8"  "MEM 13"
#[9] "MEM 3"
colnames(fine)
# [1] "MEM 47" "MEM 48" "MEM 39" "MEM 38"
hs1 <- dudi.hillsmith(pcw$env, scannf = F, nf = 50) ## to use varpart with mix of quantitative and categorical variables
pc.varpart <- varpart(tab,hs1$li, broad, fine)
labv <- c("Environment","Spatial broad","Spatial fine","Env\n+Broad","Broad\n+Fine","Env\n+Fine","Env+Fine\nBroad","")
showvarparts(3, labv)
plot(pc.varpart)

## You *can* handle mixed quantitative and categorical (factor)
## variables in vegan: the following gives the same result as with
## Hill-Smith scaling. The calculation of hs1 is unnecessary and the
## results below ('pc.varpart.vegan') are similar to 'pc.varpart' with
## Hill-Smith contrasts.
pc.varpart.vegan <- varpart(tab, ~ ., broad, fine, data=pcw$env)
plot(pc.varpart.vegan)

anova(rda(tab,hs1$li, cbind(broad, fine)), perm.max = 999) ## test pure envt (signif 0.04)
anova(rda(tab,broad, cbind(hs1$li, fine)), perm.max = 999) ## test pure broad (signif 0.005)  
anova(rda(tab,fine, cbind(hs1$li, broad)), perm.max = 999) ## test pure fine (signif 0.005)

(Takes about 2 mn 15 s)

Forward selection also works in vegan with function ordiR2step. It gives identical results to function forward.sel of package packfor. The stopping criteria are similar as in packfor: non-significant term or exceeding adjusted R2 (vegan::RsquareAdj was added to vegan just because of this function). Factor terms would be handled differently in vegan::ordiR2step and packfor::forward.sel: in vegan they are handled as a single multi-df term, while packfor selects individual 1-df contrasts. This makes no difference in the present example as we have no factors in MEMs. This can be checked with the following R code:

## redo first and second steps computations
source("/panhome/jthioulo/HomePage/EcolMonogr/start2.R")

#### >>> vegan piece 2 <<<

## vegan::ordiR2step requires terms to be in a data frame (matrix does not work)
Usel.frame <- as.data.frame(U.sel$vectors)
rda.0 <- rda(tab ~ 1, Usel.frame)
rda.full <- rda(tab ~ ., Usel.frame)
## Terms are selected in order MEM 1, 10, 2, 47, ... but the p-value
## of MEM 47 is very close to the targeted critical value p = 0.05
## (seems to be about 0.047 from 10000 permutations), and in some
## cases the selection may stop there (this concerns both
## packfor::forward.sel and vegan::ordiR2step similarly; the latter
## defaults to 999 permutations). The high value of 'perm.max'
## argument gives some protection against stopping too light-mindedly.
rda1.fw.vegan <- ordiR2step(rda.0, formula(rda.full), perm.max = 999)

### >>> end vegan piece 2 <<<

(Takes about 4 mn)

Note that forward selection is unstable in this example: the fourth term (MEM47) has a permutation P-value very close to the critical target value (P = 0.05) so that computations often stop with only four terms. Replications with 9999 permutations give P-values in range 0.046..0.049, but with 999 permutations we often get P > 0.05. A number of permutations larger than 999 seems therefore necessary.

vegan::ordiR2step and packfor::forward.sel work similarly here, and both can terminate at the fourth term as they are similar functions.

If you have any problems or comments, please contact Jean Thioulouse.