Community ecology in the age of multivariate multiscale spatial analysis

On-line reproduction of the paper by
S. Dray, R. Pelissier, P. Couteron, M. Fortin, P. Legendre, P.R. Peres-Neto, E. Bellier, R. Bivand, F.G. Blanchet,
M. De Caceres, A. Dufour, E. Heegaard, T. Jombart, F. Munoz, J. Oksanen, J. Thioulouse, and H.H. Wagner. (PDF file)*.

This web page allows to redo all the computations and graphical displays thanks to the Rweb system.
You can play with the R code and click the "Do it again !" buttons.
The full R code is available here: allCode.R.
The pcw R dataset is available here: pcw.RData.

*S. Dray, R. Pelissier, P. Couteron, M. Fortin, P. Legendre, P.R. Peres-Neto, E. Bellier, R. Bivand, F.G. Blanchet, M. De Caceres, A. Dufour, E. Heegaard, T. Jombart, F. Munoz, J. Oksanen, J. Thioulouse, and H.H. Wagner (2012). Community ecology in the age of multivariate multiscale spatial analysis. Ecological Monographs, (accepted). (PDF file).

Abstract

Species spatial distributions are the result of population demography, behavioral traits, and species interactions in spatially heterogeneous environmental conditions. Hence the composition of species assemblages is an integrative response variable and its variability can be explained by the complex interplay among several structuring factors. The thorough analysis of spatial variation in species assemblages may help infer processes shaping ecological communities. We suggest that ecological studies would benefit from the combined use of the classical statistical models of community composition data, such as constrained or unconstrained multivariate analyses of site-by-species abundance tables, with rapidly emerging and diversifying methods of spatial pattern analysis. Doing so allows one to deal with spatially explicit ecological models of beta diversity in a biogeographic context through the multiscale analysis of spatial patterns in original species data tables, including spatial characterization of fitted or residual variation from environmental models. We summarize here the recent progress for specifying spatial features through spatial weighting matrices and spatial eigenfunctions in order to define spatially constrained or scale-explicit multivariate analyses. Through a worked example on tropical tree communities, we also show the potential of the overall approach to identify significant residual spatial patterns that could arise from the omission of important unmeasured explanatory variables or processes.

Key words: ecological community; multivariate spatial data; ordination; spatial autocorrelation; spatial connectivity; spatial eigenfunction; spatial structure; spatial weight

1. Introduction

The paper by Dray et al. (2012) uses a data set from Pyke et al. 2001 on the floristic dissimilarity between forests plots along the Panama Canal. Condit et al. (2002) used a model of distance decay in similarity to illustrate the fact that the floristic dissimilarity decreases with distance as a result of limited dispersion and habitat effects.

This reproducibility page allows to redo the computations and graphical displays of the corresponding sections in the paper by Dray et al (2012) using the R software. Computations include classical multivariate analysis methods: Principal Component Analysis (PCA) of species abundances, Redundancy Analysis (RDA) and partial RDA with environmental variables. Spatial methods include the computation of Moran Eigenvector Maps (MEM) of sampling sites, a forward selection step of these MEMs and a variation partitioning procedure to analyse the respective parts of variation due to environmental and spatial fine and broad scales. The reproducibility of graphs include figures 4 (geographical maps of multivariate analyses site scores and scalograms) and 5 (variation partitioning diagram).

2. Worked example: floristic composition along the Panama canal

Figure 1 (reprinted from Figure 4 in Dray et al. 2012) shows the maps along the Panama Canal of the site scores on the first and second axes of three analyses of the original table, with superimposed scalograms. This figure can be redone in three steps: i) Maps of points and of the neighbouring graph, ii) Computation of MEMs and of constrained analyses, iii) Plot of geographical maps with analyses scores and smoothed scalograms.

Figure 1: Maps along the Panama Canal of the site scores on the first and second axes of the analysis of the original table (principal component analysis of Y), the approximated table F (redundancy analysis with E as predictors) and the residual table R (partial principal component analysis with E as covariables). For each score, a smoothed scalogram (the 49 Moran's eigenvector maps [MEMs] are assembled in seven groups) indicates the portion of variance (R2) explained by each spatial scale. For each scalogram, the scale corresponding to the highest R2 (in dark gray) is tested using 99 permutations of the observed values (P values are given). The 95% confidence limit is also represented by the line of plus signs. (Reprinted from Dray et al. 2012, Figure 4.)

2.1 Maps of points and neighbouring graph

The first R code excerpt loads the Panama Canal dataset in R format (pcw.RData), and defines three utility functions plotmap, orthosmooth.randtest, and plotsmooth (source code available here: functions.R).

The sp package and the plotmap function are used to draw the geographical map of the Panama Canal region (Figure 2). The spdep package is used to compute the neighbouring graph (nb1) and the corresponding listw structure, starting from the (xy) coordinates of sites and using the functions graph2nb and nb2listw. The final plot with the neighbouring graph superimposed over the geographical map is drawn with the plot.listw function of the spdep package.

Figure 2: The neigbouring graph of sampling points on the Panama Canal map.


(Takes about 10 s)

2.2 Compute the MEMs and constrained analyses

The second R code excerpt computes the MEMs with the spacemakeR package and computes the spatially constrained analyses. Note that this package is not available on CRAN but on R-Forge. It must then be installed using the following R command:

install.packages("spacemakeR", repos="http://R-Forge.R-project.org")

The initial species abundance table (Y) is stored in the pcw$spe dataframe, its approximation by environmental variables is analysed by a redundancy analysis (object rda.F), and its residual counterpart when environmental variables are factored out is analysed with an orthogonally constrained rda (pcaivortho function, resulting object rda.R).

In the first step, MEMs are computed from the lw1 object with the scores.listw function of the spacemakeR package.

The table of species pcw$spe (Y) is then standardized (chi.square method) using the decostand function of the vegan package.

A centered PCA is done on the standardized species table using the dudi.pca function of the ade4 package.

A redundancy analysis (principal component analysis on instrumental variables, pcaiv function) is used to couple the species PCA with the environmental variables stored in the pce$env dataframe (E). A permutation test is used to test the portion of community data explained by environmental variables (randtest function).

The analysis of the residuals after partialling out the environmental variables from the species table (pra.R) is computed with the pcaivortho function of the ade4 package.

This code piece starts by sourcing the start.R file, that reloads the data set and redoes the first computation steps. (start.R file available here: start.R).


(Takes about 20 s)

The same computations can also be done using vegan rda function instead of ade4 pcaiv and pcaivortho functions. The following code piece allows to compare the outputs of the two packages.


(Takes about 40 s)

2.3 Plot the geographical maps with analyses scores and smoothed scalograms

The third R code excerpt starts by sourcing the start2.R file that redoes the first computation steps. (start2.R file available here: start2.R). After this, it draws the maps along the Panama Canal of the site scores on the first (left) and second (right) axes of the three analyses: analysis of the original table (top row, principal component analysis of Y), the approximated table F (middle row, redundancy analysis with E as predictors) and the residual table R (bottom row, partial principal component analysis with E as covariables).

The par(mfrow=c(3,2)) line is used to draw groups of 6 graphs, 3 vertically and 2 horizontally. For each of the 6 elementary graphs, the plotmap function is used to draw the background geographical map, and the s.value function of the ade4 package is used to plot squares proportional to the site scores over the geographical map (black squares correspond to positive scores, while white squares correspond to negative scores).

The orthosmooth.randtest is used to compute a permutation test of the R2 coefficient between the anlyses site scores and the MEM vectors, and the plotsmooth function is used to draw the smoothed scalograms. For each score, the smoothed scalogram (49 MEMs assembled in seven groups) indicates the portion of variance (R2) explained by each spatial scale. For each scalogram, the scale corresponding to the highest R2 (in dark gray) is tested using 99 permutations of the observed values (P values are given). The 95% confidence limit is also represented by the line of plus signs.

Figure 3: Maps of the first two site scores of the three analyses (left) and the corresponding scalograms (right). The sign of some of the axes have been inverted to get the same figure as in the paper of Dray et al. 2012 (components sign is arbitrary).


(Takes about 50 s)

3. Variation partitioning

Figure 2 is reprinted from Figure 5 of Dray et al. 2012. It shows the results of the variation partitioning procedure. Authors explain that "a forward selection procedure was applied to the MEMs spatial predictors (those associated with non-significant Moran's indices were removed a priori), and 13 MEMs explaining 42.9 % of the total variation of Y were selected. These MEMs were then divided in two groups corresponding to broad scales (9 MEMs associated with a positive Moran's statistic) and fine scales (4 MEMs associated with a negative autocorrelation). Variation partitioning identified a significant pure broad-scale spatial fraction (adjusted R2 = 0.11, P = 0.005), a significant pure fine-scale spatial fraction (adjusted R2 = 0.10, P = 0.005), a slightly significant pure environmental fraction (adjusted R2 = 0.06, P = 0.049) and a fraction corresponding to broad-scale structured environment (adjusted R2 = 0.06). These results indicate prominent effects of largescale environmental drivers on the spatial structure of communities."

Figure 2: The variation partitioning diagram. (Reprinted from Dray et al. 2012, Figure 5.)

The fourth R code excerpt starts by sourcing the start2.R file, that redoes the first computation steps. (start2.R file available here: start2.R).


(Takes about 2 mn 15 s)

Forward selection also works in vegan with function ordiR2step. It gives identical results to function forward.sel of package packfor. The stopping criteria are similar as in packfor: non-significant term or exceeding adjusted R2 (vegan::RsquareAdj was added to vegan just because of this function). Factor terms would be handled differently in vegan::ordiR2step and packfor::forward.sel: in vegan they are handled as a single multi-df term, while packfor selects individual 1-df contrasts. This makes no difference in the present example as we have no factors in MEMs. This can be checked with the following R code:


(Takes about 4 mn)

Note that forward selection is unstable in this example: the fourth term (MEM47) has a permutation P-value very close to the critical target value (P = 0.05) so that computations often stop with only four terms. Replications with 9999 permutations give P-values in range 0.046..0.049, but with 999 permutations we often get P > 0.05. A number of permutations larger than 999 seems therefore necessary.

vegan::ordiR2step and packfor::forward.sel work similarly here, and both can terminate at the fourth term as they are similar functions.


If you have any problems or comments, please contact Jean Thioulouse.