Diversity measures

2019-03-19

Thereafter, we denote by:

• $$S$$ the total number of taxa recorded,
• $$i$$ the rank of the taxon
• $$n_i$$ the number of individuals in the $$i$$-th taxon,
• $$N = \sum n_i$$ the total number of individuals,
• $$p_i$$ the relative proportion of the $$i$$-th taxon in the population,
• $$s_n$$ the number of species with $$n$$ individuals
• $$f_k$$ the number of taxa with $$k$$ individuals,
• $$Q_i$$ the incidence of the $$i$$-th taxon.

$$\alpha$$-diversity

Richness

Margalef index (Margalef 1958)

$D_{Mg} = \frac{S - 1}{\ln N}$

Menhinick index (Menhinick 1964)

$D_{Mn} = \frac{S}{\sqrt{N}}$

Chao1 estimator (Chao 1984)

$\hat{S}_{Chao1} = \begin{cases} S + \frac{N - 1}{N} \frac{f_1^2}{2 f_2} & f_2 > 0 \\ S + \frac{N - 1}{N} \frac{f_1 (f_1 - 1)}{2} & f_2 = 0 \end{cases}$

In the special case of homogeneous case, a bias-corrected estimator is:

$\hat{S}_{Chao1-bc} = S + \frac{N - 1}{N} \frac{f_1 (f_1 - 1)}{2 f_2 + 1}$

Improved Chao1 estimator (makes use of the additional information of tripletons and quadrupletons; Chiu et al. (2014)) :

$\hat{S}_{iChao1} = \hat{S}_{Chao1} + \frac{N - 3}{4 N} \frac{f_3}{f_4} \times \max\left(f_1 - \frac{N - 3}{N - 1} \frac{f_2 f_3}{2 f_4} , 0\right)$

Abundance-based Coverage Estimator (Chao and Lee 1992)

$\hat{S}_{ACE} = \hat{S}_{abun} + \frac{\hat{S}_{rare}}{\hat{C}_{rare}} + \frac{f_1}{\hat{C}_{rare}} \times \hat{\gamma}^2_{rare}$

Where $$\hat{S}_{rare} = \sum_{i = 1}^{k} f_i$$ is the number of rare taxa, $$\hat{S}_{abun} = \sum_{i > k}^{N} f_i$$ is the number of abundant taxa (for a given cut-off value $$k$$), $$\hat{C}_{rare} = 1 - \frac{f_1}{N_{rare}}$$ is the Turing’s coverage estimate and:

$\hat{\gamma}^2_{rare} = \max\left[\frac{\hat{S}_{rare}}{\hat{C}_{rare}} \frac{\sum_{i = 1}^{k} i(i - 1)f_i}{\left(\sum_{i = 1}^{k} if_i\right)\left(\sum_{i = 1}^{k} if_i - 1\right)} - 1, 0\right]$

Chao2 estimator (Chao 1987)

For replicated incidence data (i.e. a $$m \times p$$ logical matrix), the Chao2 estimator is:

$\hat{S}_{Chao2} = \begin{cases} S + \frac{m - 1}{m} \frac{Q_1^2}{2 Q_2} & Q_2 > 0 \\ S + \frac{m - 1}{m} \frac{Q_1 (Q_1 - 1)}{2} & Q_2 = 0 \end{cases}$

Improved Chao2 estimator (Chiu et al. 2014):

$\hat{S}_{iChao2} = \hat{S}_{Chao2} + \frac{m - 3}{4 m} \frac{Q_3}{Q_4} \times \max\left(Q_1 - \frac{m - 3}{m - 1} \frac{Q_2 Q_3}{2 Q_4} , 0\right)$

Incidence-based Coverage Estimator (Chao and Chiu 2016)

$\hat{S}_{ICE} = \hat{S}_{freq} + \frac{\hat{S}_{infreq}}{\hat{C}_{infreq}} + \frac{Q_1}{\hat{C}_{infreq}} \times \hat{\gamma}^2_{infreq}$

Where $$\hat{S}_{infreq} = \sum_{i = 1}^{k} Q_i$$ is the number of infrequent taxa, $$\hat{S}_{freq} = \sum_{i > k}^{N} Q_i$$ is the number of frequent taxa (for a given cut-off value $$k$$), $$\hat{C}_{infreq} = 1 - \frac{Q_1}{\sum_{i = 1}^{k} iQ_i}$$ is the Turing’s coverage estimate and:

$\hat{\gamma}^2_{infreq} = \max\left[\frac{\hat{S}_{infreq}}{\hat{C}_{infreq}} \frac{m_{infreq}}{m_{infreq} - 1} \frac{\sum_{i = 1}^{k} i(i - 1)Q_i}{\left(\sum_{i = 1}^{k} iQ_i\right)\left(\sum_{i = 1}^{k} iQ_i - 1\right)} - 1, 0\right]$

Where $$m_{infreq}$$ is the number of sampling units that include at least one infrequent species.

Rarefaction

Hurlbert (1971) unbiaised estimate of Sander (1968) rarefaction:

$E(S) = \sum_{i = 1}^{S} 1 - \frac{{N - N_i} \choose n}{N \choose n}$

Diversity and evenness

Information theory index

Shannon-Wiener diversity index (Shannon 1948)

Diversity:

$H' = - \sum_{i = 1}^{S} p_i \ln p_i$

Evenness:

$E = \frac{H'}{H'_{max}} = \frac{H'}{\ln S} = - \sum_{i = 1}^{S} p_i \log_S p_i$

When $$p_i$$ is unknown in the population, an estimate is given by $$\hat{p}_i =\frac{n_i}{N}$$ (maximum likelihood estimator - MLE). As the use of $$\hat{p}_i$$ results in a biased estimate, Hutcheson (1970) and Bowman et al. (1971) suggest the use of:

$\hat{H}' = - \sum_{i = 1}^{S} \hat{p}_i \ln \hat{p}_i - \frac{S - 1}{N} + \frac{1 - \sum_{i = 1}^{S} \hat{p}_i^{-1}}{12N^2} + \frac{\sum_{i = 1}^{S} (\hat{p}_i^{-1} - \hat{p}_i^{-2})}{12N^3} + \cdots$

This error is rarely significant (Peet 1974), so the unbiaised form is not implemented here (for now).

Brillouin diversity index (Brillouin 1956)

Diversity:

$HB = \frac{\ln (N!) - \sum_{i = 1}^{S} \ln (n_i!)}{N}$

Evenness:

$E = \frac{HB}{HB_{max}}$

with:

$HB_{max} = \frac{1}{N} \ln \frac{N!}{\left( \lfloor \frac{N}{S} \rfloor! \right)^{S - r} \left[ \left( \lfloor \frac{N}{S} \rfloor + 1 \right)! \right]^{r}}$

where: $$r = N - S \lfloor \frac{N}{S} \rfloor$$.

Dominance index

The following methods return a dominance index, not the reciprocal or inverse form usually adopted, so that an increase in the value of the index accompanies a decrease in diversity.

Simpson index (Simpson 1949)

Dominance for an infinite sample:

$D = \sum_{i = 1}^{S} p_i^2$

Dominance for a finite sample:

$\lambda = \sum_{i = 1}^{S} \frac{n_i \left( n_i - 1 \right)}{N \left( N - 1 \right)}$

McIntosh index (McIntosh 1967)

Dominance:

$D = \frac{N - U}{N - \sqrt{N}}$

Evenness:

$E = \frac{N - U}{N - \frac{N}{\sqrt{S}}}$

where $$U$$ is the distance of the sample from the origin in an $$S$$ dimensional hypervolume:

$U = \sqrt{\sum_{i = 1}^{S} n_i^2}$

Berger-Parker index (Berger and Parker 1970)

Dominance:

$d = \frac{n_{max}}{N}$

$$\beta$$-diversity

Turnover

The following methods can be used to acertain the degree of turnover in taxa composition along a gradient on qualitative (presence/absence) data. This assumes that the order of the matrix rows (from 1 to $$m$$) follows the progression along the gradient/transect.

We denote the $$m \times p$$ incidence matrix by $$X = \left[ x_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$$ and the $$p \times p$$ corresponding co-occurrence matrix by $$Y = \left[ y_{ij} \right] ~\forall i,j \in \left[ 1,p \right]$$, with row and column sums:

\begin{align} x_{i \cdot} = \sum_{j = 1}^{p} x_{ij} && x_{\cdot j} = \sum_{i = 1}^{m} x_{ij} && x_{\cdot \cdot} = \sum_{j = 1}^{p} \sum_{i = 1}^{m} x_{ij} && \forall x_{ij} \in \lbrace 0,1 \rbrace \\ y_{i \cdot} = \sum_{j \geqslant i}^{p} y_{ij} && y_{\cdot j} = \sum_{i \leqslant j}^{p} y_{ij} && y_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} y_{ij} && \forall y_{ij} \in \lbrace 0,1 \rbrace \end{align}

Whittaker measure (Whittaker 1960)

$\beta_W = \frac{S}{\alpha} - 1$

where $$\alpha$$ is the mean sample diversity: $$\alpha = \frac{x_{\cdot \cdot}}{m}$$.

Cody measure (Cody 1975)

$\beta_C = \frac{g(H) + l(H)}{2} - 1$

where $$g(H)$$ is the number of taxa gained along the transect and $$l(H)$$ the number of taxa lost.

Routledge measures (Routledge 1977)

\begin{align} \beta_R &= \frac{S^2}{2 y_{\cdot \cdot} + S} - 1 \\ \beta_I &= \log x_{\cdot \cdot} - \frac{\sum_{j = 1}^{p} x_{\cdot j} \log x_{\cdot j}}{x_{\cdot \cdot}} - \frac{\sum_{i = 1}^{m} x_{i \cdot} \log x_{i \cdot}}{x_{\cdot \cdot}} \\ \beta_E &= \exp(\beta_I) - 1 \end{align}

Wilson and Shmida measure (Wilson and Shmida 1984)

$\beta_T = \frac{g(H) + l(H)}{2\alpha}$

where $$g(H)$$ is the number of taxa gained along the transect, $$l(H)$$ the number of taxa lost and $$\alpha$$ the mean sample diversity, $$\alpha = \frac{x_{\cdot \cdot}}{m}$$.

Similarity

Similarity between two samples $$a$$ and $$b$$ or between two types $$x$$ and $$y$$ can be measured as follow.

These indices provide a scale of similarity from $$0$$-$$1$$ where $$1$$ is perfect similarity and $$0$$ is no similarity, with the exception of the Brainerd-Robinson index which is scaled between $$0$$ and $$200$$.

$$a_j$$ and $$b_j$$ denote the number of individuals in the $$j$$-th type/taxon, $$j \in \left[ 1,n \right]$$. $$o_j$$ denotes the number of type/taxon common to both sample/case: $$o_j = \sum_{k = 1}^{n} a_k \cap b_k$$.

$$x_i$$ and $$y_i$$ denote the number of individuals in the $$i$$-th sample/case, $$i \in \left[ 1,m \right]$$. $$o_i$$ denotes the number of sample/case common to both type/taxon: $$o_i = \sum_{k = 1}^{m} x_k \cap y_k$$.

Qualitative indices

Jaccard index

$C_J = \frac{o_j}{S_a + S_b - o_j}$

Sorenson index

$C_S = \frac{2 \times o_j}{S_a + S_b}$

Quantitative index

Brainerd-Robinson index (Brainerd 1951, Robinson (1951))

$C_{BR} = 200 - \sum_{j = 1}^{S} \left| \frac{a_j \times 100}{\sum_{j = 1}^{S} a_j} - \frac{b_j \times 100}{\sum_{j = 1}^{S} b_j} \right|$

Sorenson index

Bray and Curtis (1957) modified version of Sorenson’s index:

$C_N = \frac{2 \sum_{j = 1}^{S} \min(a_j, b_j)}{N_a + N_b}$

Morisita-Horn index

$C_{MH} = \frac{2 \sum_{j = 1}^{S} a_j \times b_j}{(\frac{\sum_{j = 1}^{S} a_j^2}{N_a^2} + \frac{\sum_{j = 1}^{S} b_j^2}{N_b^2}) \times N_a \times N_b}$

Binomial co-occurrence (Kintigh 2006)

$C_{Bi} = \frac{o_i - N \times p}{\sqrt{N \times p \times (1 - p)}}$