Strauss estimates prey preferences using his statistic \(L_{st}\) (Strauss 1979). \(L_{st}\) is the difference between \(r_{st}\) the proportion of prey species \(s\) found in the gut of a predator during occurrence \(t\) and \(p_{st}\) the proportion of prey species \(s\) found in the habitat of said predator during occurrence \(t\). When the statistic \(L_{st} := r_{st} - p_{st}\) is equal to zero we say the predator ate prey species \(s\) randomly during time \(t\). A positive difference, \(L_{st} > 0\), indicates preferential eating of prey species \(s\), and negative values indicate aversion to prey species \(s\) at time \(t\). We build upon Strauss’s \(L\) with the following framework.

Let \(X_{jst} \sim \mathcal{P}(\lambda_{st})\) denote the number of prey species \(s\) found in the gut of predator \(j\) during occurrence \(t\); \(j \in \{1, \ldots, J_t\}, s \in \{1, \ldots, S\}, t \in \{1, \ldots, T\}\). Let \(Y_{ist} \sim \mathcal{P}(\gamma_{st})\) will denote the number of prey species \(s\) found in trap \(i\), \(i \in \{1, \ldots, I_t \}\), hypothesized to represent the habitat of predator \(j\), during time occurrence \(t\). From the observations \(x_{jst}\) and \(y_{ist}\) we are able to derive similar conclusions to that of Strauss’s \(L\), but now more efficiently taking into account multiple prey species across an array of time points.

The relative rate with which a predator ate species \(s\) during time \(t\) to the rate at which prey species \(s\) was available in the habitat during time \(t\) is represented as the fraction of the respecitve parameters, \(c_{st} = \lambda_{st} / \gamma_{st}\). The quantity \(c_{st}\) is then our analogue to Strauss’s statistic \(L\).

We allow \(c_{st}\) to naturally vary by prey species, time, neither, or both. We depict these four hypotheses in the following list. Please see our paper (Roualdes et al. 2016) for further details about the interpretations of these hypotheses.

- \(c_{st} = c\)
- \(c_{st} = c_s\)
- \(c_{st} = c_t\)
- \(c_{st} = c_{st}\)

These four cases are built into the null hypothesis \(H_0: \boldsymbol{\lambda} = \boldsymbol{c} \boldsymbol{\gamma}\), where \(\boldsymbol{c}\) takes on any of the \(1. - 3.\) values of \(c_{st}\). Since the \(4\) potential hypotheses are nested, \(H_0\) can be contrasted against an alternative that contains the null as a special case. For instance, one could test \[H_0: \lambda_t = c_t \gamma_t, \forall t \quad \text{ verse } \quad H_1: \lambda_{st} = c_{st} \gamma_{st}.\]

A likelihood ratio test statistic (Wilks 1938) evaluates a given null and alternative hypothesis \[\Lambda := -2 \log{ \frac{ \sup L(\theta_0|X,Y)}{ \sup L(\theta_1|X,Y)}},\] where \(\theta_0, \theta_1\) represent the parameters under the null and alternative hypotheses, respectively. The test statistic \(\Lambda \dot{\sim} \chi^2_{\rho}\), where \(\rho\) represent the number of free parameters. The null hypothesis is rejected when \(P(\chi^2_{\rho} > \Lambda) < \alpha\).

With smaller animals it is not always possible to observe a count of prey species eaten by a predator species. Instead, a binary response of whether or not DNA of said prey species exists in the gut of the predator is observed, via for instance a molecular gut content analysis. In this case, we can treat the count of eaten prey species as missing and maximize the likelihood using an expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977).

We model what information we do observe. Denote this binary response by \(Z_{jst} = 1(X_{jst} > 0)\), which takes on the value \(1\) if the \(j^{th}\) predator ate prey species \(s\) in time \(t\) and \(0\) otherwise. Under this new set up \(\Lambda\) is calculated with the observed data likelihood, \(L(\theta|Z,Y)\).

Our package assumes two data sets, one for the predators (and what they ate) and one for the caught prey species. Each data set will include a new observation per row; recall that the units of observation for each dataset are different. The column names of both data sets should match. One column, named `time`

, should include the time/date for which each observation was recorded. The other columns should be named for the prey species of interest, e.g.

`head(Traps)`

```
## time preySpecies1 preySpecies2 preySpecies3 adj
## 1 time1 1 3 3 1
## 2 time1 1 0 1 1
## 3 time1 1 0 1 1
## 4 time1 1 2 1 1
## 5 time1 1 2 0 1
## 6 time1 0 0 2 1
```

The column `adj`

accounts for unequal trap schedules. Consider the situtation where the traps (catching prey species) were left out for differing lengths of time within occurrence \(t\), e.g. trap \(i\) in month \(t\) was left out for \(6\) days, but trap \(i'\) in month \(t\) was left out for \(3\) days. We expect to catch an unequal number of prey in each trap simply because one trap was left out for a longer time within month \(t\). The user can specify this by creating a column, which has to be named `adj`

, that contains the number of units of time for which each trap was left out.

`tail(Traps)`

```
## time preySpecies1 preySpecies2 preySpecies3 adj
## 930 time5 3 0 0 1
## 931 time5 2 0 1 1
## 932 time5 2 2 0 1
## 933 time5 0 2 2 1
## 934 time5 1 2 3 1
## 935 time5 4 3 0 1
```

The variable `time`

must be sorted so that summation over occurrences \(t\) for both the predators data set and the caught prey species data set will contain an equal number of time periods \(T\), and the `time`

s of such a summed dataframe will match. That is, at least, you should get

`all.equal(unique(Traps[,'time']), unique(Spiders[,'time']))`

`## [1] TRUE`

Our package does not check for matching units of time, but it does check for an equal number of units of time across data sets.

Given two dataframes formatted as described above, one representing the predators’ eatings `Spiders`

and one representing the availability of the prey species of interest in the habitat `Traps`

, fit our method by calling

`prefs <- predPref(Spiders, Traps, hypotheses=c(null="C", alt="Cst"))`

An `S3`

summary method is available to summarize, with a user defined significance level, the results of the previous call.

`summary(prefs)`

```
##
## Predator Preferences Model:
##
## Hypotheses:
## H0 = c
## H1 = Cst
## Likelihood Ratio Test:
## -2*(llH0-llH1) = 12.04303 on 14 degrees of freedom
## p-value = 0.6028473
## Parameter Estimates:
## estimate Std. Err.
## c_1 0.47554 0.0131
## gamma_1_1 1.13364 0.1186
## gamma_2_1 1.56492 0.0992
## gamma_3_1 1.60188 0.0824
## gamma_4_1 1.47250 0.0686
## gamma_5_1 1.38712 0.0509
## gamma_1_2 1.29383 0.1268
## gamma_2_2 1.36776 0.0926
## gamma_3_2 1.48688 0.0793
## gamma_4_2 1.58956 0.0714
## gamma_5_2 1.50859 0.0533
## gamma_1_3 1.30615 0.1274
## gamma_2_3 1.41705 0.0943
## gamma_3_3 1.42116 0.0774
## gamma_4_3 1.35236 0.0657
## gamma_5_3 1.43641 0.0519
```

Often with real world data, the alternative hypothesis “Cst” is found to be more likely. In this case, more detail about the predator’s prey preferences is elucidated by calculating linear contrasts of the estimated values \(c_{st}\). Let’s first remind ourselves the dimensions of the estimated vector \(\mathbf{c}\) by extracting the elements of this vector \(c_{st}\) from the fitted object `prefs`

`prefs$alt$c`

```
## preySpecies11 preySpecies12 preySpecies13 preySpecies14 preySpecies15
## 0.4153849 0.4853809 0.4181821 0.5078875 0.4021354
## preySpecies21 preySpecies22 preySpecies23 preySpecies24 preySpecies25
## 0.5909119 0.5416683 0.5338998 0.4659097 0.5168153
## preySpecies31 preySpecies32 preySpecies33 preySpecies34 preySpecies35
## 0.4722228 0.4838717 0.4537820 0.4488453 0.4945064
```

`length(prefs$alt$c)`

`## [1] 15`

To calculate a linear contrast we need create a new vector of length equal to the vector `prefs$alt$c`

that will pick out the elements of \(\mathbf{c}\) of interest, average, and then contrast the elements against each other – more detail about the intrepretation of linear contrasts in this context is found in (Roualdes et al. 2016). For example, if we wanted to compare prey species \(1\) and \(2\) across the first two time periods, we would set up the contrast

`b <- c(1/2, 1/2, 0, 0, 0, -1/2, -1/2, rep(0, 8))`

Then use the funciton `testC`

to answer the question, is prey species \(1\) within the first two time periods equally preferred to prey species \(2\) in those same time periods?

`testC(prefs, b, sig.level=0.8)`

```
##
## Linear Contrast: 0.5 0.5 0 0 0 -0.5 -0.5 0 0 0 0 0 0 0 0
##
## data: Spiders and Traps
## b^t * C = -0.11591, p-value = 0.2654
## alternative hypothesis: true linear contrast is not equal to 0
## 95 percent confidence interval:
## -0.31987621 0.08806175
## sample estimates:
## mean of c_1 mean of c_2 mean of c_3 mean of c_4 mean of c_5
## 0.4153849 0.4853809 0.4181821 0.5078875 0.4021354
## mean of c_6 mean of c_7 mean of c_8 mean of c_9 mean of c_10
## 0.5909119 0.5416683 0.5338998 0.4659097 0.5168153
## mean of c_11 mean of c_12 mean of c_13 mean of c_14 mean of c_15
## 0.4722228 0.4838717 0.4537820 0.4488453 0.4945064
```

Since the \(95\)% confidence interval contains \(0\), we conclude that prey species \(1\) and \(2\) are equally preferred within the first two time periods. N.B. We specified the significance level with `sig.level=0.8`

to force `testC`

to use the model specified in the alternative hypothesis; this was purely for demonstration of linear contrasts.

Dempster, Arthur P, Nan M Laird, and Donald B Rubin. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” *Journal of the Royal Statistical Society. Series B (Methodological)* 39 (1). JSTOR: 1–38.

Roualdes, Edward A, Simon J Bonner, Thomas D Whitney, and James D Harwood. 2016. “Formal Modelling of Predator Preferences Using Molecular Gut-Content Analysis.” *Environmental and Ecological Statistics*.

Strauss, Richard E. 1979. “Reliability Estimates for Ivlev’s Electivity Index, the Forage Ratio, and a Proposed Linear Index of Food Selection.” *Transactions of the American Fisheries Society* 108 (4). Taylor & Francis: 344–52.

Wilks, Samuel S. 1938. “The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses.” *The Annals of Mathematical Statistics* 9 (1). JSTOR: 60–62.