---
title: "Spitalfields: Comparison with known age-at-death"
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Spitalfields: Comparison with known age-at-death}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message = F}
library(baytaAAR)
```

To show how the age estimations of `baytaAAR` can be compared with known age-at-death, we use the data set of Spitalfields [@ref_263525]. The individuals come from a crypt at Christ Church in Spitalfields, London and were identified in terms of age and sex by plates on their coffins. The following table gives an impression of the data:

```{r spitalfields data, echo = T}
data(spitalfields, package = "baytaAAR")
head(spitalfields)
```

Buckberry and Chamberlain introduced a new scoring system of the *pubic symphysis* and thereby replaced a single score with multiple levels to several traits with fewer levels each.

For running the model, we use NIMBLE and the multivariate normal likelihood (`mnorm`). In a real-life setting, `multicore` should be set to `TRUE` to reduce computation time. Still, even on modern hardware, the model will run for up to several weeks until the usual quality criteria are met. However, already a shorter run demonstrates the principle [for the full run see the results in @ref_299013].

```{r spitalfields bayta, echo = TRUE, eval=FALSE }
spitalfields_res <- bay.ta(
  framework = "NIMBLE",
  algorithm = "mnorm",
  multicore = F,
  method = spitalfields[,c(2:6)],
  minimum_age = 16,
  parameters = c("b", "a", "beta0", "beta", "thresh", "age.s", "Ustar"),
  thinSteps = 200,
  numSavedSteps = 500,
  seed = 331
)
```

For the sake of illustration, we skip the diagnostic checks here and move directly to the comparison with known ages-at-death. For this, `baytaAAR` provides several custom functions which essentially work as wrappers around other R functions for convenience.

As the name suggests, the function `age.comp.summary()` summarizes standard quality measures of the comparison of known and estimated ages-at-death. These fall into two groups:

1. Frequentist point estimates
- `Bias` - Arithmetic mean of the difference between known and estimated age of all individuals in the sample.
- `corrPearson` -- Correlation between known and estimated age with
- `p_value` -- p-value of that correlation and
- `Residual_slope` -- Slope of the regression of the difference between known and estimated age.
- `Inaccuracy` -- Arithmetic mean of the absolute difference between known and estimated age.
- `RMSE` -- *Root mean square error* of known and estimated age.

2. Bayesian density estimates
- `TMNLP` -- *Test mean log posterior*, a local evaluation of the probability density at the point of known age [@stull_et_al_2022].
- `CRPS` -- *Continuous ranked probability score*, a global evaluation of the probability density at the point of known age [@gneiting_et_al_2007].

For nearly all measures, lower is better. The only exception is `corrPearson` where larger is better. Because the differences are squared with the `RMSE`, outliers influence this measure more strongly than this is the case for `Inaccuracy`, and therefore, the former tends to be higher than the latter. A similar relationship exists between `CRPS` and `TMNLP`. Overall, Müller-Scheeßel et al. found that the `CRPS` seems to be more informative than the `TMNLP` and, therefore, the `CRPS` is probably the most informative and single best measure to compare the results between different age estimation methods in terms of their quality to 'predict' age-at-death.

`age.comp.summary()` assumes that you choose one of the point estimates `Mode`, `Median`, `Mean` but in the code snippet below we demonstrate how all three measures can be computed and tabulated in one go. `age.comp.summary()` expects the raw output from `bay.ta()` as well as a vector of known age-at-death. The latter may contain `NAs`, these are subsequently simply ignored.


```{r load spitalfields_res, echo = FALSE}
spitalfields_res <- baytaAAR:::spitalfields_res
```

```{r spitalfields age.estimate.summary, echo = TRUE}
summary_list <- lapply(c("Mode", "Median", "Mean"), function(choice) {
  age.comp.summary(mcmc_list = spitalfields_res, 
                   known_age = spitalfields$Age,
                   mean_choice = choice)})
summary_mat <- do.call(rbind, summary_list)
rownames(summary_mat) <- c("Mode", "Median", "Mean")
summary_mat |> t() |> knitr::kable(digits = 2)
```

As the above table shows, it can make quite a difference which mean measures `Mean` (= arithmetic mean), `Median` or `Mode` is chosen for the frequentist point estimate. In this case, the differences are even exaggerated because there are not enough iterations for stable estimates. On the other hand, the Bayesian estimates are not influenced by the choice of the mean measure because they are based on the probability density of the posterior.

Age estimation methods usually also supply minimum and maximum values for their estimates, so a range within which the true age-at-death should fall. Ideally, this range should adapt to the credibility/confidence level chosen, and the percentage of the true cases, the so-called *coverage* should be close to the chosen level. To test *coverage* formally, in the literature the *cumulative binomial test* has been proposed [@konigsberg_et_al_2008]. The wrapper `sequential.binom.test()` calculates this test for several given credibility levels:

```{r spitalfields binom test}
sequential.binom.test(spitalfields_res,
                      HDImass = c(seq(0.5, 0.9, 0.1), 0.95),
                      known_age = spitalfields$Age) |>
  knitr::kable(digits = 3)
```

In the table above, for all levels the confidence intervals of the observed *coverage* include the expected *coverage*. Thus, this result suggests that the model is able to 'predict' age-at-death with the correct level of confidence.

Finally, `age.comp.plot()` shows the relationship between estimated and known ages-at-death graphically. Its four plots convey the following information:

- **Top left**: Comparison of estimated *highest density intervals* with known ages (green = age within HDI, red = age outside HDI, individuals ordered according to known age-at-death).

- **Top right**: Comparison of the density of known ages with a Gompertz function derived from the arithmetic mean of the estimated population parameters $\alpha$ and $\beta$.

- **Bottom left**: Scatter plot of known and estimated ages with regression line in blue. The dotted line marks perfect equivalence.

- **Bottom right**: Slope of the regression line from the left bottom image (cf. goodness-of-fit measure `Residual_slope` from the function `age.comp.summary()`).

Again, it can make quite a difference which mean measure is chosen for the point estimate for the two lower plots. The left top plot is not affected because the *highest density intervals* derive from the density distribution of the age estimates. For the right top plot, the Gompertz parameters are always taken from the arithmetic mean because only then the deterministic relationship between $\alpha$ and $\beta$ holds. As the code below shows, `age.comp.plot()` expects the output from `diagnostic.summary()` so the credibility level chosen (`HDImass`) there also determines what is shown in the top left plot. `age.comp.plot()` presupposes, of course, that there is a vector with known ages.

```{r spitalfields plot, fig.width=8, fig.height=6, fig.align = 'center', warning=FALSE}
diagnostic.summary(spitalfields_res, HDImass = 0.95) |>
  age.comp.plot(known_age = spitalfields$Age)
```

The top left plot visualizes the result of the bottom line of the last table: Only eleven estimates, marked in red, do not contain the true age-at-death at the default credibility level 0.95. The top right plot illustrates the good agreement of the estimated Gompertz parameters fed into a Gompertz function with the density of the true ages-at-death. The bottom left plot is related to the correlation expressed in `corrPearson`, of course. The bottom-right plot shows the same relationship, relative to the dotted line in the bottom-left plot which is equivalent to the zero-line on the bottom right. The regression line in blue on the bottom right is nothing else than the quality measure `Residual_slope`. This plot also visualizes the spread of the difference between true and estimated age-at-death.

---

## References
