---
title: "Data input"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data input}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, echo = FALSE, message = FALSE}
library(baytaAAR)
```

As input, `bay.ta()` expects a `matrix` of trait expressions. In its simplest form, this may contain only one column with a single trait, as in the following example from the Neolithic gallery grave at Sorsum, Germany ("auricular surface"):

```{r sorsum data, echo = TRUE}
data(sorsum_as, package = "baytaAAR")
head(sorsum_as)
```

In this case, the data are stored as a `data.frame`. If the input has not been converted to a `matrix` beforehand, `bay.ta()` converts it internally. This conversion is necessary even if, as here, there is only one trait.
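If you prefer to do the conversion yourself, the following sketch shows one way to do it in base R. The data frame `toy_df` and its values are made up for illustration, and the code does not claim to reproduce what `bay.ta()` does internally.

```{r matrix conversion sketch, echo = TRUE}
# Hypothetical single-trait data frame (made-up values, not package data)
toy_df <- data.frame(trait_a = c(1, 3, 2, NA, 4))

# as.matrix() keeps the single column as a 5 x 1 matrix
toy_mat <- as.matrix(toy_df)
is.matrix(toy_mat)
dim(toy_mat)

# Pitfall: extracting one column with `[` returns a plain vector
# unless drop = FALSE is given
is.matrix(toy_df[, 1])
is.matrix(as.matrix(toy_df[, 1, drop = FALSE]))
```

Passing `drop = FALSE` when subsetting is the usual way to keep the column structure when only one trait is selected from a larger table.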

Another example is provided by the data from Spitalfields, which, among other variables, contain five columns with traits.

```{r spitalfields matrix, echo = TRUE}
data(spitalfields, package = "baytaAAR")
head(spitalfields[, 2:6])
```

`NA` values are allowed, but neither rows nor columns may consist entirely of `NA` values. `bay.ta()` will refuse to run in such cases, and the offending rows or columns need to be removed before analysis. Please see the `vignette("articles/worked_example")` on Chelsea 'Old church' for an example of how this can be done.
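Independently of that worked example, a generic base R sketch of the clean-up step could look like the following. The matrix `toy` and its trait names are hypothetical, and the approach shown here is just one of several possible ones.

```{r drop all NA sketch, echo = TRUE}
# Hypothetical trait matrix: row 3 and column "t3" are entirely NA
toy <- matrix(
  c( 1,  2, NA,
     2, NA, NA,
    NA, NA, NA,
     3,  1, NA),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("t1", "t2", "t3"))
)

# TRUE where every entry of the row / column is NA
all_na_rows <- rowSums(!is.na(toy)) == 0
all_na_cols <- colSums(!is.na(toy)) == 0

# drop = FALSE keeps the result a matrix even if only one column remains
toy_clean <- toy[!all_na_rows, !all_na_cols, drop = FALSE]
dim(toy_clean)
```

Rows and columns that are only partially missing are kept, so `toy_clean` may still contain individual `NA` values.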

The levels of all traits must start at `1`. Binary traits are possible. Mixed levels like `1.5` as a shortcut for a trait expression between `1` and `2`, however, should not be used, as this would violate basic principles of ordinal scaling. In such cases, either one of the neighboring levels must be chosen or the value must be set to `NA`.
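A quick pre-check along these lines can be sketched in base R. The matrix `scores` is invented for illustration, and setting intermediate scores to `NA` is only one of the two options named above; choosing a neighboring level instead is a judgment that the code cannot make for you.

```{r level check sketch, echo = TRUE}
# Hypothetical scores for two traits, including an intermediate score of 1.5
scores <- cbind(t1 = c(1, 1.5, 3, 2), t2 = c(1, 2, NA, 2))

# Flag non-integer scores and set them to NA
is_mixed <- !is.na(scores) & scores != round(scores)
scores[is_mixed] <- NA

# Lowest observed level per trait; a value below 1 (for example 0)
# would point to zero-based coding that needs to be shifted
apply(scores, 2, min, na.rm = TRUE)
```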

In contrast to what has been published before [@ref_299013], the nodes (i.e., rows of the matrix) do not need to be fully observed for the multinormal model to run. This is partly because, with [NIMBLE version 1.4.1](https://r-nimble.org/release-notes.html#february-14-2026-weve-released-version-1.4.1), the NIMBLE team introduced a sampler for partially observed multivariate normal variables.
