Aligning Relative Sequences

library(eratosthenes)

Introduction

This vignette outlines functions in eratosthenes that check and align multiple relative sequences of events. Not all sequences, however, carry the same informational value. Some may be based on ideal or optimal theoretical assumptions, as with frequency or contextual seriation, while others may be based on physical relationships, as with soil stratigraphy. Theoretical sequences should clearly yield to those based on physical relationships, but this raises the problem of how to coerce one or more theoretical sequences that contain events which also have physical relationships. For example, one can produce an ideal seriation of contexts that includes discrete, single deposits on the one hand (say, a number of separate pits with no overlap) and stratified deposits on the other. In such an optimal seriation, it is possible for the stratified deposits to be seriated “out of order” with respect to their stratigraphic sequence (i.e., the stratigraphy may have been perturbed at one or more moments, so that the finds assemblages are not well stratified). Accordingly, it is desirable to take the optimal order achieved via seriation and constrain it to agree with the soil stratigraphy.

Checking Sequence Agreement

To check whether two or more sequences of events agree with one another in their order, use the seq_check() function. Sequences should run in the same direction from left to right (i.e., the earliest element of every sequence should be its first element) and be contained in a list object. The seq_check() function returns a logical value: TRUE if the sequences order their events consistently, and FALSE otherwise.

Sequences x and y do not contain exactly the same events, but the events they share ("B", "D" and "E") occur in the same order:

x <- c("A", "B", "C", "D", "E")
y <- c("B", "D", "F", "E")
a <- list(x, y)
seq_check(a)
#> [1] TRUE

But sequence z places "F" before "C", whereas x and y together imply the reverse (x has "C" before "D", and y has "D" before "F"):

z <- c("B", "F", "C")
b <- list(x, y, z)
seq_check(b)
#> [1] FALSE
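
The logic of such a check can be illustrated in base R. The following is only a minimal sketch under stated assumptions, not the implementation used by eratosthenes: the helper name agree_sketch() is hypothetical, and the approach (recording every stated "earlier than" pair, closing the relation transitively, and looking for a pair ordered both ways) is one plausible way to detect disagreement.

```r
# Hypothetical helper, NOT part of eratosthenes: TRUE if no two events
# are ordered both ways once all sequences are combined.
agree_sketch <- function(seqs) {
  events <- unique(unlist(seqs))
  n <- length(events)
  before <- matrix(FALSE, n, n, dimnames = list(events, events))
  # Record every "earlier -> later" pair stated within a single sequence
  for (s in seqs) {
    if (length(s) < 2) next
    for (i in seq_len(length(s) - 1)) {
      before[s[i], s[(i + 1):length(s)]] <- TRUE
    }
  }
  # Transitive closure: if a precedes k and k precedes b, then a precedes b
  for (k in events) {
    before <- before | outer(before[, k], before[k, ], "&")
  }
  # A contradiction exists if some pair is ordered in both directions
  !any(before & t(before))
}

x <- c("A", "B", "C", "D", "E")
y <- c("B", "D", "F", "E")
z <- c("B", "F", "C")
agree_sketch(list(x, y))
#> [1] TRUE
agree_sketch(list(x, y, z))
#> [1] FALSE
```

The transitive step matters here: no single pair of sequences states both "C" before "F" and "F" before "C" directly, so the conflict only appears once the orderings in x and y are chained together.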

Merging Sequences

The synth_rank() function uses recursion to produce a single, “merged” or “synthesized” sequence from two or more sequences. This is accomplished by running a recursive trace through all partial sequences (via the quae_postea() function, on which see below) and counting the total number of elements that come after each event.

x <- c("A", "B", "C", "D", "H", "E")
y <- c("B", "D", "F", "G", "E")
a <- list(x, y)
synth_rank(a)
#> [1] "A" "B" "C" "D" "F" "H" "G" "E"

Producing a single merged or synthesized sequence is a matter of procedural convenience for the gibbs_ad() function (see the vignette on Gibbs Sampling for Archaeological Dates). The merged sequence is not the only possible one, since events missing from one sequence or another could occur at different points in it. In the example above, "H" could occur at any point after "D" and before "E", but the synth_rank() function has situated it between "F" and "G".

If the sequences disagree in their rankings, synth_rank() returns NULL.
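
One way to read the merging procedure is to trace, for each event, everything that follows it in any sequence, and then to rank events by how many followers they have. The base R sketch below follows that reading. It is an illustration only and not the code of synth_rank(): the helper names later_events() and merge_sketch() are hypothetical, and leaving tied events in order of first appearance is an assumption.

```r
# Hypothetical helpers, NOT part of eratosthenes.

# Recursively collect every event that follows `event` in any sequence.
later_events <- function(event, seqs, seen = character(0)) {
  for (s in seqs) {
    pos <- match(event, s)
    if (!is.na(pos) && pos < length(s)) {
      for (nxt in s[(pos + 1):length(s)]) {
        if (!(nxt %in% seen)) {
          seen <- c(seen, nxt)
          seen <- later_events(nxt, seqs, seen)  # follow the trace onward
        }
      }
    }
  }
  seen
}

# Rank events by the number of events that come after them.
merge_sketch <- function(seqs) {
  events <- unique(unlist(seqs))
  later <- lapply(events, later_events, seqs = seqs)
  # An event that ends up "after itself" signals contradictory sequences
  if (any(mapply(`%in%`, events, later))) return(NULL)
  # More followers means earlier; ties keep order of first appearance (assumed)
  events[order(-lengths(later))]
}

x <- c("A", "B", "C", "D", "H", "E")
y <- c("B", "D", "F", "G", "E")
merge_sketch(list(x, y))
#> [1] "A" "B" "C" "D" "F" "H" "G" "E"
merge_sketch(list(x, y, c("B", "F", "C")))
#> NULL
```

In this example "H" and "G" each have exactly one follower ("E"), which is why their relative placement is decided by the tie-breaking rule rather than by the data.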

Adjusting Sequences

As mentioned above, one may have some sequences which are derived via theoretical considerations (e.g., frequency or contextual seriation) and others which are known (e.g., from soil stratigraphy or historical documentation). The seq_adj() function takes an “input” sequence and adjusts its ordering to fit another “target” sequence of smaller size.

For example, the input sequence might be an ordering obtained from a contextual seriation based on artifact types found in both tomb assemblages and stratified deposits, from one or more sites, while the target sequence might be a known stratigraphic sequence. One wants to maintain the ordering of the target sequence, adjusting the input into agreement:

# input
seriated <- c("S1", "T1", "T2", "S3", "S4", "T4", "T5", "T6", "S5", "T7", "S2")
# target
stratigraphic <- c("S1", "S2", "S3", "S4", "S5")
# input adjusted to agree with the target
seq_adj(seriated, stratigraphic)
#>  [1] "S1" "T1" "S2" "T2" "S3" "T7" "S4" "T4" "T5" "T6" "S5"

To achieve this, the seq_adj() function performs a linear interpolation between jointly attested events, placing the input sequence along the \(x\) axis and the target sequence along the \(y\) axis, and coercing the order of all elements to the \(y\) axis.
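
The interpolation idea can be sketched with approx() from base R. This is a hedged illustration rather than the internals of seq_adj(): the helper name adjust_sketch() is hypothetical, and the use of rule = 2 (holding the nearest value for events that fall before the first or after the last jointly attested event) is an assumption.

```r
# Hypothetical helper, NOT part of eratosthenes: reorder `input` so that
# events shared with `target` follow the target's order.
adjust_sketch <- function(input, target) {
  shared <- intersect(input, target)
  x <- match(shared, input)   # positions of shared events on the input (x) axis
  y <- match(shared, target)  # positions of the same events on the target (y) axis
  # Interpolate a y-axis position for every position in the input
  y_hat <- approx(x = x, y = y, xout = seq_along(input), rule = 2)$y
  # Sort all events by their (interpolated) y-axis position
  input[order(y_hat)]
}

seriated <- c("S1", "T1", "T2", "S3", "S4", "T4", "T5", "T6", "S5", "T7", "S2")
stratigraphic <- c("S1", "S2", "S3", "S4", "S5")
adjust_sketch(seriated, stratigraphic)
#>  [1] "S1" "T1" "S2" "T2" "S3" "T7" "S4" "T4" "T5" "T6" "S5"
```

Here "T7" sits between "S5" (target position 5) and "S2" (target position 2) in the input, so it is interpolated to 3.5 and lands between "S3" and "S4". The sketch reproduces the output shown above for this example, but it should not be assumed to match seq_adj() in every case.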

For multiple stratigraphic sequences, the seq_adj() function can be re-run, taking a new target sequence and using the previous result as the new input sequence. If both input and target sequences agree, the input sequence will be returned.

All Earlier Events, All Later Events

A core need of the gibbs_ad() function is to determine, for each event, all events which come earlier and all events which come later than that event. The functions quae_postea() and quae_antea() meet this need for later events and earlier events respectively. Hence, there is no need to settle on one single sequence or ordering as an input. Instead, the input is a list object which contains multiple (partial or incomplete) sequences. The output is a list indexed by each element, containing the vector of contexts which precede or follow that element.

For quae_antea(), a dummy element of "alpha" is included in all vectors, and for quae_postea(), a dummy element of "omega" is included. The elements "alpha" and "omega" are necessary, as they constitute the fixed lower and upper limits within which estimates are made in gibbs_ad().

x <- c("A", "B", "C", "D", "H", "E")
y <- c("B", "D", "F", "G", "E")
a <- list(x, y)
quae_postea(a)
#> $A
#> [1] "omega" "B"     "C"     "D"     "H"     "E"     "F"     "G"    
#> 
#> $B
#> [1] "omega" "C"     "D"     "H"     "E"     "F"     "G"    
#> 
#> $C
#> [1] "omega" "D"     "H"     "E"     "F"     "G"    
#> 
#> $D
#> [1] "omega" "H"     "E"     "F"     "G"    
#> 
#> $H
#> [1] "omega" "E"    
#> 
#> $E
#> [1] "omega"
#> 
#> $F
#> [1] "omega" "G"     "E"    
#> 
#> $G
#> [1] "omega" "E"
quae_antea(a)
#> $A
#> [1] "alpha"
#> 
#> $B
#> [1] "alpha" "A"    
#> 
#> $C
#> [1] "alpha" "A"     "B"    
#> 
#> $D
#> [1] "alpha" "A"     "B"     "C"    
#> 
#> $H
#> [1] "alpha" "A"     "B"     "C"     "D"    
#> 
#> $E
#> [1] "alpha" "A"     "B"     "C"     "D"     "H"     "F"     "G"    
#> 
#> $F
#> [1] "alpha" "B"     "D"     "A"     "C"    
#> 
#> $G
#> [1] "alpha" "B"     "D"     "F"     "A"     "C"

The function seq_check(), which checks to see if sequences contain any discrepant orderings, relies on quae_postea().