---
title: "2. A complete analysis case: collaborative-regulation sequences"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{2. A complete analysis case: collaborative-regulation sequences}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>",
                      fig.width = 7, fig.height = 4.5)
library(transitiontrees)
set.seed(1)
```

This vignette runs one dataset all the way through and **reads the numbers
at each step** -- not just *what* to call but *what the output means* and
*what not to over-read*. The data are the bundled `group_regulation_long`
event log: students' collaborative regulation-of-learning actions
(`plan`, `monitor`, `consensus`, `discuss`, ...), one row per action, with
a `High` / `Low` achievement label per student.

The arc that emerges, stated up front so the sections connect: regulation
talk has **short memory** -- the immediately preceding action carries most
of the predictive signal -- a handful of two-action routines reproducibly
add to it, and **high and low achievers regulate differently**, which the
permutation test confirms.

## 1. The data

```{r data}
data(group_regulation_long)
nrow(group_regulation_long)
head(group_regulation_long)
sort(table(group_regulation_long$Action), decreasing = TRUE)
```

The nine actions are very unevenly used: `consensus` and `plan` dominate,
`adapt` and `synthesis` are rare. That imbalance is the most important fact
about the corpus and it echoes through every result -- a model that just
guesses `consensus` will look deceptively good, so the interesting question
is never "what is the modal next action" but "*which histories overturn
that default*".
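
To make that baseline concrete, here is a minimal sketch on a made-up action vector. The counts are illustrative and are not taken from `group_regulation_long`; the same few lines would apply to the real `Action` column.

```{r sketch-baseline}
# Hypothetical toy counts, chosen only to mimic an imbalanced alphabet.
toy_actions <- rep(c("consensus", "plan", "discuss", "adapt"),
                   times = c(50, 30, 15, 5))
freq <- table(toy_actions)
modal_action <- names(freq)[which.max(freq)]   # the most frequent action
baseline_accuracy <- max(freq) / sum(freq)     # share of rows the modal guess gets right
modal_action
baseline_accuracy
```

Any history-aware model has to be judged against this majority-class figure, not against zero.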

## 2. Fit

`context_tree()` reads the long log directly: name the unit (`actor`), the
clock (`time`), and the state (`action`), and the function reshapes the log
into one sequence per session and fits the tree. Sessions are split where the
time gap is large.

```{r fit}
tree <- context_tree(group_regulation_long,
                     actor = "Actor", time = "Time", action = "Action",
                     max_depth = 3L, min_count = 10L)
tree
```

The banner reports the depth, the node count, the alphabet, and the
sequence/observation totals. The root line is the **null model**: the next
action given *no* history. Every deeper context has to beat that to earn its
place.

## 3. Inspect

```{r inspect}
summary(tree)
model_fit(tree)
```

Perplexity is the readable scalar: the effective number of equally likely
next actions. The uniform baseline is `r length(tree$alphabet)` (nine
actions, no knowledge); the fitted tree's
`r round(model_fit(tree)$perplexity, 2)` says recent history collapses nine
possibilities to about
`r round(model_fit(tree)$perplexity, 1)`. Real structure -- but this is
*in-sample* and the tree is over-grown, so read it as an optimistic bound.
Sections 6 and 7 give the honest figure.
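
The "effective number of actions" reading follows from the standard definition of perplexity as two raised to the entropy in bits. The sketch below applies that definition to single made-up distributions; `perplexity_of()` is a hypothetical helper, not a package function, and the package may compute its figure differently (for instance, averaged over observed positions).

```{r sketch-perplexity}
# Hypothetical helper: perplexity of one probability vector, 2^(entropy in bits).
perplexity_of <- function(p) {
  p <- p[p > 0]                 # treat 0 * log(0) as 0
  2^(-sum(p * log2(p)))
}
uniform9 <- rep(1 / 9, 9)                      # nine equally likely actions
peaked9  <- c(0.60, 0.20, rep(0.20 / 7, 7))    # made-up peaked distribution, sums to 1
perplexity_of(uniform9)   # the no-knowledge baseline
perplexity_of(peaked9)    # fewer "effective" actions
```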

## 4. The pathway tables

Three named verbs return the same canonical columns, each with a different
useful sort order.

```{r common}
common_pathways(tree, top = 8)      # the highways
```

```{r divergent}
divergent_pathways(tree, top = 8)   # where adding history changes the prediction most
```

```{r sharp}
sharp_pathways(tree, top = 8)       # the most peaked next-action predictions
```

Read the divergent table in two layers. The very top rows can have large
`divergence` on a small `count` -- a short history seen just over the
`min_count` floor that happened to resolve one way. Those are small-sample
mirages; the bootstrap in section 8 exists to disarm them. The rows that
*also* carry a large `count` are the well-supported redirections worth
quoting.

The sharp table teaches the same caution from the probability side: a
`next_probability` near 1 on a low `count` is a near-empty cell after
smoothing, not a law of behaviour. Sharpness **with** support is a rule;
sharpness without it is noise.
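
One way to see why support matters is to compare exact binomial intervals for two made-up rows that both look sharp. This is only a sketch with base R's `binom.test()`: the counts are invented, and it ignores the smoothing the package applies.

```{r sketch-support}
# Made-up rows: both look "sharp", but only one has support.
low_support  <- binom.test(x = 11,  n = 11)    # 11 of 11 continuations agree
high_support <- binom.test(x = 480, n = 500)   # 480 of 500 continuations agree
low_support$conf.int    # wide: the data barely constrain the probability
high_support$conf.int   # narrow: a well-supported regularity
```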

## 5. Per-context diagnostics

`tree_dependence()` gives the information-theoretic decomposition that the KL
pruning rule thresholds: for each context, how many **bits** of next-action
uncertainty the extra history removes (`entropy_drop`), and whether it flips
the modal prediction.

```{r dependence}
tree_dependence(tree, sort_by = "entropy_drop", top = 8)
```

A large `entropy_drop` with `changes_prediction = TRUE` is the most valuable
kind of context: it both sharpens *and* redirects. Watch for **negative**
`entropy_drop` -- the longer history left the next action *more* uncertain
than its parent; that is the textbook signature of a context pruning should
remove.
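
The two quantities can be sketched by hand for a single parent/child pair. The distributions below are made up and `entropy_bits()` is a hypothetical helper; the package's exact computation (for example, how it handles smoothing) may differ.

```{r sketch-entropy-drop}
# Hypothetical helper: Shannon entropy in bits.
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}
# Made-up next-action distributions over three actions.
parent <- c(consensus = 0.5, plan = 0.3, discuss = 0.2)   # shorter history
child  <- c(consensus = 0.1, plan = 0.8, discuss = 0.1)   # one action longer
drop_bits <- entropy_bits(parent) - entropy_bits(child)
flips <- names(which.max(parent)) != names(which.max(child))
drop_bits   # positive: the longer history sharpens the prediction
flips       # TRUE: the modal next action changes
```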

## 6. Prune to the reliable tree

```{r prune}
pruned <- prune_tree(tree, criterion = "G2", alpha = 0.05)
pruned
```

The pruned banner reports the surviving node count and criterion; compare it
to the unpruned `tree` from section 2. Each removed context failed a
likelihood-ratio G-squared test against its one-shorter parent: the extra
history did not explain enough added variation in the next action to justify
keeping it. That the tree collapses so far is
itself a finding -- most of the grown depth was unsupported, and the durable
structure lives near the root.
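
For intuition, the test for one context can be sketched in base R. All numbers are made up, and the degrees of freedom (number of actions minus one) are an assumption about how the package sets up the test, not something confirmed here.

```{r sketch-g2}
# Made-up counts of the next action after a longer context,
# and the made-up next-action distribution of its one-shorter parent.
child_counts <- c(consensus = 12, plan = 30, discuss = 8)
parent_prob  <- c(consensus = 0.5, plan = 0.3, discuss = 0.2)
expected <- sum(child_counts) * parent_prob     # counts the parent alone would predict
keep <- child_counts > 0                        # 0 * log(0) contributes 0
g2 <- 2 * sum(child_counts[keep] * log(child_counts[keep] / expected[keep]))
df <- length(child_counts) - 1L                 # assumed degrees of freedom
p_value <- pchisq(g2, df = df, lower.tail = FALSE)
c(G2 = g2, df = df, p_value = p_value)
p_value < 0.05                                  # TRUE means the context would be kept
```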

## 7. Held-out predictive quality

The honest, out-of-sample estimate comes from cross-validation, which
`tune_tree()` runs at the sequence level over a `(max_depth, min_count, ...)`
grid -- no hand-made train/test split. The in-sample perplexity is the
optimistic bound; the cross-validated winner is the figure to report.

```{r holdout}
model_fit(pruned)$perplexity                       # in-sample (optimistic)

tg <- tune_tree(group_regulation_long,
               actor = "Actor", time = "Time", action = "Action",
               max_depth = 1L:3L, min_count = 10L, folds = 5L, seed = 1L)
attr(tg, "best")                                   # cross-validated winner
```

A cross-validated perplexity close to the in-sample value is the signature of
a well-pruned model that generalises; a large gap would say *prune harder*.
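
"At the sequence level" means whole sequences, not individual rows, are assigned to folds. The sketch below shows that idea on a made-up log with a hypothetical `seq_id` column. It assigns folds round-robin to stay deterministic; a real cross-validation would normally shuffle, and this is not a description of how `tune_tree()` builds its folds.

```{r sketch-cv-folds}
# Made-up event log: three sequences, identified by a hypothetical `seq_id`.
toy_log <- data.frame(
  seq_id = c("a", "a", "a", "b", "b", "c", "c", "c", "c"),
  action = c("plan", "discuss", "consensus",
             "plan", "consensus",
             "discuss", "plan", "plan", "consensus"),
  stringsAsFactors = FALSE
)
k <- 3L
ids <- unique(toy_log$seq_id)
fold_of_seq <- setNames((seq_along(ids) - 1L) %% k + 1L, ids)   # one fold per sequence
toy_log$fold <- unname(fold_of_seq[toy_log$seq_id])
# Number of distinct folds each sequence touches; 1 means it is never split.
folds_per_seq <- tapply(toy_log$fold, toy_log$seq_id, function(f) length(unique(f)))
folds_per_seq
```

Keeping each sequence in one fold prevents the model from being scored on moves whose neighbours it has already seen in training.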

`mine_sequences()` then surfaces the sessions the fitted model predicts worst
-- the atypical regulation trajectories worth a closer look -- and
`score_positions()` the individual moves it is most blindsided by:

```{r scores}
wide <- prepare_input(group_regulation_long,
                     actor = "Actor", time = "Time", action = "Action")
mine_sequences(pruned, newdata = wide, which = "surprising", n = 5L)
score_positions(pruned, newdata = wide, worst = 5L)
```

## 8. Bootstrap reliability

`prune_tree()` asked "which contexts pass a criterion *in this dataset*?".
The bootstrap asks the stricter question -- "which pass *reproducibly* under
resampling?" -- and reports two flags. **`stable`**: the count reproduces.
**`informative`**: the G-squared against the parent reproducibly clears the
chi-square bar. A claim worth making is **both**.

```{r bootstrap}
boot <- bootstrap_pathways(pruned, iter = 100L, stat = "count", seed = 1L)
boot
```

```{r bootstrap-summary}
head(summary(boot), 10)
```

`summary()` sorts the trustworthy (stable *and* informative) pathways first,
so the top rows are the defensible set. The two flags screen different failure
modes: `stable` alone keeps high-count noise pathways, while `informative`
alone could surface a low-count borderline pathway whose sample G-squared is
high by chance. Only their conjunction guards against both.

```{r bootstrap-plot, fig.height = 5.5}
plot(boot)
```

In the forest plot each bar is a 95% bootstrap interval on G-squared; the
dashed line is the chi-square critical value. A bar entirely to the right is
reproducibly informative; a bar straddling the line is not safe to claim.
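
The reading rule can be sketched numerically. The intervals below are made up, and the degrees of freedom (nine actions minus one) are an assumption about where the dashed line sits, not a value read from the package.

```{r sketch-critical}
# Assumed degrees of freedom: nine actions minus one.
critical <- qchisq(0.95, df = 9L - 1L)
# Made-up 95% bootstrap intervals on G-squared for two pathways.
intervals <- data.frame(pathway = c("clear", "straddling"),
                        lower = c(22.4, 9.8),
                        upper = c(41.0, 27.3),
                        stringsAsFactors = FALSE)
# A bar lies entirely to the right only if its lower end clears the line.
intervals$reproducibly_informative <- intervals$lower > critical
critical
intervals
```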

## 9. Do high and low achievers regulate differently?

Fit **one tree per group** in a single call with `group =`, then test where
the groups diverge with a permutation null. The grouping variable is an
*external* student attribute (`Achiever`), not derived from the actions
themselves -- otherwise the comparison would be circular.

```{r groups}
grp <- context_tree(group_regulation_long,
                    actor = "Actor", time = "Time", action = "Action",
                    group = "Achiever", max_depth = 2L, min_count = 10L)
cmp <- compare_groups(grp, iter = 199L, seed = 1L)
cmp$omnibus
```

The omnibus table reports two axes. **behavioral** is the count-weighted
Jensen-Shannon divergence (bits) between the groups' next-action
distributions, summed over shared contexts -- "given the same history, do
the groups do different things next?". **usage** is the summed G-squared
homogeneity statistic -- "do they reach the same contexts at the same
rates?". Each `p_value` comes from permuting the group labels.
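
For a single shared context, the Jensen-Shannon divergence in bits can be sketched as follows. `kl_bits()` and `jsd_bits()` are hypothetical helpers and the distributions are made up; the count weighting and the sum over contexts that the omnibus statistic adds are not shown.

```{r sketch-jsd}
# Hypothetical helpers; base-2 logs give bits.
kl_bits <- function(p, q) {
  keep <- p > 0                      # 0 * log(0) contributes 0
  sum(p[keep] * log2(p[keep] / q[keep]))
}
jsd_bits <- function(p, q) {
  m <- (p + q) / 2                   # the mixture of the two distributions
  (kl_bits(p, m) + kl_bits(q, m)) / 2
}
# Made-up next-action distributions after the same context in each group.
high <- c(consensus = 0.2, plan = 0.6, discuss = 0.2)
low  <- c(consensus = 0.6, plan = 0.2, discuss = 0.2)
jsd_bits(high, low)    # between 0 (identical) and 1 bit (no overlap)
jsd_bits(high, high)   # 0: identical distributions
```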

```{r groups-plot, fig.height = 5}
plot_difference(grp, depth = 1L)
```

The per-context residual map shows *where* the groups differ: red and blue
cells mark the contexts that high and low achievers resolve toward different
next actions. `depth = 1L` restricts the map to single-action contexts so the
rows stay readable; drop it (or raise it) to inspect deeper histories.

## Synthesis

Pulling the thread through every section:

1. **The action alphabet is imbalanced** -- frequency is a misleading lens
   and modal predictions are trivially `consensus`/`plan`.
2. **Memory is short** -- pruning collapses the tree to a small set of
   contexts, and held-out perplexity confirms the shallow model generalises.
3. **The insight is in the divergent, well-counted contexts** -- not the
   common ones, and not the spectacular low-count tail.
4. **Only the stable-and-informative pathways are claimable** -- the
   bootstrap is the trust filter between an eyeballed table and a finding.
5. **High and low achievers regulate measurably differently** -- the
   permutation test shows that the omnibus statistics exceed what shuffled
   group labels produce, so the difference is not a labelling artefact.

Each claim is anchored to a function whose output you can re-run -- the whole
point of a pathway-centric, testable model.