---
title: "**kronxNBC**: Student-t Naive Bayes Classification for Financial Market Regime Detection in R"
author:
  - name: Oscar Linares
    email: olinares@umich.edu
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{kronxNBC: Student-t Naive Bayes for Financial Regime Detection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 4.5,
  fig.align = "center"
)
set.seed(42L)
```

---

## Abstract {-}

Financial markets cycle through qualitatively distinct operating environments---calm,
trending, and crisis---that cannot be described by a single distribution. Standard
Gaussian Naive Bayes (NBC) classifiers are analytically convenient but numerically
fragile when crisis-period returns fall deep in the tail: log-densities approach
$-\infty$ and the classifier suffers catastrophic misclassification of the very
observations it most needs to identify correctly. **kronxNBC** addresses this by
replacing the Gaussian class-conditional likelihood with a scaled Student-$t$
distribution whose degrees-of-freedom parameter $\nu$ is selected independently
for each (class, feature) pair via a profile log-likelihood grid search over a
discrete candidate set. This targeted search eliminates the numerical instability
of continuous $\nu$ optimisation while automatically capturing the heavy-tailed
character of crisis returns. The package exports the constructor
`student_t_naive_bayes()` together with the complete S3 interface: `predict()`,
`print()`, `summary()`, `plot()`, `coef()`, and `tables()`. All examples in this
paper are self-contained and reproducible without external data.

**Keywords:** Naive Bayes, Student-t distribution, heavy tails, financial regime
detection, robust classification, R.

---

# Introduction

Classifying the current operating regime of a financial market is a foundational
task in risk management, portfolio construction, and algorithmic trading. During
tranquil, *Calm* periods returns are near-Gaussian, volatility is low, and
standard portfolio theory applies. During *Steady* (trend) periods there is a
persistent directional drift, and drawdowns are moderate and recoverable. During
*Stress* periods, by contrast, returns exhibit strongly fat-tailed behaviour,
drawdowns are severe, and the distributional assumption underlying most risk models
breaks down entirely.

The problem is not merely that volatility rises in a Stress regime. The shape of
the return distribution changes: the tails become genuinely heavier in a way that
cannot be captured by scaling a Gaussian. @BlattbergGonedes1974 showed that daily
stock returns are better described by a Student-$t$ distribution than by a normal,
and this observation has been replicated across asset classes and frequencies.
Kurtosis of daily equity returns routinely exceeds 6; for hourly futures returns
during a crisis it can exceed 20.

A Naive Bayes classifier trained on historical regime data uses the class-conditional
likelihood $f(x_j \mid \text{class } k)$ to assign new observations to regimes.
Under Gaussian assumptions, a single observation that lies $5\sigma$ from the
class mean receives a log-density near $-13$; under a Student-$t$ with $\nu = 3$
degrees of freedom, the same point receives a log-density near $-5$. When
log-densities are summed across multiple features the Gaussian classifier rapidly
accumulates $-\infty$ contributions that dominate the posterior and force every
crisis observation into whichever non-Stress class has the least-negative
log-density---a failure mode we call *classification collapse*.

**kronxNBC** resolves this by:

1. Modelling each class-conditional distribution as a scaled Student-$t$ rather
   than a Gaussian.
2. Selecting $\nu$ for every (class, feature) pair via a profile log-likelihood
   grid search, preventing the numerical instability of continuous optimisation.
3. Applying a log-density floor (`threshold`) that prevents any single feature
   from absorbing the entire posterior.

The package is designed to integrate naturally with the **naivebayes** ecosystem
[@naivebayes]: the `tables()` generic is extended so that `student_t_naive_bayes`
objects print parameter tables in the same format, and the `tables.default` method
forwards all other Naive Bayes classes to `naivebayes::tables()`.

The remainder of this paper is organised as follows. Section 2 develops the
statistical methodology. Section 3 describes the package API. Section 4 provides
a complete worked example on synthetic data. Section 5 compares **kronxNBC** with
a standard Gaussian NBC. Section 6 covers advanced usage. Section 7 outlines the
Clock of Regimes (COR) empirical framework. Section 8 concludes.

---

# Statistical Methodology

## The Naive Bayes Framework

Let $\mathbf{x} = (x_1, \ldots, x_p)^{\top}$ be a $p$-dimensional feature vector
and $y \in \{1, \ldots, K\}$ a discrete class label. Under the Naive Bayes
conditional-independence assumption, the posterior probability of class $k$ is

$$
P(y = k \mid \mathbf{x}) \;=\;
\frac{\pi_k \prod_{j=1}^{p} f_{kj}(x_j)}
     {\sum_{\ell=1}^{K} \pi_\ell \prod_{j=1}^{p} f_{\ell j}(x_j)},
$$

where $\pi_k = P(y = k)$ is the class prior and $f_{kj}$ is the class-conditional
density for feature $j$ given class $k$. The Maximum A Posteriori (MAP) rule
assigns the new observation to

$$
\hat{k} \;=\; \arg\max_{k \in \{1,\ldots,K\}} \left[
  \log \pi_k + \sum_{j=1}^{p} \log f_{kj}(x_j)
\right].
$$

Working in log-space is both numerically convenient and necessary: products of
small probabilities underflow to zero for any floating-point arithmetic.

## The Scaled Student-$t$ Distribution

**kronxNBC** models $f_{kj}$ as a scaled (location-scale) Student-$t$ density
with three parameters: location $\mu_{kj}$, scale $\sigma_{kj} > 0$, and
degrees of freedom $\nu_{kj} > 2$. The density is

$$
f_t(x \mid \mu, \sigma, \nu) \;=\;
\frac{1}{\sigma}\, t_\nu\!\left(\frac{x - \mu}{\sigma}\right),
$$

where $t_\nu(\cdot)$ denotes the standard (zero location, unit scale) Student-$t$
density with $\nu$ degrees of freedom:

$$
t_\nu(z) \;=\;
\frac{\Gamma\!\left(\tfrac{\nu+1}{2}\right)}
     {\sqrt{\nu\pi}\,\Gamma\!\left(\tfrac{\nu}{2}\right)}
\left(1 + \frac{z^2}{\nu}\right)^{-(\nu+1)/2}.
$$

The requirement $\nu > 2$ ensures finite variance. Key tail properties:

- $\nu \approx 3$--$5$: very heavy tails; the fourth moment may not exist.
  A $4\sigma$ return is orders of magnitude more probable than under a Gaussian.
- $\nu \approx 15$--$30$: tails are heavier than Gaussian but fourth moments exist.
- $\nu \geq 60$: practically indistinguishable from $\mathcal{N}(\mu, \sigma^2)$
  in the body of the distribution.

The Gaussian NBC corresponds to the limit $\nu \to \infty$. **kronxNBC** recovers
near-Gaussian behaviour when the data support it (high $\nu$ selected) while
automatically using heavier tails where needed (low $\nu$ selected for Stress).

## Profile Maximum Likelihood for $\nu$

Given observations $\{x_1, \ldots, x_n\}$ from one (class, feature) cell, the
joint log-likelihood is

$$
\ell(\mu, \sigma, \nu) \;=\;
\sum_{i=1}^{n} \left[
  \log t_\nu\!\left(\frac{x_i - \mu}{\sigma}\right) - \log \sigma
\right].
$$

Continuous optimisation over $\nu$ is numerically fragile because the
log-likelihood surface is nearly flat for $\nu > 30$ and can have shallow local
maxima for very small samples. **kronxNBC** instead evaluates a discrete candidate
grid $\mathcal{G} = \{3, 4, \ldots, 30, 40, 60, 100\}$ and, for each
$\nu \in \mathcal{G}$, computes the profile estimators $(\hat\mu(\nu), \hat\sigma(\nu))$
via one step of Iteratively Reweighted Least Squares (IRLS):

$$
u_i(\nu) \;=\; \frac{\nu + 1}{\nu + \hat{z}_i^2},
\qquad \hat{z}_i = \frac{x_i - \bar{x}}{\hat{s}},
$$

$$
\hat\mu(\nu) \;=\; \frac{\sum_i u_i(\nu)\, x_i}{\sum_i u_i(\nu)},
\qquad
\hat\sigma(\nu) \;=\; \sqrt{\frac{1}{n}\sum_i (x_i - \hat\mu(\nu))^2} + \varepsilon,
$$

where $\varepsilon = 10^{-8}$ is a floor that prevents zero-scale degeneracy on
constant features. The IRLS weights $u_i(\nu)$ down-weight large residuals,
making the estimators robust to the outliers that define a Stress regime.

The selected degrees of freedom are

$$
\hat\nu \;=\; \arg\max_{\nu \in \mathcal{G}}\;
\ell\!\left(\hat\mu(\nu), \hat\sigma(\nu), \nu\right),
$$

and the stored parameters for this (class, feature) cell are
$(\hat\mu(\hat\nu), \hat\sigma(\hat\nu), \hat\nu)$.

## Numerical Stability During Scoring

Even with heavy tails, extreme observations may produce log-densities that are
numerically very negative. `predict.student_t_naive_bayes()` applies a log-density
floor: any feature log-density $\log f_{kj}(x_j)$ that falls below
$\log(\texttt{threshold})$ is replaced by $\log(\texttt{threshold})$. The default
$\texttt{threshold} = 0.001$ prevents a single pathological feature from
dominating the posterior. The softmax normalisation of the log-posteriors uses the
standard log-sum-exp identity for numerical stability.

---

# Package Overview

## Installation

```{r install, eval = FALSE}
install.packages("kronxNBC")
```

The development version can be installed from source. The package depends on
**stats**, **graphics**, and **utils** (all part of base R) and on **naivebayes**
[@naivebayes].

## Function Reference

**kronxNBC** exports two functions and six S3 methods.

| Symbol | Role |
|--------|------|
| `student_t_naive_bayes(x, y, prior, nu_grid)` | Constructor: fit the model |
| `tables(object, which)` | S3 generic (extended from **naivebayes**) |
| `predict(object, newdata, type, threshold, eps)` | Class labels or posterior probabilities |
| `print(x)` | Concise console display |
| `summary(object)` | High-level model summary |
| `plot(x, which, prob, arg.num, legend)` | Per-feature density curves |
| `coef(object)` | Parameter data frame ($p \times 3K$) |

### `student_t_naive_bayes()`

```
student_t_naive_bayes(x, y, prior = NULL, nu_grid = c(3:30, 40, 60, 100), ...)
```

**Arguments**

- `x`: A numeric matrix with named columns. Each column is one predictor; each
  row is one observation. Data frames are coerced via `as.matrix()`.
- `y`: A factor, character, or logical class-label vector of the same length as
  `nrow(x)`. Must contain at least two distinct levels and at least two
  observations per level. `NA` labels are silently dropped.
- `prior`: Optional named numeric vector of prior class probabilities.
  Normalised to sum to one. Defaults to empirical class frequencies.
- `nu_grid`: Numeric vector of candidate $\nu$ values. All entries must be
  strictly greater than 2. The default grid `c(3:30, 40, 60, 100)` spans
  extreme kurtosis ($\nu = 3$) through near-Gaussian behaviour ($\nu = 100$).

**Return value**

An S3 object of class `"student_t_naive_bayes"` with components:

| Component | Content |
|-----------|---------|
| `$data` | List with `$x` (training matrix) and `$y` (training labels) |
| `$levels` | Character vector of class levels |
| `$params` | Named list of $K \times p$ matrices `$mu`, `$sd`, `$nu` |
| `$prior` | Named numeric vector of prior probabilities |
| `$nu_grid` | The `nu_grid` used during fitting |
| `$call` | The matched call |

### `predict()`

```
predict(object, newdata = NULL, type = c("class", "prob"),
        threshold = 0.001, eps = 0, ...)
```

- `type = "class"` (default): returns a factor of MAP class assignments.
- `type = "prob"`: returns an $n \times K$ matrix of softmax-normalised
  posterior probabilities. Row $i$ sums to 1.
- `threshold`: log-density floor (replaces values $\leq \log(\texttt{eps})$).
- `eps`: densities at or below this value are replaced by `threshold`.
  Set to `0` (default) to use `.Machine$double.xmin`.

### `tables()`

Returns a `"naive_bayes_tables"` object (a named list, one element per feature).
Each element is a $3 \times K$ table with rows `mu`, `sd`, `nu` and one column
per class. The `which` argument accepts an integer index or character feature name.

### `coef()`

Returns a data frame with $p$ rows (features) and $3K$ columns named
`<class>:mu`, `<class>:sd`, `<class>:nu`. Suitable for programmatic access to
all fitted parameters.

### `plot()`

Draws the fitted class-conditional densities for one or more features.

- `prob = "conditional"` (default): the raw $f_t(x \mid \mu_{kj}, \sigma_{kj}, \nu_{kj})$.
- `prob = "marginal"`: density scaled by the class prior $\pi_k$.
- `which`: integer or character vector to select a subset of features.
- `arg.num`: named list of graphical parameters (`col`, `lty`) for the density lines.

---

# Complete Worked Example

This section demonstrates the full **kronxNBC** workflow on synthetic data that
mimics the distributional structure of hourly E-mini S&P 500 futures across three
financial regimes.

## Synthetic Dataset

We generate 300 observations (100 per regime) across six features that correspond
to the predictors used in the empirical Clock of Regimes (COR) application
described in Section 7. The Stress regime uses Student-$t(3)$ returns to reflect
genuinely fat-tailed crisis behaviour.

```{r synthetic-data}
library(kronxNBC)

set.seed(42L)
n <- 300L
mk <- n / 3L

make_regime <- function(ret_fn, vol_mu, vol_sd, dd_mu, dd_sd,
                        ts_mu, ts_sd, rp_a, rp_b, rp_lambda) {
  data.frame(
    log_return         = ret_fn(mk),
    rolling_volatility = abs(rnorm(mk, vol_mu,  vol_sd)),
    drawdown           = rnorm(mk, dd_mu,  dd_sd),
    transition_stress  = abs(rnorm(mk, ts_mu,  ts_sd)),
    residence_pressure = rpois(mk, rp_lambda),
    ruin_proxy         = rbeta(mk, rp_a, rp_b)
  )
}

X_syn <- rbind(
  # Calm: near-Gaussian, low volatility, shallow drawdowns
  make_regime(
    ret_fn    = function(n) rnorm(n, 0.0002, 0.003),
    vol_mu    = 0.004, vol_sd = 0.001,
    dd_mu     = -0.002, dd_sd = 0.002,
    ts_mu     = 0.001, ts_sd = 0.0005,
    rp_a = 1, rp_b = 20, rp_lambda = 1
  ),
  # Steady: positive drift, moderate drawdowns
  make_regime(
    ret_fn    = function(n) rnorm(n, 0.0005, 0.005),
    vol_mu    = 0.008, vol_sd = 0.002,
    dd_mu     = -0.008, dd_sd = 0.004,
    ts_mu     = 0.003, ts_sd = 0.001,
    rp_a = 2, rp_b = 10, rp_lambda = 3
  ),
  # Stress: fat-tailed (t_3), high volatility, deep drawdowns
  make_regime(
    ret_fn    = function(n) rt(n, df = 3L) * 0.012,
    vol_mu    = 0.022, vol_sd = 0.005,
    dd_mu     = -0.030, dd_sd = 0.010,
    ts_mu     = 0.015, ts_sd = 0.005,
    rp_a = 5, rp_b = 3, rp_lambda = 12
  )
)
X_syn <- as.matrix(X_syn)

y_syn <- factor(
  rep(c("Calm", "Steady", "Stress"), each = mk),
  levels = c("Calm", "Steady", "Stress")
)

# 80 / 20 random split (see Section 7 for the rationale)
set.seed(7L)
tr_idx  <- sample(n, size = floor(0.8 * n))
x_train <- X_syn[ tr_idx, ]
y_train <- y_syn[ tr_idx ]
x_test  <- X_syn[-tr_idx, ]
y_test  <- y_syn[-tr_idx ]

cat("Training set:", nrow(x_train), "observations\n")
cat("Test set:    ", nrow(x_test),  "observations\n")
cat("Class balance (train):\n")
print(table(y_train))
```

## Fitting the Model

```{r fit-model}
model <- student_t_naive_bayes(x_train, y_train)
summary(model)
```

`summary()` confirms the call, sample size, feature count, $\nu$-grid range, and
prior probabilities. Because we used a balanced design with equal class counts the
empirical priors are approximately $1/3$ each.

```{r print-model}
print(model)
```

`print()` additionally displays the fitted parameter tables for the first five
features (all six here). Each table is a $3 \times 3$ matrix: rows are $\mu$,
$\sigma$, and $\nu$; columns are class levels.

## Inspecting the Fitted Parameters

### Parameter Tables

```{r tables-all}
tabs <- tables(model)
print(tabs)
```

To access a single feature by name or index:

```{r tables-subset}
# By name
tables(model, which = "log_return")

# By index (second feature: rolling_volatility)
tables(model, which = 2L)
```

### Coefficient Data Frame

`coef()` returns all parameters in a tidy data frame suitable for export or
further analysis:

```{r coef}
cf <- coef(model)
cf
```

The column convention is `<class>:mu`, `<class>:sd`, `<class>:nu` for each of
the $K = 3$ classes, giving $3K = 9$ columns and $p = 6$ rows.

### Interpreting the $\nu$ Parameter

The estimated degrees-of-freedom matrix is the most theoretically important
output of the model.

```{r nu-matrix}
nu_mat <- model$params$nu
nu_mat
```

Under the generative model, the Stress regime was drawn from $t_3$: we therefore
expect the profile search to select low $\nu$ values for Stress features.

```{r nu-interpretation}
cat(sprintf(
  "log_return: nu(Calm) = %g  |  nu(Steady) = %g  |  nu(Stress) = %g\n",
  nu_mat["Calm",   "log_return"],
  nu_mat["Steady", "log_return"],
  nu_mat["Stress", "log_return"]
))

cat(sprintf(
  "drawdown:   nu(Calm) = %g  |  nu(Steady) = %g  |  nu(Stress) = %g\n",
  nu_mat["Calm",   "drawdown"],
  nu_mat["Steady", "drawdown"],
  nu_mat["Stress", "drawdown"]
))
```

Low $\nu$ for Stress on `log_return` and `drawdown` is the empirical signature of
a fat-tailed crisis distribution. This is not a modelling assumption imposed
a priori; it emerges from the profile grid search applied to the training data.

In real COR applications fitted on E-mini S&P 500 hourly data, the Stress regime
consistently receives $\hat\nu \in \{3, 4, 5\}$ on `log_return` and `drawdown`,
while the Calm regime receives $\hat\nu > 20$ on the same features.

## Density Visualisation

`plot()` draws the fitted class-conditional Student-$t$ densities over the
empirical range of each feature.

```{r plot-conditional, fig.cap = "Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return."}
plot(model,
     prob      = "conditional",
     arg.num   = list(col = c("steelblue", "forestgreen", "firebrick"),
                      lty = c(1L, 2L, 4L)),
     lwd       = 2)
```

`prob = "marginal"` scales each density by its class prior, giving a view of the
mixture components as they contribute to the overall marginal distribution:

```{r plot-marginal, fig.cap = "Marginal (prior-weighted) Student-t densities."}
plot(model,
     prob    = "marginal",
     arg.num = list(col = c("steelblue", "forestgreen", "firebrick"),
                    lty = c(1L, 2L, 4L)),
     lwd     = 2)
```

To plot a subset of features, use the `which` argument:

```{r plot-subset, fig.cap = "Conditional densities for log_return and drawdown only."}
plot(model,
     which   = c("log_return", "drawdown"),
     prob    = "conditional",
     arg.num = list(col = c("steelblue", "forestgreen", "firebrick")),
     lwd     = 2)
```

## Out-of-Sample Evaluation

### Class Predictions

```{r predict-class}
pred_class <- predict(model, newdata = x_test, type = "class")

accuracy <- mean(pred_class == y_test)
cat(sprintf("Out-of-sample accuracy: %.1f%%\n", 100 * accuracy))
```

### Confusion Matrix

```{r confusion}
cm <- table(Actual = y_test, Predicted = pred_class)
cm
```

The diagonal elements count correctly classified observations. Off-diagonal
elements are misclassifications; for risk management purposes, false-negatives
on the Stress class (Stress predicted as Calm) are the most costly.

### Posterior Probabilities

```{r predict-prob}
pred_prob <- predict(model, newdata = x_test, type = "prob")
cat("First 6 test observations — posterior probabilities:\n")
round(head(pred_prob), 4)
```

Each row is a proper probability distribution: entries are non-negative and sum
to one.

### COR Stress Alert Signal

In the Clock of Regimes framework, a **Stress Alert** is triggered whenever the
posterior probability $P(\text{Stress} \mid \mathbf{x}) > 0.60$, signalling that
a risk manager should review position sizing or hedge coverage.

```{r stress-alert}
stress_prob  <- pred_prob[, "Stress"]
alert_flag   <- ifelse(stress_prob > 0.60, "Stress Alert", "No Alert")

cat("Alert summary (test period):\n")
print(table(alert_flag))

cat("\nHighest Stress probabilities in test set:\n")
top_stress <- sort(stress_prob, decreasing = TRUE)
round(head(top_stress, 8L), 4)
```

---

# Comparison: Student-$t$ vs Gaussian Naive Bayes

A Gaussian NBC is a special case of the Student-$t$ NBC at $\nu \to \infty$. For
light-tailed data the two classifiers perform similarly; for fat-tailed Stress
data the Gaussian model suffers classification collapse. This section documents
the difference.

## Setup

We create a stress-concentrated test set: 40 observations drawn from $t_3$
(the true Stress distribution) alongside 20 each from the Calm and Steady
distributions.

```{r gnb-setup}
set.seed(99L)
n_stress <- 40L
n_calm   <- 20L

X_stress_test <- cbind(
  log_return         = rt(n_stress, df = 3L) * 0.012,
  rolling_volatility = abs(rnorm(n_stress, 0.022, 0.005)),
  drawdown           = rnorm(n_stress, -0.030, 0.010),
  transition_stress  = abs(rnorm(n_stress, 0.015, 0.005)),
  residence_pressure = as.numeric(rpois(n_stress, 12L)),
  ruin_proxy         = rbeta(n_stress, 5, 3)
)

X_calm_test <- cbind(
  log_return         = rnorm(n_calm, 0.0002, 0.003),
  rolling_volatility = abs(rnorm(n_calm, 0.004, 0.001)),
  drawdown           = rnorm(n_calm, -0.002, 0.002),
  transition_stress  = abs(rnorm(n_calm, 0.001, 0.0005)),
  residence_pressure = as.numeric(rpois(n_calm, 1L)),
  ruin_proxy         = rbeta(n_calm, 1, 20)
)

X_mixed_test <- rbind(X_stress_test, X_calm_test)
y_mixed_test <- factor(
  c(rep("Stress", n_stress), rep("Calm", n_calm)),
  levels = c("Calm", "Steady", "Stress")
)
```

## Gaussian NBC via naivebayes

The **naivebayes** package [@naivebayes] fits a Gaussian NBC when given a numeric
matrix. We use the same training data as before.

```{r gnb-fit}
library(naivebayes)

gnb <- naive_bayes(x_train, y_train, usekernel = FALSE)
```

## Head-to-Head Accuracy

```{r gnb-compare}
# Student-t NBC predictions
st_pred  <- predict(model, newdata = X_mixed_test, type = "class")
st_acc   <- mean(st_pred == y_mixed_test)

# Gaussian NBC predictions
gnb_pred <- predict(gnb, newdata = X_mixed_test, type = "class")
gnb_acc  <- mean(gnb_pred == y_mixed_test)

cat(sprintf("Student-t NBC overall accuracy: %.1f%%\n", 100 * st_acc))
cat(sprintf("Gaussian  NBC overall accuracy: %.1f%%\n", 100 * gnb_acc))
```

### Confusion Matrices

```{r gnb-confusion}
cat("Student-t NBC:\n")
table(Actual = y_mixed_test, Predicted = st_pred)

cat("\nGaussian NBC:\n")
table(Actual = y_mixed_test, Predicted = gnb_pred)
```

### Stress-Class Recall

For risk management the critical metric is *Stress recall*: the fraction of true
Stress observations correctly identified as such.

```{r gnb-recall}
st_stress_recall  <- mean(st_pred[y_mixed_test == "Stress"]  == "Stress")
gnb_stress_recall <- mean(gnb_pred[y_mixed_test == "Stress"] == "Stress")

cat(sprintf("Student-t NBC Stress recall: %.1f%%\n", 100 * st_stress_recall))
cat(sprintf("Gaussian  NBC Stress recall: %.1f%%\n", 100 * gnb_stress_recall))
```

## Tail Log-Density Comparison

To illustrate the mechanism, we compare the log-density each model assigns to a
single extreme observation: the 95th percentile of $|x|$ for `log_return` in the
Stress training data.

```{r tail-density}
x_extreme <- quantile(
  abs(x_train[y_train == "Stress", "log_return"]),
  probs = 0.95
)
cat("Extreme log_return value:", round(x_extreme, 5), "\n")

# Student-t log-density (Stress class parameters)
mu_st  <- model$params$mu["Stress", "log_return"]
sd_st  <- model$params$sd["Stress", "log_return"]
nu_st  <- model$params$nu["Stress", "log_return"]

ld_st <- stats::dt((x_extreme - mu_st) / sd_st, df = nu_st, log = TRUE) - log(sd_st)

# Gaussian log-density (same location/scale, normal tails)
ld_gn <- stats::dnorm(x_extreme, mean = mu_st, sd = sd_st, log = TRUE)

cat(sprintf("Student-t log-density at extreme point: %+.2f  (nu = %g)\n",
            ld_st, nu_st))
cat(sprintf("Gaussian  log-density at extreme point: %+.2f\n", ld_gn))
cat(sprintf("Difference (Student-t advantage):       %+.2f log-units\n",
            ld_st - ld_gn))
```

The positive difference shows that the Student-$t$ model assigns substantially
higher (less negative) log-density to the extreme observation. Summed over $p = 6$
features this gap compounds, shifting the MAP decision toward Stress under the
Student-$t$ model and away from it under the Gaussian model.

---

# Advanced Usage

## Custom Prior Probabilities

The default prior is the empirical class frequency. A domain expert may wish to
override this---for example, to reflect the fact that Stress regimes are
rare in calendar time (say, 10%) even though they are over-represented in the
training corpus.

```{r custom-prior}
# Stress is rare in real time: 10% prior weight
model_informed <- student_t_naive_bayes(
  x     = x_train,
  y     = y_train,
  prior = c(Calm = 0.50, Steady = 0.40, Stress = 0.10)
)

cat("Priors (custom):\n")
print(model_informed$prior)

# The classifier becomes more conservative about Stress alerts
prob_informed <- predict(model_informed, newdata = x_test, type = "prob")
cat("\nMean posterior P(Stress) — uniform prior:  ",
    round(mean(pred_prob[, "Stress"]), 4), "\n")
cat("Mean posterior P(Stress) — informed prior: ",
    round(mean(prob_informed[, "Stress"]), 4), "\n")
```

## Tuning the $\nu$ Grid

The default grid `c(3:30, 40, 60, 100)` is designed to be broad enough for most
applications. Two adjustments are useful:

**Restricting to heavy tails only** --- If the analyst knows that *all* features
in *all* classes are fat-tailed (e.g., intraday tick data during a crisis), a
smaller grid reduces fitting time and prevents the model from selecting spuriously
high $\nu$ on small samples:

```{r custom-grid-heavy}
model_heavy <- student_t_naive_bayes(
  x       = x_train,
  y       = y_train,
  nu_grid = c(3, 4, 5, 6, 7, 8, 10)
)
model_heavy$params$nu
```

**Extending to near-Gaussian** --- Conversely, if the data contain a benign
asset class where Gaussian behaviour is plausible even in crisis, adding
$\nu = 200$ or $\nu = 500$ gives the grid search a near-Gaussian option:

```{r custom-grid-gaussian}
model_wide <- student_t_naive_bayes(
  x       = x_train,
  y       = y_train,
  nu_grid = c(3:30, 40, 60, 100, 200, 500)
)
```

## Accessing Raw Parameters

All fitted parameters are stored in `$params` as named $K \times p$ matrices.
This makes it easy to derive secondary quantities programmatically.

```{r raw-params}
# Regime-mean volatility (mu of rolling_volatility by class)
model$params$mu[, "rolling_volatility"]

# Tail-thickness ranking across regimes for each feature
apply(model$params$nu, 2, which.min)  # index of class with lowest nu
```

## Predicting with a Subset of Features

`predict()` uses only the intersection of `colnames(newdata)` and the training
features, so prediction on a reduced feature set works without modification:

```{r subset-predict}
# Predict using only the two most important risk features
x_reduced <- x_test[, c("log_return", "drawdown")]
pred_reduced <- predict(model, newdata = x_reduced, type = "class")

cat("Accuracy on full feature set:    ",
    round(mean(pred_class == y_test), 4), "\n")
cat("Accuracy on 2-feature subset:    ",
    round(mean(pred_reduced == y_test), 4), "\n")
```

---

# The Clock of Regimes (COR) Framework

**kronxNBC** was designed as the classification component of the *Clock of
Regimes* (COR) framework for intraday equity futures. This section describes the
empirical pipeline so that users can adapt it to their own data.

## Data Requirements

The COR pipeline requires two inputs:

1. **Price/return series**: a data frame with columns `timestamp`, `close`, and
   `ret` (log-return) at hourly frequency.
2. **Decoded regime labels**: a data frame with a `state` column produced by a
   Hidden Markov Model fitted to the same series [@Hamilton1989].

## Feature Engineering

Six features are computed over a rolling 24-hour window. Each captures a distinct
aspect of market microstructure.

```{r cor-features, eval = FALSE}
library(zoo)

es_data <- read.csv("data.csv",           stringsAsFactors = FALSE)
decoded <- read.csv("decoded_states.csv", stringsAsFactors = FALSE)
es_data <- es_data[!is.na(es_data$ret), ]
n_roll  <- 24L

# 1. Rolling volatility (SD of log-returns)
rolling_vol <- rollapply(es_data$ret, n_roll, sd,   fill = NA, align = "right")
rolling_vol <- pmax(rolling_vol, 0.0001)

# 2. Drawdown from rolling peak
rolling_max <- rollapply(es_data$close, n_roll, max, fill = NA, align = "right")
drawdown    <- (es_data$close - rolling_max) / rolling_max

# 3. Downside semi-deviation (left-tail sensitivity)
downside_dev <- function(x) {
  neg_x <- x[x < 0]
  if (length(neg_x) == 0L) return(0)
  sqrt(mean(neg_x^2))
}
trans_stress <- rollapply(es_data$ret, n_roll, downside_dev, fill = NA, align = "right")

# 4. Residence pressure (consecutive hours in drawdown)
is_dd  <- as.integer(drawdown < -0.005)
is_dd[is.na(is_dd)] <- 0L
res_pr <- ave(is_dd, cumsum(is_dd == 0L), FUN = cumsum)

# 5. Ruin proxy: P(return <= -2%) under rolling distribution
rolling_mean <- rollapply(es_data$ret, n_roll, mean, fill = NA, align = "right")
ruin_proxy   <- pnorm(-0.02, mean = rolling_mean, sd = rolling_vol)

# Assemble and attach HMM labels
state_labels <- c("1" = "Stress", "2" = "Calm", "3" = "Steady")

cor_data <- data.frame(
  timestamp          = es_data$timestamp,
  log_return         = es_data$ret,
  rolling_volatility = rolling_vol,
  drawdown           = drawdown,
  transition_stress  = trans_stress,
  residence_pressure = pmax(res_pr, 0.0001),
  ruin_proxy         = pmax(ruin_proxy, 0.0001),
  regime             = factor(
    state_labels[as.character(decoded$state)],
    levels = c("Calm", "Steady", "Stress")
  )
)
cor_data <- cor_data[complete.cases(cor_data), ]
```

## Train/Test Split Strategy

Financial regime data exhibit strong temporal clustering: a Stress episode may
span 48--200 consecutive hours. A chronological 80/20 split therefore risks
placing an entire Stress cluster exclusively in the test set, leaving the
training corpus with zero Stress observations and guaranteeing classification
collapse.

A random 80/20 split breaks the temporal adjacency of observations, ensuring
every regime class is represented in both partitions regardless of when in
calendar time each episode occurred.

> **Caveat**: random sampling allows observations from the same cluster to appear
> in both train and test sets, introducing distributional leakage across the
> split boundary. For a production backtesting framework a purged, embargo-based
> cross-validation scheme (e.g., `mlr3` + `PurgedCV`) is preferred. For the
> diagnostic regime classifier, the random split is the correct choice because
> the primary concern is class-balance coverage, not look-ahead bias.

```{r cor-split, eval = FALSE}
features <- c("log_return", "rolling_volatility", "drawdown",
              "transition_stress", "residence_pressure", "ruin_proxy")

set.seed(123L)
tr_idx  <- sample(nrow(cor_data), floor(0.80 * nrow(cor_data)))
x_train <- as.matrix(cor_data[ tr_idx, features])
y_train <- cor_data$regime[ tr_idx]
x_test  <- as.matrix(cor_data[-tr_idx, features])
y_test  <- cor_data$regime[-tr_idx]

model <- student_t_naive_bayes(x_train, y_train)
```

## Interpreting Empirical $\nu$ Estimates

On real COR data fitted to hourly E-mini S&P 500 returns (2015--2023), the
profile grid search consistently finds:

| Feature | Calm $\hat\nu$ | Steady $\hat\nu$ | Stress $\hat\nu$ |
|---------|--------------|---------------|----------------|
| `log_return` | 20--30 | 15--25 | **3--6** |
| `drawdown`   | 15--25 | 12--20 | **3--8** |
| `rolling_volatility` | 20--30 | 15--25 | 10--20 |
| `ruin_proxy` | 25--30 | 20--28 | **4--10** |

The consistently low $\hat\nu$ for Stress on `log_return`, `drawdown`, and
`ruin_proxy` is an empirical validation of the fat-tail hypothesis: crisis returns
are not merely high-variance Gaussian draws but draws from a genuinely different,
heavy-tailed distribution.

---

# Summary

**kronxNBC** provides a robust alternative to Gaussian Naive Bayes for
classification problems where some class-conditional distributions exhibit heavy
tails. The key contributions are:

1. **Student-$t$ class-conditional likelihood**: naturally accommodates fat-tailed
   returns without the tail truncation implicit in Gaussian models.

2. **Profile grid search for $\nu$**: replaces numerically fragile continuous
   optimisation with a fast, stable discrete search. The IRLS update step at each
   grid point provides a good initialisation at minimal computational cost.

3. **Per-(class, feature) $\nu$ selection**: allows the model to be simultaneously
   fat-tailed for Stress features and near-Gaussian for Calm features---a
   flexibility that a single global $\nu$ cannot provide.

4. **Log-density floor**: prevents classification collapse when a single feature
   produces an extreme log-density, complementing the heavy-tail robustness of the
   Student-$t$ kernel.

5. **naivebayes compatibility**: the `tables()` generic extension means
   `student_t_naive_bayes` objects integrate seamlessly with existing **naivebayes**
   workflows.

The package is available on CRAN as **kronxNBC** and requires R >= 4.0.0.

---

# Appendix: Session Information {-}

```{r session-info}
sessionInfo()
```

---

# References {-}
