Empirical Regime Classification with KRONXnbc

Oscar Linares

2026-05-27

1 Overview

KRONXnbc implements a Clock of Regimes (COR) classifier: a Student-t Naive Bayes model designed for non-stationary financial market data. Three market regimes are distinguished:

Regime   Economic intuition
------   --------------------------------------------------------------
Calm     Low volatility, mean-reverting returns
Steady   Moderate drift, controlled drawdowns
Stress   Fat-tailed returns, deep drawdowns, elevated ruin probability

The distinguishing engineering choice is a profile grid search over the degrees-of-freedom parameter \(\nu\) of the Student-t likelihood. Rather than fixing \(\nu\) or solving a numerically fragile continuous optimisation, the model evaluates a discrete grid \(\nu \in \{3, 4, \ldots, 30, 40, 60, 100\}\) for every (class, feature) pair and selects the \(\nu\) that maximises the profile log-likelihood. The Student-t likelihood is what protects the classifier in a crisis: its density decays polynomially rather than exponentially, so an observation in the far tail still receives a finite, moderate log-density. Under a standard Gaussian NBC the same observation can drive the class-conditional density to zero in double precision (a log-density of \(-\infty\)), collapsing the posterior.
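The grid search can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the kronxNBC implementation: the helper `profile_nu()` is hypothetical, it assumes a continuous feature with a non-zero median absolute deviation, and for each candidate \(\nu\) it profiles out the location and scale by numerical maximisation with `optim()`.

```r
# Hypothetical sketch (not kronxNBC internals): profile grid search over nu
# for ONE (class, feature) pair. For each candidate nu, the location and
# log-scale are maximised out numerically; the nu with the largest maximised
# log-likelihood is selected. Assumes a continuous x with mad(x) > 0.
profile_nu <- function(x, nu_grid = c(3:30, 40, 60, 100)) {
  # Standardise once so optim() works on a well-scaled problem. The Jacobian
  # of this transform is the same for every nu, so the argmax is unaffected.
  z <- (x - median(x)) / mad(x)

  t_loglik <- function(par, nu) {
    sigma <- exp(par[2])                       # log-scale keeps sigma > 0
    sum(dt((z - par[1]) / sigma, df = nu, log = TRUE) - log(sigma))
  }

  prof <- vapply(nu_grid, function(nu) {
    optim(c(0, 0), t_loglik, nu = nu,
          control = list(fnscale = -1))$value  # fnscale = -1: maximise
  }, numeric(1))

  list(nu = nu_grid[which.max(prof)], profile_loglik = prof)
}

set.seed(1)
fit_heavy <- profile_nu(rt(2000, df = 3) * 0.012)  # fat-tailed, Stress-like returns
fit_light <- profile_nu(rnorm(2000, 0, 0.003))     # Gaussian, Calm-like returns
c(heavy = fit_heavy$nu, light = fit_light$nu)
```

One would expect the fat-tailed sample to select a small \(\nu\) and the Gaussian sample to land high on the grid. Because the profile is nearly flat for large \(\nu\), the exact winner among 30, 40, 60 and 100 is weakly identified, which is consistent with the many \(\nu = 100\) entries in the fitted tables later on.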


2 Step 1 — Feature Engineering

The raw input is an hourly equity price series (e.g. E-mini S&P 500 futures, data.csv, with timestamp, close and ret columns) paired with a file of decoded HMM regime labels (decoded_states.csv), one label per return row. The input2nbc.R pipeline constructs six predictors: the raw hourly log-return plus five features computed over a 24-hour rolling window.

library(zoo)

es_data <- read.csv("data.csv",          stringsAsFactors = FALSE)
decoded <- read.csv("decoded_states.csv", stringsAsFactors = FALSE)

es_data <- es_data[!is.na(es_data$ret), ]          # drop leading NA
stopifnot(nrow(es_data) == nrow(decoded))

n_roll <- 24L                                       # 24-hour window

cor_data <- data.frame(
  timestamp  = es_data$timestamp,
  log_return = es_data$ret
)

2.1 Rolling Volatility

Standard deviation of log-returns over the window; floored at 0.0001 to avoid zero-SD degeneracy on flat-market bars.

cor_data$rolling_volatility <- rollapply(
  es_data$ret, width = n_roll, FUN = sd, fill = NA, align = "right"
)
cor_data$rolling_volatility <- pmax(cor_data$rolling_volatility, 0.0001)

2.2 Drawdown

Measures how far the current close has fallen from the rolling 24-hour peak. Values are zero or negative; a reading of \(-0.03\) means the price is 3 % below its recent high.

\[ \text{Drawdown}_t = \frac{\text{Close}_t - \max_{s \in [t-23,\, t]} \text{Close}_s} {\max_{s \in [t-23,\, t]} \text{Close}_s} \]

rolling_max          <- rollapply(es_data$close, width = n_roll,
                                  FUN = max, fill = NA, align = "right")
cor_data$drawdown    <- (es_data$close - rolling_max) / rolling_max
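To make the sign convention concrete, here is the same formula on a five-bar toy price series with a three-bar window instead of 24. The base-R helper `rolling_max_base()` is hypothetical and only stands in for `zoo::rollapply(..., FUN = max, align = "right")` so the snippet runs without extra packages.

```r
# Hypothetical base-R stand-in for rollapply(..., FUN = max, fill = NA, align = "right")
rolling_max_base <- function(p, width) {
  vapply(seq_along(p), function(i) {
    if (i < width) NA_real_ else max(p[(i - width + 1L):i])
  }, numeric(1))
}

close_px <- c(100, 102, 101, 99, 103)        # toy closing prices
peak     <- rolling_max_base(close_px, width = 3L)
dd       <- (close_px - peak) / peak         # same formula as the drawdown feature
round(dd, 4)                                 # NA NA -0.0098 -0.0294 0.0000
```

The fourth bar sits about 2.9 % below the three-bar peak of 102; the fifth bar sets a new high, so its drawdown is exactly zero.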

2.3 Downside Semi-deviation (Transition Stress Proxy)

Unlike rolling volatility — which treats up and down moves symmetrically — the downside semi-deviation isolates the left tail of the return distribution. It is the root-mean-square of negative returns only, making it highly sensitive to the onset of a Stress episode even when overall volatility is still moderate.

\[ \text{SemiDev}_t = \sqrt{\frac{1}{|\mathcal{N}|} \sum_{r \in \mathcal{N}} r^2}, \qquad \mathcal{N} = \{r_s : r_s < 0,\; s \in [t-23,\, t]\} \]

downside_dev <- function(x) {
  neg_x <- x[x < 0]
  if (length(neg_x) == 0L) return(0)
  sqrt(mean(neg_x^2))
}
cor_data$transition_stress <- rollapply(
  es_data$ret, width = n_roll, FUN = downside_dev, fill = NA, align = "right"
)
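# Floor at 0.0001 like the other features: windows with no negative returns
# give exactly 0. Leading rolling-window NAs stay NA and are dropped later.
cor_data$transition_stress <- pmax(cor_data$transition_stress, 0.0001)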
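A quick way to see why the semi-deviation reacts to stress that rolling volatility misses is to mirror a window of returns. The standard deviation is unchanged, but the downside semi-deviation jumps when the large moves are on the negative side. The return values below are made up for illustration.

```r
downside_dev <- function(x) {            # same definition as in the pipeline
  neg_x <- x[x < 0]
  if (length(neg_x) == 0L) return(0)
  sqrt(mean(neg_x^2))
}

up_heavy   <- c(0.020, 0.015, -0.002, 0.010, -0.001)  # large moves are gains
down_heavy <- -up_heavy                               # mirror: large moves are losses

c(sd(up_heavy), sd(down_heavy))                       # identical
c(downside_dev(up_heavy), downside_dev(down_heavy))   # about 0.0016 vs 0.0155
```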

2.4 Residence Pressure

Counts consecutive hours spent in drawdown (defined as drawdown \(< -0.5\%\)). A long, uninterrupted drawdown streak signals structural regime persistence rather than a momentary spike.

is_dd <- ifelse(cor_data$drawdown < -0.005, 1L, 0L)
is_dd[is.na(is_dd)] <- 0L
cor_data$residence_pressure <- ave(
  is_dd, cumsum(is_dd == 0L), FUN = cumsum
)
cor_data$residence_pressure <- pmax(cor_data$residence_pressure, 0.0001)
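The `ave(..., cumsum(is_dd == 0L), FUN = cumsum)` idiom is compact but not obvious. Each non-drawdown hour opens a new group, and the running sum inside that group counts the drawdown hours since the last reset. On a toy indicator vector:

```r
is_dd <- c(0L, 1L, 1L, 0L, 1L, 1L, 1L, 0L)   # toy drawdown indicator
grp   <- cumsum(is_dd == 0L)                 # 1 1 1 2 2 2 2 3: new group at each 0
run   <- ave(is_dd, grp, FUN = cumsum)       # running count within each group
run                                          # 0 1 2 0 1 2 3 0
```

In the pipeline the zeros are then lifted to the 0.0001 floor.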

2.5 Ruin Proxy

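The probability of a \(-2\%\) or worse hourly return under a Gaussian fitted to the current window's mean \(\hat\mu_t\) and volatility \(\hat\sigma_t\), i.e. \(\Phi\!\left(\frac{-0.02 - \hat\mu_t}{\hat\sigma_t}\right)\), floored at 0.0001 like the other features. Because \(\Phi\) is steep in the left tail, this forward-looking tail-risk measure rises sharply just before a Stress transition, when volatility starts to expand.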

rolling_mean        <- rollapply(es_data$ret, width = n_roll,
                                 FUN = mean, fill = NA, align = "right")
cor_data$ruin_proxy <- pnorm(-0.02,
                             mean = rolling_mean,
                             sd   = cor_data$rolling_volatility)
cor_data$ruin_proxy <- pmax(cor_data$ruin_proxy, 0.0001)
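Two worked values show how sharply the proxy depends on volatility, and why the 0.0001 floor matters in calm markets. The rolling means and volatilities below are illustrative, not estimates from data.csv.

```r
rolling_mean <- c(0, 0)
rolling_vol  <- c(0.010, 0.004)       # 1 % vs 0.4 % hourly volatility
ruin <- pnorm(-0.02, mean = rolling_mean, sd = rolling_vol)
ruin                                  # pnorm(-2) ~ 0.0228, pnorm(-5) ~ 2.9e-07
pmax(ruin, 0.0001)                    # the calm value is lifted to the floor
```

A 2.5-fold change in volatility moves the unfloored proxy by nearly five orders of magnitude; that convexity is what makes it useful as an early tail-risk signal.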

2.6 Attach Regime Labels and Export

# KRONX empirical label mapping (derived from HMM state ordering)
state_labels    <- c("1" = "Stress", "2" = "Calm", "3" = "Steady")
cor_data$regime <- factor(
  state_labels[as.character(decoded$state)],
  levels = c("Calm", "Steady", "Stress")
)

cor_data <- cor_data[complete.cases(cor_data), ]   # drop rolling-window NAs
write.csv(cor_data, file = "nbc_analysis_report.txt", row.names = FALSE)

3 Step 2 — Why Random Sampling, Not a Chronological Split

A natural instinct for time-series data is to train on the first 80 % of observations and test on the last 20 %. For COR data this fails for a structural reason: financial regimes cluster.

Hourly market data exhibits strong regime persistence — a Stress episode may last 48–200 consecutive hours. A chronological cut therefore risks placing an entire regime cluster exclusively in the test set, leaving the training set with zero (or near-zero) Stress observations. The classifier then has no template for Stress and is forced to assign all Stress observations to the nearest alternative regime, producing classification collapse rather than a meaningful accuracy estimate.

Random 80/20 sampling spreads every regime cluster across both partitions, so each regime class is almost certainly represented in train and test regardless of where in calendar time the Stress episodes happened to occur.

Trade-off acknowledged: random sampling leaks information across the split boundary. Adjacent hourly rows share 23 of their 24 window bars, so a test row's features are often nearly identical to those of a training row from the same cluster, and the out-of-sample accuracy is optimistic. For a production backtesting framework a purged, embargo-based cross-validation scheme (e.g. mlr3 + PurgedCV) is preferred. For this diagnostic classifier the random split is an acceptable compromise, provided its accuracy is read as optimistic.
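For readers who want the chronological alternative, a single held-out block with a purge gap on both sides can be sketched as below. `embargo_split()` is a hypothetical helper, not part of kronxNBC or mlr3. With a 24-hour right-aligned window, feature rows within 23 positions of each other share at least one raw bar, so dropping 23 training rows on each side of the test block removes that overlap.

```r
# Hypothetical helper: hold out rows test_start..test_end as one contiguous
# test block and purge `gap` rows on each side from the training set, so no
# training row's 24-bar window overlaps a test row's window.
embargo_split <- function(n, test_start, test_end, gap = 23L) {
  purged <- max(1L, test_start - gap):min(n, test_end + gap)
  list(train = setdiff(seq_len(n), purged),
       test  = test_start:test_end)
}

sp <- embargo_split(n = 1000L, test_start = 701L, test_end = 900L)
range(sp$test)      # 701 900
length(sp$train)    # 754: rows 1..677 and 924..1000
```

The catch is the one described above: a single block may contain few or no Stress hours, so the block (or a set of rotating blocks) has to be chosen with the regime calendar in view.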

cor_data <- read.csv("nbc_analysis_report.txt", stringsAsFactors = FALSE)
cor_data$regime <- factor(cor_data$regime, levels = c("Calm", "Steady", "Stress"))
cor_data <- cor_data[!is.na(cor_data$regime), ]

features <- c("log_return", "rolling_volatility", "drawdown",
              "transition_stress", "residence_pressure", "ruin_proxy")

set.seed(123)
train_idx <- sample(seq_len(nrow(cor_data)), size = floor(0.80 * nrow(cor_data)))
train     <- cor_data[ train_idx, ]
test      <- cor_data[-train_idx, ]

x_train <- as.matrix(train[, features]);  y_train <- train$regime
x_test  <- as.matrix(test[,  features]);  y_test  <- test$regime
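Plain random sampling makes it very likely, but does not guarantee, that each regime lands in both partitions; with a rare Stress class a stratified split removes the doubt. `stratified_split()` below is a hypothetical helper written for this illustration, and the toy class counts are made up.

```r
# Hypothetical helper (not part of kronxNBC): draw `frac` of the row indices
# within each regime separately, so every class appears in both partitions.
stratified_split <- function(y, frac = 0.80) {
  idx_by_class <- split(seq_along(y), y)          # row indices grouped by regime
  train_idx <- unlist(lapply(idx_by_class, function(ix) {
    # index via sample.int() so a class with a single row is handled correctly
    ix[sample.int(length(ix), size = floor(frac * length(ix)))]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(123)
y  <- factor(rep(c("Calm", "Steady", "Stress"), times = c(500, 400, 60)),
             levels = c("Calm", "Steady", "Stress"))
tr <- stratified_split(y)
table(y[tr])    # 400 / 320 / 48: exactly 80 % of each regime
table(y[-tr])   # 100 /  80 / 12
```

Applied to the real data, `train_idx <- stratified_split(cor_data$regime)` could replace the `sample()` call above without other changes.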

4 Step 3 — Fitting the Student-t Naive Bayes Classifier

library(kronxNBC)

model <- student_t_naive_bayes(x = x_train, y = y_train)
print(model)

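A self-contained synthetic demonstration using the same six feature names follows. The regimes are well separated by construction, so the accuracy reported below says nothing about performance on real COR data: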

library(kronxNBC)

set.seed(42L)
n  <- 300L
mk <- n / 3L

# Mimic the distributional shape of each regime
X_syn <- rbind(
  data.frame(                                          # Calm
    log_return         = rnorm(mk, 0.0002, 0.003),
    rolling_volatility = rnorm(mk, 0.004,  0.001),
    drawdown           = rnorm(mk, -0.002, 0.002),
    transition_stress  = abs(rnorm(mk, 0.001, 0.0005)),
    residence_pressure = rpois(mk, 1),
    ruin_proxy         = rbeta(mk, 1, 20)
  ),
  data.frame(                                          # Steady
    log_return         = rnorm(mk, 0.0005, 0.005),
    rolling_volatility = rnorm(mk, 0.008,  0.002),
    drawdown           = rnorm(mk, -0.008, 0.004),
    transition_stress  = abs(rnorm(mk, 0.003, 0.001)),
    residence_pressure = rpois(mk, 3),
    ruin_proxy         = rbeta(mk, 2, 10)
  ),
  data.frame(                                          # Stress: fat-tailed
    log_return         = rt(mk, df = 3) * 0.012,
    rolling_volatility = rnorm(mk, 0.022,  0.005),
    drawdown           = rnorm(mk, -0.030, 0.010),
    transition_stress  = abs(rnorm(mk, 0.015, 0.005)),
    residence_pressure = rpois(mk, 12),
    ruin_proxy         = rbeta(mk, 5, 3)
  )
)
X_syn <- as.matrix(X_syn)

y_syn <- factor(
  rep(c("Calm", "Steady", "Stress"), each = mk),
  levels = c("Calm", "Steady", "Stress")
)

set.seed(7L)
tr_idx  <- sample(n, size = floor(0.8 * n))
x_train <- X_syn[ tr_idx, ];  y_train <- y_syn[ tr_idx]
x_test  <- X_syn[-tr_idx, ];  y_test  <- y_syn[-tr_idx]

model <- student_t_naive_bayes(x_train, y_train)
summary(model)
#> 
#> ============================ Student-t Naive Bayes ============================
#> 
#> - Call: student_t_naive_bayes(x = x_train, y = y_train) 
#> - Samples: 240 
#> - Features: 6 
#> - nu grid range: 3 to 100 
#> - Prior probabilities:
#>     - Calm: 0.3417
#>     - Steady: 0.3125
#>     - Stress: 0.3458
#> 
#> -------------------------------------------------------------------------------

5 Step 4 — Inspecting the Fitted Parameters

5.1 Parameter Tables

tabs <- tables(model)
print(tabs)
#> $log_return
#>             Calm        Steady        Stress
#> mu  3.768694e-04  4.757325e-05 -1.158370e-03
#> sd  3.091415e-03  4.948078e-03  1.447585e-02
#> nu  3.000000e+01  1.000000e+02  9.000000e+00
#> 
#> $rolling_volatility
#>            Calm       Steady       Stress
#> mu 3.954591e-03 7.661608e-03 2.174149e-02
#> sd 9.037302e-04 1.708582e-03 4.649135e-03
#> nu 1.000000e+02 4.000000e+01 1.000000e+02
#> 
#> $drawdown
#>             Calm        Steady        Stress
#> mu  -0.002126564  -0.008043378  -0.029062709
#> sd   0.002051736   0.003979391   0.010660654
#> nu 100.000000000  27.000000000 100.000000000
#> 
#> $transition_stress
#>            Calm       Steady       Stress
#> mu 9.923345e-04 3.106482e-03 1.602354e-02
#> sd 4.205171e-04 1.150587e-03 5.069987e-03
#> nu 1.000000e+02 6.000000e+01 4.000000e+01
#> 
#> $residence_pressure
#>           Calm      Steady      Stress
#> mu   0.9445213   2.7786013  11.9450720
#> sd   0.9865471   1.6836134   3.4498315
#> nu 100.0000000 100.0000000 100.0000000
#> 
#> $ruin_proxy
#>            Calm       Steady       Stress
#> mu   0.04645572   0.16278727   0.64599386
#> sd   0.05494230   0.11070799   0.15403066
#> nu   6.00000000  15.00000000 100.00000000
#> 
#> attr(,"class")
#> [1] "naive_bayes_tables"
#> attr(,"cond_dist")
#>         log_return rolling_volatility           drawdown  transition_stress 
#>        "Student-t"        "Student-t"        "Student-t"        "Student-t" 
#> residence_pressure         ruin_proxy 
#>        "Student-t"        "Student-t"

5.2 Coefficient Data Frame

coef(model)
#>                          Calm:mu      Calm:sd Calm:nu     Steady:mu   Steady:sd
#> log_return          0.0003768694 0.0030914147      30  4.757325e-05 0.004948078
#> rolling_volatility  0.0039545908 0.0009037302     100  7.661608e-03 0.001708582
#> drawdown           -0.0021265636 0.0020517360     100 -8.043378e-03 0.003979391
#> transition_stress   0.0009923345 0.0004205171     100  3.106482e-03 0.001150587
#> residence_pressure  0.9445213006 0.9865470622     100  2.778601e+00 1.683613368
#> ruin_proxy          0.0464557214 0.0549422995       6  1.627873e-01 0.110707985
#>                    Steady:nu   Stress:mu   Stress:sd Stress:nu
#> log_return               100 -0.00115837 0.014475850         9
#> rolling_volatility        40  0.02174149 0.004649135       100
#> drawdown                  27 -0.02906271 0.010660654       100
#> transition_stress         60  0.01602354 0.005069987        40
#> residence_pressure       100 11.94507202 3.449831495       100
#> ruin_proxy                15  0.64599386 0.154030658       100

5.3 Density Plots

plot(model, prob = "conditional")

Per-feature Student-t densities by regime (one panel per feature). Heavier tails in the Stress curves are visible where nu is low.


6 Step 5 — Out-of-Sample Evaluation

6.1 Predictions

pred_class <- predict(model, newdata = x_test, type = "class")
pred_prob  <- predict(model, newdata = x_test, type = "prob")

accuracy <- mean(pred_class == y_test)
cat("Out-of-sample accuracy:", round(accuracy, 4), "\n")
#> Out-of-sample accuracy: 1

6.2 Confusion Matrix

table(Actual = y_test, Predicted = pred_class)
#>         Predicted
#> Actual   Calm Steady Stress
#>   Calm     18      0      0
#>   Steady    0     25      0
#>   Stress    0      0     17

6.3 COR Stress Alert

Observations where the posterior probability of the Stress regime exceeds 60 % trigger a COR Stress Alert — an actionable signal for risk managers to review position sizing or hedging.

stress_prob <- pred_prob[, "Stress"]
alert_flag  <- ifelse(stress_prob > 0.60, "COR Stress Alert", "No Alert")

cat("\nCOR Stress Alert Summary (test period):\n")
#> 
#> COR Stress Alert Summary (test period):
print(table(alert_flag))
#> alert_flag
#> COR Stress Alert         No Alert 
#>               17               43

cat("\nPosterior Stress probability — first 10 test observations:\n")
#> 
#> Posterior Stress probability — first 10 test observations:
print(round(head(stress_prob, 10L), 4))
#>  [1] 0 0 0 0 0 0 0 0 0 0
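For any Naive Bayes model, the posterior that the alert thresholds is the prior times the product of per-feature likelihoods, normalised across classes. A minimal sketch for a Student-t likelihood follows. `nb_posterior()` is hypothetical, the toy parameters are made up, and reading the location/scale/\(\nu\) matrices as a location-scale Student-t is an assumption about parameterisation, not a description of how kronxNBC's `predict()` works internally. The log-sum-exp step keeps the normalisation finite.

```r
# Hypothetical sketch: Student-t Naive Bayes posterior for ONE observation.
# mu, scale, nu: matrices with classes in rows and features in columns.
nb_posterior <- function(x, mu, scale, nu, prior) {
  log_post <- vapply(rownames(mu), function(k) {
    z <- (x - mu[k, ]) / scale[k, ]
    log(prior[[k]]) +
      sum(dt(z, df = nu[k, ], log = TRUE) - log(scale[k, ]))  # naive independence
  }, numeric(1))
  w <- exp(log_post - max(log_post))          # log-sum-exp: shift before exp
  w / sum(w)
}

cls   <- c("Calm", "Stress")
feats <- c("log_return", "drawdown")
mu    <- matrix(c(0, -0.001, -0.002, -0.030), 2, dimnames = list(cls, feats))
scale <- matrix(c(0.003, 0.012, 0.002, 0.010), 2, dimnames = list(cls, feats))
nu    <- matrix(c(30, 9, 100, 100),           2, dimnames = list(cls, feats))
prior <- c(Calm = 0.5, Stress = 0.5)

x    <- c(log_return = -0.015, drawdown = -0.025)   # sharp loss, deep drawdown
post <- nb_posterior(x, mu, scale, nu, prior)
ifelse(post[["Stress"]] > 0.60, "COR Stress Alert", "No Alert")
```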

7 Step 6 — Interpreting the \(\nu\) Parameter

The most theoretically important outputs are the per-feature, per-class degrees-of-freedom estimates. They can be extracted directly from the fitted parameter matrix:

nu_df <- as.data.frame(t(model$params$nu))                   # rows = features, cols = regimes
colnames(nu_df) <- paste0("nu.", rownames(model$params$nu))  # regime names taken from the model
nu_df
#>                    nu.Calm nu.Steady nu.Stress
#> log_return              30       100         9
#> rolling_volatility     100        40       100
#> drawdown               100        27       100
#> transition_stress      100        60        40
#> residence_pressure     100       100       100
#> ruin_proxy               6        15       100

7.0.1 What low \(\nu\) means

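Under a Student-t distribution:

- The density tails decay polynomially, roughly like \(|x|^{-(\nu+1)}\), instead of like \(e^{-x^2/2}\); the lower \(\nu\), the heavier the tails.
- The variance is finite only for \(\nu > 2\) (always satisfied on this grid), and the excess kurtosis is \(6/(\nu-4)\) for \(\nu > 4\) and infinite for \(\nu \le 4\).
- As \(\nu \to \infty\) the distribution converges to the Gaussian, so a fit at the top of the grid (\(\nu = 100\)) is practically Gaussian.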
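The dependence of tail mass on \(\nu\) is easy to check numerically. The snippet compares \(P(|T| > 4)\) for a unit-scale Student-t at a few grid values with the same probability under a standard Gaussian. These are unit-scale distributions, not variance-matched ones, and the \(\nu\) values are simply picked from the grid.

```r
nu          <- c(3, 9, 30, 100)               # values taken from the nu grid
tail_t      <- 2 * pt(-4, df = nu)            # P(|T| > 4), unit-scale Student-t
tail_gauss  <- 2 * pnorm(-4)                  # about 6.3e-05 for N(0, 1)
excess_kurt <- ifelse(nu > 4, 6 / (nu - 4), Inf)

data.frame(nu,
           tail_t,
           ratio_to_gauss = tail_t / tail_gauss,
           excess_kurt)
```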

7.0.2 Validation of the heavy-tail hypothesis

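When fitted to real COR data (results not reproduced in this vignette), the Stress regime consistently receives \(\nu \approx 3\) to \(6\) on log_return and drawdown, while Calm receives \(\nu > 20\). This is not a modelling assumption: it is an empirical finding that emerges from the profile grid search. The synthetic example above only partly reproduces it (Stress log_return gets \(\nu = 9\), but Stress drawdown sits at \(\nu = 100\)), because only the synthetic Stress returns were drawn from a fat-tailed distribution.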

This finding supports the core financial hypothesis:

Crisis episodes are not merely high-volatility Gaussian events. They are draws from a genuinely different, fat-tailed distribution that a Gaussian NBC cannot represent without catastrophic classification failure.

The grid search selects the \(\nu\) that best explains the observed data under the Student-t family. A low \(\nu\) on Stress features is therefore both a diagnostic of past crises and a structural reason why the KRONXnbc classifier is more reliable than a standard Gaussian Naive Bayes during market dislocations.
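The underflow argument can be reproduced with base R alone. The two-class example below is a toy with made-up parameters (unit scale, Calm centred at 0, Stress at 5, \(\nu = 3\) for the Student-t case); it is not a kronxNBC fit. An observation 40 to 45 scale units from both class centres lies far beyond what either Gaussian density can represent in double precision.

```r
x <- 45                                           # extreme observation (toy units)

# Gaussian class-conditionals evaluated in probability space
lik_gauss  <- c(Calm = dnorm(x, 0, 1), Stress = dnorm(x, 5, 1))
post_gauss <- lik_gauss / sum(lik_gauss)          # both densities underflow to 0
post_gauss                                        # NaN NaN: the classifier collapses

# Student-t (nu = 3) class-conditionals with the same centres and scale
lik_t  <- c(Calm = dt(x - 0, df = 3), Stress = dt(x - 5, df = 3))
post_t <- lik_t / sum(lik_t)                      # finite, sums to one
post_t

# On the log scale the Gaussian stays finite but carries a huge penalty
c(gauss = dnorm(40, log = TRUE), t3 = dt(40, df = 3, log = TRUE))  # ~ -800.9 vs ~ -13.6
```

Two caveats keep this honest. Working in log space with log-sum-exp also removes the NaN for a Gaussian model. What the Student-t contributes beyond that is a penalty that grows like \((\nu+1)\log|z|\) rather than \(z^2/2\), so a single extreme feature cannot swamp the evidence from the other five.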

nu_stress_ret <- model$params$nu["Stress", "log_return"]
nu_calm_ret   <- model$params$nu["Calm",   "log_return"]

cat(sprintf(
  "log_return: nu(Stress) = %.0f  |  nu(Calm) = %.0f\n",
  nu_stress_ret, nu_calm_ret
))
#> log_return: nu(Stress) = 9  |  nu(Calm) = 30

if (nu_stress_ret < nu_calm_ret) {
  cat("=> Stress regime shows heavier tails on log_return, as hypothesised.\n")
} else {
  cat("=> Note: with this synthetic data nu ordering may differ from empirical results.\n")
}
#> => Stress regime shows heavier tails on log_return, as hypothesised.

8 Appendix — Session Info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Sequoia 15.7.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Riga
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kronxNBC_0.1.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39   R6_2.6.1        fastmap_1.2.0   xfun_0.57      
#>  [5] cachem_1.1.0    knitr_1.51      htmltools_0.5.9 rmarkdown_2.31 
#>  [9] lifecycle_1.0.5 cli_3.6.6       sass_0.4.10     jquerylib_0.1.4
#> [13] compiler_4.6.0  tools_4.6.0     evaluate_1.0.5  bslib_0.11.0   
#> [17] yaml_2.3.12     otel_0.2.0      rlang_1.2.0     jsonlite_2.0.0