KRONXnbc implements a Clock of Regimes (COR) classifier: a Student-t Naive Bayes model designed for non-stationary financial market data. Three market regimes are distinguished:
| Regime | Economic intuition |
|---|---|
| Calm | Low volatility, mean-reverting returns |
| Steady | Moderate drift, controlled drawdowns |
| Stress | Fat-tailed returns, deep drawdowns, elevated ruin probability |
The distinguishing engineering choice is a profile grid search over the degrees-of-freedom parameter \(\nu\) of the Student-t likelihood. Rather than fixing \(\nu\) or solving a numerically fragile continuous optimisation, the model evaluates a discrete grid \(\nu \in \{3, 4, \ldots, 30, 40, 60, 100\}\) for every (class, feature) pair and selects the \(\nu\) that maximises the profile log-likelihood. This prevents the \(-\infty\) log-density underflow that collapses a standard Gaussian NBC when a crisis observation falls in the far tail.
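The grid evaluation can be sketched in a few lines of base R. This is a simplified illustration, not the package's internal code: the helper names `t_loglik` and `select_nu` are hypothetical, and location and scale are plugged in from the sample mean and a moment-matched sample SD rather than re-optimised at each \(\nu\), as a full profile likelihood would do.

```r
# Grid of candidate degrees of freedom, as described in the text.
nu_grid <- c(3:30, 40, 60, 100)

# Log-likelihood of a location-scale Student-t at a fixed nu.
# Assumption: mu is the sample mean and the scale is chosen so that the
# implied variance, scale^2 * nu / (nu - 2), equals the sample variance.
t_loglik <- function(x, nu) {
  mu <- mean(x)
  scale <- sd(x) * sqrt((nu - 2) / nu)
  sum(dt((x - mu) / scale, df = nu, log = TRUE) - log(scale))
}

# Evaluate every grid point and keep the nu with the highest log-likelihood.
select_nu <- function(x, grid = nu_grid) {
  ll <- vapply(grid, function(nu) t_loglik(x, nu), numeric(1))
  grid[which.max(ll)]
}

set.seed(1)
heavy <- rt(2000, df = 3) * 0.01   # fat-tailed sample
light <- rnorm(2000, 0, 0.01)      # Gaussian sample

nu_heavy <- select_nu(heavy)
nu_light <- select_nu(light)
c(heavy = nu_heavy, light = nu_light)
```

In the model this selection would be repeated for every (class, feature) pair; the fat-tailed sample should land near the low end of the grid and the Gaussian sample near the high end.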
The raw input is an hourly equity price series (e.g. E-mini S&P
500 futures, data.csv) paired with a file of decoded HMM
regime labels (decoded_states.csv). The
input2nbc.R pipeline constructs six continuous predictors
over a 24-hour rolling window.
library(zoo)
es_data <- read.csv("data.csv", stringsAsFactors = FALSE)
decoded <- read.csv("decoded_states.csv", stringsAsFactors = FALSE)
es_data <- es_data[!is.na(es_data$ret), ] # drop leading NA
stopifnot(nrow(es_data) == nrow(decoded))
n_roll <- 24L # 24-hour window
cor_data <- data.frame(
  timestamp  = es_data$timestamp,
  log_return = es_data$ret
)

Standard deviation of log-returns over the window; floored at 0.0001 to avoid zero-SD degeneracy on flat-market bars.
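A minimal base-R sketch of this floored rolling standard deviation follows. The helper name `roll_sd_floor` is hypothetical and the loop is written for clarity; the pipeline itself loads zoo, so it presumably uses a rolling-window helper from that package instead.

```r
# Trailing-window standard deviation with a lower bound.
# The first (width - 1) positions have no full window and stay NA.
roll_sd_floor <- function(ret, width = 24L, floor_sd = 1e-4) {
  n <- length(ret)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      out[i] <- max(sd(ret[(i - width + 1L):i]), floor_sd)
    }
  }
  out
}

set.seed(1)
# 30 perfectly flat bars followed by 30 bars with ordinary noise.
ret <- c(rep(0, 30), rnorm(30, 0, 0.005))
vol <- roll_sd_floor(ret)
vol[c(23, 24, 30, 60)]
```

On the flat stretch the raw SD is exactly zero, so the floor is what the feature reports; once the window fills with noisy returns the floor no longer binds.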
Measures how far the current close has fallen from the rolling 24-hour peak. Values are zero or negative; a reading of \(-0.03\) means the price is 3 % below its recent high.
\[ \text{Drawdown}_t = \frac{\text{Close}_t - \max_{s \in [t-23,\, t]} \text{Close}_s} {\max_{s \in [t-23,\, t]} \text{Close}_s} \]
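The drawdown formula translates directly into a trailing-window loop. The sketch below is an illustration with a hypothetical helper name (`roll_drawdown`) and a toy price path, not the pipeline's own code.

```r
# Drawdown of the current close relative to the trailing-window peak.
roll_drawdown <- function(close, width = 24L) {
  n <- length(close)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      peak <- max(close[(i - width + 1L):i])
      out[i] <- (close[i] - peak) / peak
    }
  }
  out
}

# Flat at 100 for 24 hours, then a dip to 97, a partial recovery, a new high.
close <- c(rep(100, 24), 97, 98, 101)
dd <- roll_drawdown(close)
dd[24:27]
```

The dip to 97 gives a drawdown of \(-0.03\), and the new high at 101 resets the drawdown to zero because the current close is itself the window peak.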
Unlike rolling volatility — which treats up and down moves symmetrically — the downside semi-deviation isolates the left tail of the return distribution. It is the root-mean-square of negative returns only, making it highly sensitive to the onset of a Stress episode even when overall volatility is still moderate.
\[ \text{SemiDev}_t = \sqrt{\frac{1}{|\mathcal{N}|} \sum_{r \in \mathcal{N}} r^2}, \qquad \mathcal{N} = \{r_s : r_s < 0,\; s \in [t-23,\, t]\} \]
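A sketch of the semi-deviation under the same 24-hour window is given below. The helper name `roll_semidev` is hypothetical, and returning 0 when the window contains no negative returns is an assumption: the formula leaves that case undefined.

```r
# Root-mean-square of the negative returns inside each trailing window.
roll_semidev <- function(ret, width = 24L) {
  n <- length(ret)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      w <- ret[(i - width + 1L):i]
      neg <- w[w < 0]
      # Assumption: an empty set of negative returns maps to 0.
      out[i] <- if (length(neg) == 0L) 0 else sqrt(mean(neg^2))
    }
  }
  out
}

# 24 small positive returns, then two losses.
ret <- c(rep(0.001, 24), -0.003, -0.004)
semi <- roll_semidev(ret)
semi[24:26]
```

The feature jumps from 0 to 0.003 on the first loss even though 23 of the 24 returns in the window are still positive, which illustrates the sensitivity to the left tail described above.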
Counts consecutive hours spent in drawdown (defined as drawdown \(< -0.5\%\)). A long, uninterrupted drawdown streak signals structural regime persistence rather than a momentary spike.
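The streak counter can be sketched as a running count that resets whenever the drawdown recovers above the threshold. The helper name `drawdown_streak` is hypothetical, and treating missing drawdowns as "not in drawdown" is an assumption.

```r
# Number of consecutive observations with drawdown below the threshold.
drawdown_streak <- function(dd, threshold = -0.005) {
  in_dd <- !is.na(dd) & dd < threshold   # assumption: NA counts as no drawdown
  streak <- integer(length(dd))
  run <- 0L
  for (i in seq_along(dd)) {
    run <- if (in_dd[i]) run + 1L else 0L
    streak[i] <- run
  }
  streak
}

dd <- c(0, -0.01, -0.02, -0.001, -0.006, -0.007, -0.008)
streak <- drawdown_streak(dd)
streak
```

The fourth observation (\(-0.1\%\)) is above the \(-0.5\%\) threshold, so the count resets to zero before the second streak begins.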
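The probability of a \(-2\%\) or worse move under a Gaussian approximation to the current rolling distribution, i.e. \(\Phi\!\left(\frac{-0.02 - \hat\mu_t}{\hat\sigma_t}\right)\), where \(\hat\mu_t\) and \(\hat\sigma_t\) are the rolling mean and standard deviation of log-returns and \(\Phi\) is the standard normal CDF. This tail-risk measure tends to rise sharply just before a Stress transition.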
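The tail probability can be computed with `pnorm()` once the rolling mean and SD are available. In the sketch below the helper name `ruin_proxy` is hypothetical, and reusing the 0.0001 SD floor in the denominator is an assumption made to avoid division by zero.

```r
# Gaussian probability of a move at or below `loss`, per trailing window.
ruin_proxy <- function(ret, width = 24L, loss = -0.02, floor_sd = 1e-4) {
  n <- length(ret)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i >= width) {
      w <- ret[(i - width + 1L):i]
      sigma <- max(sd(w), floor_sd)   # assumption: same floor as the volatility feature
      out[i] <- pnorm((loss - mean(w)) / sigma)
    }
  }
  out
}

set.seed(2)
calm     <- rnorm(24, 0, 0.003)       # quiet window
stressed <- rnorm(24, -0.002, 0.02)   # volatile window with negative drift

p_calm   <- ruin_proxy(calm)[24]
p_stress <- ruin_proxy(stressed)[24]
c(calm = p_calm, stressed = p_stress)
```

With a quiet window the \(-2\%\) threshold sits many standard deviations away and the probability is negligible; with a volatile window it sits within roughly one standard deviation and the probability becomes material.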
# KRONX empirical label mapping (derived from HMM state ordering)
state_labels <- c("1" = "Stress", "2" = "Calm", "3" = "Steady")
cor_data$regime <- factor(
  state_labels[as.character(decoded$state)],
  levels = c("Calm", "Steady", "Stress")
)
cor_data <- cor_data[complete.cases(cor_data), ] # drop rolling-window NAs
write.csv(cor_data, file = "nbc_analysis_report.txt", row.names = FALSE)

A natural instinct for time-series data is to train on the first 80 % of observations and test on the last 20 %. For COR data this fails for a structural reason: financial regimes cluster.
Hourly market data exhibits strong regime persistence — a Stress episode may last 48–200 consecutive hours. A chronological cut therefore risks placing an entire regime cluster exclusively in the test set, leaving the training set with zero (or near-zero) Stress observations. The classifier then has no template for Stress and is forced to assign all Stress observations to the nearest alternative regime, producing classification collapse rather than a meaningful accuracy estimate.
Random 80/20 sampling breaks the temporal adjacency of observations, ensuring every regime class is represented in both partitions regardless of where in calendar time the Stress episodes happened to occur.
Trade-off acknowledged: random sampling leaks distributional information across the split boundary, because observations from the same cluster appear in both train and test, so the resulting test accuracy is optimistic. For a production backtesting framework a purged, embargo-based cross-validation scheme (e.g. mlr3+PurgedCV) is preferred. For this diagnostic classifier, where the fitted parameters matter more than an out-of-sample accuracy figure, the random split is a pragmatic choice.
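The collapse described above is easy to reproduce on a toy label sequence. The sketch below uses made-up regime counts and places the whole Stress cluster at the end of the sample, which is the worst case for a chronological cut.

```r
# Toy label sequence: regimes arrive in contiguous blocks, Stress last.
regime <- factor(
  rep(c("Calm", "Steady", "Stress"), times = c(500, 300, 200)),
  levels = c("Calm", "Steady", "Stress")
)
n <- length(regime)
n_train <- floor(0.80 * n)

# Chronological split: first 80 % of observations.
chrono_train <- regime[seq_len(n_train)]

# Random split: 80 % of observations drawn without replacement.
set.seed(123)
random_train <- regime[sample(n, size = n_train)]

table(chrono_train)
table(random_train)
```

The chronological training set contains no Stress rows at all, so no Stress likelihood can be estimated from it, whereas the random training set contains all three regimes in roughly their overall proportions.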
cor_data <- read.csv("nbc_analysis_report.txt", stringsAsFactors = FALSE)
cor_data$regime <- factor(cor_data$regime, levels = c("Calm", "Steady", "Stress"))
cor_data <- cor_data[!is.na(cor_data$regime), ]
features <- c("log_return", "rolling_volatility", "drawdown",
              "transition_stress", "residence_pressure", "ruin_proxy")
set.seed(123)
train_idx <- sample(seq_len(nrow(cor_data)), size = floor(0.80 * nrow(cor_data)))
train <- cor_data[ train_idx, ]
test <- cor_data[-train_idx, ]
x_train <- as.matrix(train[, features]); y_train <- train$regime
x_test <- as.matrix(test[, features]); y_test <- test$regime

A self-contained synthetic demonstration using the same six feature names:
library(kronxNBC)
set.seed(42L)
n <- 300L
mk <- n / 3L
# Mimic the distributional shape of each regime
X_syn <- rbind(
  data.frame( # Calm
    log_return         = rnorm(mk, 0.0002, 0.003),
    rolling_volatility = rnorm(mk, 0.004, 0.001),
    drawdown           = rnorm(mk, -0.002, 0.002),
    transition_stress  = abs(rnorm(mk, 0.001, 0.0005)),
    residence_pressure = rpois(mk, 1),
    ruin_proxy         = rbeta(mk, 1, 20)
  ),
  data.frame( # Steady
    log_return         = rnorm(mk, 0.0005, 0.005),
    rolling_volatility = rnorm(mk, 0.008, 0.002),
    drawdown           = rnorm(mk, -0.008, 0.004),
    transition_stress  = abs(rnorm(mk, 0.003, 0.001)),
    residence_pressure = rpois(mk, 3),
    ruin_proxy         = rbeta(mk, 2, 10)
  ),
  data.frame( # Stress: fat-tailed
    log_return         = rt(mk, df = 3) * 0.012,
    rolling_volatility = rnorm(mk, 0.022, 0.005),
    drawdown           = rnorm(mk, -0.030, 0.010),
    transition_stress  = abs(rnorm(mk, 0.015, 0.005)),
    residence_pressure = rpois(mk, 12),
    ruin_proxy         = rbeta(mk, 5, 3)
  )
)
X_syn <- as.matrix(X_syn)
y_syn <- factor(
  rep(c("Calm", "Steady", "Stress"), each = mk),
  levels = c("Calm", "Steady", "Stress")
)
set.seed(7L)
tr_idx <- sample(n, size = floor(0.8 * n))
x_train <- X_syn[ tr_idx, ]; y_train <- y_syn[ tr_idx]
x_test <- X_syn[-tr_idx, ]; y_test <- y_syn[-tr_idx]
model <- student_t_naive_bayes(x_train, y_train)
summary(model)
#>
#> ============================ Student-t Naive Bayes ============================
#>
#> - Call: student_t_naive_bayes(x = x_train, y = y_train)
#> - Samples: 240
#> - Features: 6
#> - nu grid range: 3 to 100
#> - Prior probabilities:
#> - Calm: 0.3417
#> - Steady: 0.3125
#> - Stress: 0.3458
#>
#> -------------------------------------------------------------------------------
tabs <- tables(model)
print(tabs)
#> $log_return
#> Calm Steady Stress
#> mu 3.768694e-04 4.757325e-05 -1.158370e-03
#> sd 3.091415e-03 4.948078e-03 1.447585e-02
#> nu 3.000000e+01 1.000000e+02 9.000000e+00
#>
#> $rolling_volatility
#> Calm Steady Stress
#> mu 3.954591e-03 7.661608e-03 2.174149e-02
#> sd 9.037302e-04 1.708582e-03 4.649135e-03
#> nu 1.000000e+02 4.000000e+01 1.000000e+02
#>
#> $drawdown
#> Calm Steady Stress
#> mu -0.002126564 -0.008043378 -0.029062709
#> sd 0.002051736 0.003979391 0.010660654
#> nu 100.000000000 27.000000000 100.000000000
#>
#> $transition_stress
#> Calm Steady Stress
#> mu 9.923345e-04 3.106482e-03 1.602354e-02
#> sd 4.205171e-04 1.150587e-03 5.069987e-03
#> nu 1.000000e+02 6.000000e+01 4.000000e+01
#>
#> $residence_pressure
#> Calm Steady Stress
#> mu 0.9445213 2.7786013 11.9450720
#> sd 0.9865471 1.6836134 3.4498315
#> nu 100.0000000 100.0000000 100.0000000
#>
#> $ruin_proxy
#> Calm Steady Stress
#> mu 0.04645572 0.16278727 0.64599386
#> sd 0.05494230 0.11070799 0.15403066
#> nu 6.00000000 15.00000000 100.00000000
#>
#> attr(,"class")
#> [1] "naive_bayes_tables"
#> attr(,"cond_dist")
#> log_return rolling_volatility drawdown transition_stress
#> "Student-t" "Student-t" "Student-t" "Student-t"
#> residence_pressure ruin_proxy
#> "Student-t" "Student-t"
coef(model)
#> Calm:mu Calm:sd Calm:nu Steady:mu Steady:sd
#> log_return 0.0003768694 0.0030914147 30 4.757325e-05 0.004948078
#> rolling_volatility 0.0039545908 0.0009037302 100 7.661608e-03 0.001708582
#> drawdown -0.0021265636 0.0020517360 100 -8.043378e-03 0.003979391
#> transition_stress 0.0009923345 0.0004205171 100 3.106482e-03 0.001150587
#> residence_pressure 0.9445213006 0.9865470622 100 2.778601e+00 1.683613368
#> ruin_proxy 0.0464557214 0.0549422995 6 1.627873e-01 0.110707985
#> Steady:nu Stress:mu Stress:sd Stress:nu
#> log_return 100 -0.00115837 0.014475850 9
#> rolling_volatility 40 0.02174149 0.004649135 100
#> drawdown 27 -0.02906271 0.010660654 100
#> transition_stress 60 0.01602354 0.005069987 40
#> residence_pressure 100 11.94507202 3.449831495 100
#> ruin_proxy 15 0.64599386 0.154030658 100

Observations where the posterior probability of the Stress regime exceeds 60 % trigger a COR Stress Alert, an actionable signal for risk managers to review position sizing or hedging.
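# pred_prob is not defined earlier in this text. Assumption: the package's
# predict() method follows the naivebayes-style interface and returns a matrix
# of posterior probabilities with one column per regime.
pred_prob <- predict(model, newdata = x_test, type = "prob")
stress_prob <- pred_prob[, "Stress"]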
alert_flag <- ifelse(stress_prob > 0.60, "COR Stress Alert", "No Alert")
cat("\nCOR Stress Alert Summary (test period):\n")
#>
#> COR Stress Alert Summary (test period):
print(table(alert_flag))
#> alert_flag
#> COR Stress Alert No Alert
#> 17 43
cat("\nPosterior Stress probability — first 10 test observations:\n")
#>
#> Posterior Stress probability — first 10 test observations:
print(round(head(stress_prob, 10L), 4))
#> [1] 0 0 0 0 0 0 0 0 0 0

The most theoretically important output is the per-feature, per-class degrees-of-freedom estimates. Extracting them directly from the parameter matrices:
nu_df <- as.data.frame(t(model$params$nu))
colnames(nu_df) <- paste0("nu.", c("Calm", "Steady", "Stress"))
nu_df
#> nu.Calm nu.Steady nu.Stress
#> log_return 30 100 9
#> rolling_volatility 100 40 100
#> drawdown 100 27 100
#> transition_stress 100 60 40
#> residence_pressure 100 100 100
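#> ruin_proxy 6 15 100

Under a Student-t distribution, \(\nu\) controls tail weight: small \(\nu\) means heavy tails (the variance is finite only for \(\nu > 2\) and the kurtosis only for \(\nu > 4\)), while the distribution converges to the Gaussian as \(\nu \to \infty\).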
When fitted to real COR data, the Stress regime consistently receives
\(\nu \approx 3\)–\(6\) on log_return and
drawdown, while Calm receives \(\nu > 20\). This is not a modelling
assumption — it is an empirical finding that emerges from the
profile grid search.
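To put the \(\nu\) values in perspective, the sketch below compares the probability of an observation more than four scale units from the centre for several grid values and for the Gaussian limit. It uses the standard (unit-scale) Student-t, so the comparison is at equal scale rather than equal variance.

```r
# Two-sided tail probability P(|T| > 4) for selected degrees of freedom.
nu <- c(3, 6, 30, 100)
tail_t <- 2 * pt(-4, df = nu)

# Same tail probability under the standard normal.
tail_norm <- 2 * pnorm(-4)

data.frame(
  nu                = nu,
  p_tail            = tail_t,
  ratio_to_gaussian = tail_t / tail_norm
)
```

The tail probability falls monotonically as \(\nu\) grows and approaches the Gaussian value, which is why a fitted \(\nu\) in the 3 to 6 range signals far more extreme observations than a Gaussian would allow.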
This finding supports the core financial hypothesis:
Crisis episodes are not merely high-volatility Gaussian events. They are draws from a genuinely different, fat-tailed distribution that a Gaussian NBC cannot represent without catastrophic classification failure.
The grid search selects the \(\nu\) that best explains the observed data under the Student-t family. A low \(\nu\) on Stress features is therefore both a diagnostic of past crises and a structural reason why the KRONXnbc classifier is more reliable than a standard Gaussian Naive Bayes during market dislocations.
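The numerical side of this argument can be illustrated with base R alone. The sketch below evaluates a Gaussian and a Student-t density at an observation 40 scale units from the centre and then takes logs; whether a particular Gaussian Naive Bayes implementation computes its log-densities this way is implementation dependent, so treat this as an illustration of the failure mode rather than a description of any specific package.

```r
z <- 40  # a far-tail, crisis-sized standardised observation

# Gaussian density underflows to exactly 0 in double precision,
# so taking the log afterwards yields -Inf.
dens_gauss <- dnorm(z)
log_gauss_naive <- log(dens_gauss)

# Student-t density with nu = 3 decays polynomially and stays representable.
log_t3 <- log(dt(z, df = 3))

c(gaussian = log_gauss_naive, student_t = log_t3)
```

A single `-Inf` term makes the class log-posterior `-Inf` regardless of the other five features, whereas the Student-t term remains a finite penalty that the other features can still outweigh.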
nu_stress_ret <- model$params$nu["Stress", "log_return"]
nu_calm_ret <- model$params$nu["Calm", "log_return"]
cat(sprintf(
  "log_return: nu(Stress) = %.0f | nu(Calm) = %.0f\n",
  nu_stress_ret, nu_calm_ret
))
#> log_return: nu(Stress) = 9 | nu(Calm) = 30
if (nu_stress_ret < nu_calm_ret) {
  cat("=> Stress regime shows heavier tails on log_return, as hypothesised.\n")
} else {
  cat("=> Note: with this synthetic data nu ordering may differ from empirical results.\n")
}
#> => Stress regime shows heavier tails on log_return, as hypothesised.
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Sequoia 15.7.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Riga
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] kronxNBC_0.1.1
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.57
#> [5] cachem_1.1.0 knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
#> [9] lifecycle_1.0.5 cli_3.6.6 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.6.0 tools_4.6.0 evaluate_1.0.5 bslib_0.11.0
#> [17] yaml_2.3.12 otel_0.2.0 rlang_1.2.0 jsonlite_2.0.0