Financial markets cycle through qualitatively distinct operating
environments—calm, trending, and crisis—that cannot be described by a
single distribution. Standard Gaussian Naive Bayes (NBC) classifiers are
analytically convenient but numerically fragile when crisis-period
returns fall deep in the tail: log-densities approach \(-\infty\) and the classifier suffers
catastrophic misclassification of the very observations it most needs to
identify correctly. kronxNBC addresses this by
replacing the Gaussian class-conditional likelihood with a scaled
Student-\(t\) distribution whose
degrees-of-freedom parameter \(\nu\) is
selected independently for each (class, feature) pair via a profile
log-likelihood grid search over a discrete candidate set. This targeted
search eliminates the numerical instability of continuous \(\nu\) optimisation while automatically
capturing the heavy-tailed character of crisis returns. The package
exports the constructor student_t_naive_bayes() together
with the complete S3 interface: predict(),
print(), summary(), plot(),
coef(), and tables(). All examples in this
paper are self-contained and reproducible without external data.
Keywords: Naive Bayes, Student-t distribution, heavy tails, financial regime detection, robust classification, R.
Classifying the current operating regime of a financial market is a foundational task in risk management, portfolio construction, and algorithmic trading. During tranquil, Calm periods returns are near-Gaussian, volatility is low, and standard portfolio theory applies. During Steady (trend) periods there is a persistent directional drift, and drawdowns are moderate and recoverable. During Stress periods, by contrast, returns exhibit strongly fat-tailed behaviour, drawdowns are severe, and the distributional assumption underlying most risk models breaks down entirely.
The problem is not merely that volatility rises in a Stress regime. The shape of the return distribution changes: the tails become genuinely heavier in a way that cannot be captured by scaling a Gaussian. Blattberg and Gonedes (1974) showed that daily stock returns are better described by a Student-\(t\) distribution than by a normal, and this observation has been replicated across asset classes and frequencies. Kurtosis of daily equity returns routinely exceeds 6; for hourly futures returns during a crisis it can exceed 20.
A Naive Bayes classifier trained on historical regime data uses the class-conditional likelihood \(f(x_j \mid \text{class } k)\) to assign new observations to regimes. Under Gaussian assumptions, a single observation that lies \(5\sigma\) from the class mean receives a log-density near \(-13\); under a Student-\(t\) with \(\nu = 3\) degrees of freedom, the same point receives a log-density near \(-5\). When log-densities are summed across multiple features the Gaussian classifier rapidly accumulates \(-\infty\) contributions that dominate the posterior and force every crisis observation into whichever non-Stress class has the least-negative log-density—a failure mode we call classification collapse.
kronxNBC resolves this by:
threshold) that prevents
any single feature from absorbing the entire posterior.The package is designed to integrate naturally with the
naivebayes ecosystem (Majka
2023): the tables() generic is extended so that
student_t_naive_bayes objects print parameter tables in the
same format, and the tables.default method forwards all
other Naive Bayes classes to naivebayes::tables().
The remainder of this paper is organised as follows. Section 2 develops the statistical methodology. Section 3 describes the package API. Section 4 provides a complete worked example on synthetic data. Section 5 compares kronxNBC with a standard Gaussian NBC. Section 6 covers advanced usage. Section 7 outlines the Clock of Regimes (COR) empirical framework. Section 8 concludes.
Let \(\mathbf{x} = (x_1, \ldots, x_p)^{\top}\) be a \(p\)-dimensional feature vector and \(y \in \{1, \ldots, K\}\) a discrete class label. Under the Naive Bayes conditional-independence assumption, the posterior probability of class \(k\) is
\[ P(y = k \mid \mathbf{x}) \;=\; \frac{\pi_k \prod_{j=1}^{p} f_{kj}(x_j)} {\sum_{\ell=1}^{K} \pi_\ell \prod_{j=1}^{p} f_{\ell j}(x_j)}, \]
where \(\pi_k = P(y = k)\) is the class prior and \(f_{kj}\) is the class-conditional density for feature \(j\) given class \(k\). The Maximum A Posteriori (MAP) rule assigns the new observation to
\[ \hat{k} \;=\; \arg\max_{k \in \{1,\ldots,K\}} \left[ \log \pi_k + \sum_{j=1}^{p} \log f_{kj}(x_j) \right]. \]
Working in log-space is both numerically convenient and necessary: products of small probabilities underflow to zero for any floating-point arithmetic.
kronxNBC models \(f_{kj}\) as a scaled (location-scale) Student-\(t\) density with three parameters: location \(\mu_{kj}\), scale \(\sigma_{kj} > 0\), and degrees of freedom \(\nu_{kj} > 2\). The density is
\[ f_t(x \mid \mu, \sigma, \nu) \;=\; \frac{1}{\sigma}\, t_\nu\!\left(\frac{x - \mu}{\sigma}\right), \]
where \(t_\nu(\cdot)\) denotes the standard (zero location, unit scale) Student-\(t\) density with \(\nu\) degrees of freedom:
\[ t_\nu(z) \;=\; \frac{\Gamma\!\left(\tfrac{\nu+1}{2}\right)} {\sqrt{\nu\pi}\,\Gamma\!\left(\tfrac{\nu}{2}\right)} \left(1 + \frac{z^2}{\nu}\right)^{-(\nu+1)/2}. \]
The requirement \(\nu > 2\) ensures finite variance. Key tail properties:
The Gaussian NBC corresponds to the limit \(\nu \to \infty\). kronxNBC recovers near-Gaussian behaviour when the data support it (high \(\nu\) selected) while automatically using heavier tails where needed (low \(\nu\) selected for Stress).
Given observations \(\{x_1, \ldots, x_n\}\) from one (class, feature) cell, the joint log-likelihood is
\[ \ell(\mu, \sigma, \nu) \;=\; \sum_{i=1}^{n} \left[ \log t_\nu\!\left(\frac{x_i - \mu}{\sigma}\right) - \log \sigma \right]. \]
Continuous optimisation over \(\nu\) is numerically fragile because the log-likelihood surface is nearly flat for \(\nu > 30\) and can have shallow local maxima for very small samples. kronxNBC instead evaluates a discrete candidate grid \(\mathcal{G} = \{3, 4, \ldots, 30, 40, 60, 100\}\) and, for each \(\nu \in \mathcal{G}\), computes the profile estimators \((\hat\mu(\nu), \hat\sigma(\nu))\) via one step of Iteratively Reweighted Least Squares (IRLS):
\[ u_i(\nu) \;=\; \frac{\nu + 1}{\nu + \hat{z}_i^2}, \qquad \hat{z}_i = \frac{x_i - \bar{x}}{\hat{s}}, \]
\[ \hat\mu(\nu) \;=\; \frac{\sum_i u_i(\nu)\, x_i}{\sum_i u_i(\nu)}, \qquad \hat\sigma(\nu) \;=\; \sqrt{\frac{1}{n}\sum_i (x_i - \hat\mu(\nu))^2} + \varepsilon, \]
where \(\varepsilon = 10^{-8}\) is a floor that prevents zero-scale degeneracy on constant features. The IRLS weights \(u_i(\nu)\) down-weight large residuals, making the estimators robust to the outliers that define a Stress regime.
The selected degrees of freedom are
\[ \hat\nu \;=\; \arg\max_{\nu \in \mathcal{G}}\; \ell\!\left(\hat\mu(\nu), \hat\sigma(\nu), \nu\right), \]
and the stored parameters for this (class, feature) cell are \((\hat\mu(\hat\nu), \hat\sigma(\hat\nu), \hat\nu)\).
Even with heavy tails, extreme observations may produce log-densities
that are numerically very negative.
predict.student_t_naive_bayes() applies a log-density
floor: any feature log-density \(\log
f_{kj}(x_j)\) that falls below \(\log(\texttt{threshold})\) is replaced by
\(\log(\texttt{threshold})\). The
default \(\texttt{threshold} = 0.001\)
prevents a single pathological feature from dominating the posterior.
The softmax normalisation of the log-posteriors uses the standard
log-sum-exp identity for numerical stability.
The development version can be installed from source. The package depends on stats, graphics, and utils (all part of base R) and on naivebayes (Majka 2023).
kronxNBC exports two functions and six S3 methods.
| Symbol | Role |
|---|---|
student_t_naive_bayes(x, y, prior, nu_grid) |
Constructor: fit the model |
tables(object, which) |
S3 generic (extended from naivebayes) |
predict(object, newdata, type, threshold, eps) |
Class labels or posterior probabilities |
print(x) |
Concise console display |
summary(object) |
High-level model summary |
plot(x, which, prob, arg.num, legend) |
Per-feature density curves |
coef(object) |
Parameter data frame (\(p \times 3K\)) |
student_t_naive_bayes()student_t_naive_bayes(x, y, prior = NULL, nu_grid = c(3:30, 40, 60, 100), ...)
Arguments
x: A numeric matrix with named columns. Each column is
one predictor; each row is one observation. Data frames are coerced via
as.matrix().y: A factor, character, or logical class-label vector
of the same length as nrow(x). Must contain at least two
distinct levels and at least two observations per level. NA
labels are silently dropped.prior: Optional named numeric vector of prior class
probabilities. Normalised to sum to one. Defaults to empirical class
frequencies.nu_grid: Numeric vector of candidate \(\nu\) values. All entries must be strictly
greater than 2. The default grid c(3:30, 40, 60, 100) spans
extreme kurtosis (\(\nu = 3\)) through
near-Gaussian behaviour (\(\nu =
100\)).Return value
An S3 object of class "student_t_naive_bayes" with
components:
| Component | Content |
|---|---|
$data |
List with $x (training matrix) and $y
(training labels) |
$levels |
Character vector of class levels |
$params |
Named list of \(K \times p\)
matrices $mu, $sd, $nu |
$prior |
Named numeric vector of prior probabilities |
$nu_grid |
The nu_grid used during fitting |
$call |
The matched call |
predict()predict(object, newdata = NULL, type = c("class", "prob"),
threshold = 0.001, eps = 0, ...)
type = "class" (default): returns a factor of MAP class
assignments.type = "prob": returns an \(n
\times K\) matrix of softmax-normalised posterior probabilities.
Row \(i\) sums to 1.threshold: log-density floor (replaces values \(\leq \log(\texttt{eps})\)).eps: densities at or below this value are replaced by
threshold. Set to 0 (default) to use
.Machine$double.xmin.tables()Returns a "naive_bayes_tables" object (a named list, one
element per feature). Each element is a \(3
\times K\) table with rows mu, sd,
nu and one column per class. The which
argument accepts an integer index or character feature name.
coef()Returns a data frame with \(p\) rows
(features) and \(3K\) columns named
<class>:mu, <class>:sd,
<class>:nu. Suitable for programmatic access to all
fitted parameters.
plot()Draws the fitted class-conditional densities for one or more features.
prob = "conditional" (default): the raw \(f_t(x \mid \mu_{kj}, \sigma_{kj},
\nu_{kj})\).prob = "marginal": density scaled by the class prior
\(\pi_k\).which: integer or character vector to select a subset
of features.arg.num: named list of graphical parameters
(col, lty) for the density lines.This section demonstrates the full kronxNBC workflow on synthetic data that mimics the distributional structure of hourly E-mini S&P 500 futures across three financial regimes.
We generate 300 observations (100 per regime) across six features that correspond to the predictors used in the empirical Clock of Regimes (COR) application described in Section 7. The Stress regime uses Student-\(t(3)\) returns to reflect genuinely fat-tailed crisis behaviour.
library(kronxNBC)
set.seed(42L)
n <- 300L
mk <- n / 3L
make_regime <- function(ret_fn, vol_mu, vol_sd, dd_mu, dd_sd,
ts_mu, ts_sd, rp_a, rp_b, rp_lambda) {
data.frame(
log_return = ret_fn(mk),
rolling_volatility = abs(rnorm(mk, vol_mu, vol_sd)),
drawdown = rnorm(mk, dd_mu, dd_sd),
transition_stress = abs(rnorm(mk, ts_mu, ts_sd)),
residence_pressure = rpois(mk, rp_lambda),
ruin_proxy = rbeta(mk, rp_a, rp_b)
)
}
X_syn <- rbind(
# Calm: near-Gaussian, low volatility, shallow drawdowns
make_regime(
ret_fn = function(n) rnorm(n, 0.0002, 0.003),
vol_mu = 0.004, vol_sd = 0.001,
dd_mu = -0.002, dd_sd = 0.002,
ts_mu = 0.001, ts_sd = 0.0005,
rp_a = 1, rp_b = 20, rp_lambda = 1
),
# Steady: positive drift, moderate drawdowns
make_regime(
ret_fn = function(n) rnorm(n, 0.0005, 0.005),
vol_mu = 0.008, vol_sd = 0.002,
dd_mu = -0.008, dd_sd = 0.004,
ts_mu = 0.003, ts_sd = 0.001,
rp_a = 2, rp_b = 10, rp_lambda = 3
),
# Stress: fat-tailed (t_3), high volatility, deep drawdowns
make_regime(
ret_fn = function(n) rt(n, df = 3L) * 0.012,
vol_mu = 0.022, vol_sd = 0.005,
dd_mu = -0.030, dd_sd = 0.010,
ts_mu = 0.015, ts_sd = 0.005,
rp_a = 5, rp_b = 3, rp_lambda = 12
)
)
X_syn <- as.matrix(X_syn)
y_syn <- factor(
rep(c("Calm", "Steady", "Stress"), each = mk),
levels = c("Calm", "Steady", "Stress")
)
# 80 / 20 random split (see Section 7 for the rationale)
set.seed(7L)
tr_idx <- sample(n, size = floor(0.8 * n))
x_train <- X_syn[ tr_idx, ]
y_train <- y_syn[ tr_idx ]
x_test <- X_syn[-tr_idx, ]
y_test <- y_syn[-tr_idx ]
cat("Training set:", nrow(x_train), "observations\n")
#> Training set: 240 observations
cat("Test set: ", nrow(x_test), "observations\n")
#> Test set: 60 observations
cat("Class balance (train):\n")
#> Class balance (train):
print(table(y_train))
#> y_train
#> Calm Steady Stress
#> 82 75 83model <- student_t_naive_bayes(x_train, y_train)
summary(model)
#>
#> ============================ Student-t Naive Bayes ============================
#>
#> - Call: student_t_naive_bayes(x = x_train, y = y_train)
#> - Samples: 240
#> - Features: 6
#> - nu grid range: 3 to 100
#> - Prior probabilities:
#> - Calm: 0.3417
#> - Steady: 0.3125
#> - Stress: 0.3458
#>
#> -------------------------------------------------------------------------------summary() confirms the call, sample size, feature count,
\(\nu\)-grid range, and prior
probabilities. Because we used a balanced design with equal class counts
the empirical priors are approximately \(1/3\) each.
print(model)
#>
#> ============================ Student-t Naive Bayes ============================
#>
#> Call:
#> student_t_naive_bayes(x = x_train, y = y_train)
#>
#> -------------------------------------------------------------------------------
#>
#> A priori probabilities:
#> Calm Steady Stress
#> 0.3416667 0.3125000 0.3458333
#>
#> -------------------------------------------------------------------------------
#>
#> Tables:
#> $log_return
#> Calm Steady Stress
#> mu 3.768694e-04 4.757325e-05 -1.158370e-03
#> sd 3.091415e-03 4.948078e-03 1.447585e-02
#> nu 3.000000e+01 1.000000e+02 9.000000e+00
#>
#> $rolling_volatility
#> Calm Steady Stress
#> mu 3.954591e-03 7.661608e-03 2.174149e-02
#> sd 9.037302e-04 1.708582e-03 4.649135e-03
#> nu 1.000000e+02 4.000000e+01 1.000000e+02
#>
#> $drawdown
#> Calm Steady Stress
#> mu -0.002126564 -0.008043378 -0.029062709
#> sd 0.002051736 0.003979391 0.010660654
#> nu 100.000000000 27.000000000 100.000000000
#>
#> $transition_stress
#> Calm Steady Stress
#> mu 9.923345e-04 3.106482e-03 1.602354e-02
#> sd 4.205171e-04 1.150587e-03 5.069987e-03
#> nu 1.000000e+02 6.000000e+01 4.000000e+01
#>
#> $residence_pressure
#> Calm Steady Stress
#> mu 0.9445213 2.7786013 11.9450720
#> sd 0.9865471 1.6836134 3.4498315
#> nu 100.0000000 100.0000000 100.0000000
#>
#>
#>
#> # ... and 1 more table
#>
#> -------------------------------------------------------------------------------print() additionally displays the fitted parameter
tables for the first five features (all six here). Each table is a \(3 \times 3\) matrix: rows are \(\mu\), \(\sigma\), and \(\nu\); columns are class levels.
tabs <- tables(model)
print(tabs)
#> $log_return
#> Calm Steady Stress
#> mu 3.768694e-04 4.757325e-05 -1.158370e-03
#> sd 3.091415e-03 4.948078e-03 1.447585e-02
#> nu 3.000000e+01 1.000000e+02 9.000000e+00
#>
#> $rolling_volatility
#> Calm Steady Stress
#> mu 3.954591e-03 7.661608e-03 2.174149e-02
#> sd 9.037302e-04 1.708582e-03 4.649135e-03
#> nu 1.000000e+02 4.000000e+01 1.000000e+02
#>
#> $drawdown
#> Calm Steady Stress
#> mu -0.002126564 -0.008043378 -0.029062709
#> sd 0.002051736 0.003979391 0.010660654
#> nu 100.000000000 27.000000000 100.000000000
#>
#> $transition_stress
#> Calm Steady Stress
#> mu 9.923345e-04 3.106482e-03 1.602354e-02
#> sd 4.205171e-04 1.150587e-03 5.069987e-03
#> nu 1.000000e+02 6.000000e+01 4.000000e+01
#>
#> $residence_pressure
#> Calm Steady Stress
#> mu 0.9445213 2.7786013 11.9450720
#> sd 0.9865471 1.6836134 3.4498315
#> nu 100.0000000 100.0000000 100.0000000
#>
#> $ruin_proxy
#> Calm Steady Stress
#> mu 0.04645572 0.16278727 0.64599386
#> sd 0.05494230 0.11070799 0.15403066
#> nu 6.00000000 15.00000000 100.00000000
#>
#> attr(,"class")
#> [1] "naive_bayes_tables"
#> attr(,"cond_dist")
#> log_return rolling_volatility drawdown transition_stress
#> "Student-t" "Student-t" "Student-t" "Student-t"
#> residence_pressure ruin_proxy
#> "Student-t" "Student-t"To access a single feature by name or index:
# By name
tables(model, which = "log_return")
#> $log_return
#> Calm Steady Stress
#> mu 3.768694e-04 4.757325e-05 -1.158370e-03
#> sd 3.091415e-03 4.948078e-03 1.447585e-02
#> nu 3.000000e+01 1.000000e+02 9.000000e+00
#>
#> attr(,"cond_dist")
#> log_return
#> "Student-t"
# By index (second feature: rolling_volatility)
tables(model, which = 2L)
#> $rolling_volatility
#> Calm Steady Stress
#> mu 3.954591e-03 7.661608e-03 2.174149e-02
#> sd 9.037302e-04 1.708582e-03 4.649135e-03
#> nu 1.000000e+02 4.000000e+01 1.000000e+02
#>
#> attr(,"cond_dist")
#> rolling_volatility
#> "Student-t"coef() returns all parameters in a tidy data frame
suitable for export or further analysis:
cf <- coef(model)
cf
#> Calm:mu Calm:sd Calm:nu Steady:mu Steady:sd
#> log_return 0.0003768694 0.0030914147 30 4.757325e-05 0.004948078
#> rolling_volatility 0.0039545908 0.0009037302 100 7.661608e-03 0.001708582
#> drawdown -0.0021265636 0.0020517360 100 -8.043378e-03 0.003979391
#> transition_stress 0.0009923345 0.0004205171 100 3.106482e-03 0.001150587
#> residence_pressure 0.9445213006 0.9865470622 100 2.778601e+00 1.683613368
#> ruin_proxy 0.0464557214 0.0549422995 6 1.627873e-01 0.110707985
#> Steady:nu Stress:mu Stress:sd Stress:nu
#> log_return 100 -0.00115837 0.014475850 9
#> rolling_volatility 40 0.02174149 0.004649135 100
#> drawdown 27 -0.02906271 0.010660654 100
#> transition_stress 60 0.01602354 0.005069987 40
#> residence_pressure 100 11.94507202 3.449831495 100
#> ruin_proxy 15 0.64599386 0.154030658 100The column convention is <class>:mu,
<class>:sd, <class>:nu for each of
the \(K = 3\) classes, giving \(3K = 9\) columns and \(p = 6\) rows.
The estimated degrees-of-freedom matrix is the most theoretically important output of the model.
nu_mat <- model$params$nu
nu_mat
#> log_return rolling_volatility drawdown transition_stress
#> Calm 30 100 100 100
#> Steady 100 40 27 60
#> Stress 9 100 100 40
#> residence_pressure ruin_proxy
#> Calm 100 6
#> Steady 100 15
#> Stress 100 100Under the generative model, the Stress regime was drawn from \(t_3\): we therefore expect the profile search to select low \(\nu\) values for Stress features.
cat(sprintf(
"log_return: nu(Calm) = %g | nu(Steady) = %g | nu(Stress) = %g\n",
nu_mat["Calm", "log_return"],
nu_mat["Steady", "log_return"],
nu_mat["Stress", "log_return"]
))
#> log_return: nu(Calm) = 30 | nu(Steady) = 100 | nu(Stress) = 9
cat(sprintf(
"drawdown: nu(Calm) = %g | nu(Steady) = %g | nu(Stress) = %g\n",
nu_mat["Calm", "drawdown"],
nu_mat["Steady", "drawdown"],
nu_mat["Stress", "drawdown"]
))
#> drawdown: nu(Calm) = 100 | nu(Steady) = 27 | nu(Stress) = 100Low \(\nu\) for Stress on
log_return and drawdown is the empirical
signature of a fat-tailed crisis distribution. This is not a modelling
assumption imposed a priori; it emerges from the profile grid search
applied to the training data.
In real COR applications fitted on E-mini S&P 500 hourly data,
the Stress regime consistently receives \(\hat\nu \in \{3, 4, 5\}\) on
log_return and drawdown, while the Calm regime
receives \(\hat\nu > 20\) on the
same features.
plot() draws the fitted class-conditional Student-\(t\) densities over the empirical range of
each feature.
plot(model,
prob = "conditional",
arg.num = list(col = c("steelblue", "forestgreen", "firebrick"),
lty = c(1L, 2L, 4L)),
lwd = 2)Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return.
Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return.
Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return.
Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return.
Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return.
Conditional Student-t densities by regime for all six features. Note the heavier tails on the Stress curve for log_return.
prob = "marginal" scales each density by its class
prior, giving a view of the mixture components as they contribute to the
overall marginal distribution:
plot(model,
prob = "marginal",
arg.num = list(col = c("steelblue", "forestgreen", "firebrick"),
lty = c(1L, 2L, 4L)),
lwd = 2)Marginal (prior-weighted) Student-t densities.
Marginal (prior-weighted) Student-t densities.
Marginal (prior-weighted) Student-t densities.
Marginal (prior-weighted) Student-t densities.
Marginal (prior-weighted) Student-t densities.
Marginal (prior-weighted) Student-t densities.
To plot a subset of features, use the which
argument:
plot(model,
which = c("log_return", "drawdown"),
prob = "conditional",
arg.num = list(col = c("steelblue", "forestgreen", "firebrick")),
lwd = 2)Conditional densities for log_return and drawdown only.
Conditional densities for log_return and drawdown only.
cm <- table(Actual = y_test, Predicted = pred_class)
cm
#> Predicted
#> Actual Calm Steady Stress
#> Calm 18 0 0
#> Steady 0 25 0
#> Stress 0 0 17The diagonal elements count correctly classified observations. Off-diagonal elements are misclassifications; for risk management purposes, false-negatives on the Stress class (Stress predicted as Calm) are the most costly.
pred_prob <- predict(model, newdata = x_test, type = "prob")
cat("First 6 test observations — posterior probabilities:\n")
#> First 6 test observations — posterior probabilities:
round(head(pred_prob), 4)
#> Calm Steady Stress
#> [1,] 0.9999 0.0001 0
#> [2,] 0.9982 0.0018 0
#> [3,] 1.0000 0.0000 0
#> [4,] 0.9999 0.0001 0
#> [5,] 0.9989 0.0011 0
#> [6,] 1.0000 0.0000 0Each row is a proper probability distribution: entries are non-negative and sum to one.
In the Clock of Regimes framework, a Stress Alert is triggered whenever the posterior probability \(P(\text{Stress} \mid \mathbf{x}) > 0.60\), signalling that a risk manager should review position sizing or hedge coverage.
stress_prob <- pred_prob[, "Stress"]
alert_flag <- ifelse(stress_prob > 0.60, "Stress Alert", "No Alert")
cat("Alert summary (test period):\n")
#> Alert summary (test period):
print(table(alert_flag))
#> alert_flag
#> No Alert Stress Alert
#> 43 17
cat("\nHighest Stress probabilities in test set:\n")
#>
#> Highest Stress probabilities in test set:
top_stress <- sort(stress_prob, decreasing = TRUE)
round(head(top_stress, 8L), 4)
#> [1] 1 1 1 1 1 1 1 1A Gaussian NBC is a special case of the Student-\(t\) NBC at \(\nu \to \infty\). For light-tailed data the two classifiers perform similarly; for fat-tailed Stress data the Gaussian model suffers classification collapse. This section documents the difference.
We create a stress-concentrated test set: 40 observations drawn from \(t_3\) (the true Stress distribution) alongside 20 each from the Calm and Steady distributions.
set.seed(99L)
n_stress <- 40L
n_calm <- 20L
X_stress_test <- cbind(
log_return = rt(n_stress, df = 3L) * 0.012,
rolling_volatility = abs(rnorm(n_stress, 0.022, 0.005)),
drawdown = rnorm(n_stress, -0.030, 0.010),
transition_stress = abs(rnorm(n_stress, 0.015, 0.005)),
residence_pressure = as.numeric(rpois(n_stress, 12L)),
ruin_proxy = rbeta(n_stress, 5, 3)
)
X_calm_test <- cbind(
log_return = rnorm(n_calm, 0.0002, 0.003),
rolling_volatility = abs(rnorm(n_calm, 0.004, 0.001)),
drawdown = rnorm(n_calm, -0.002, 0.002),
transition_stress = abs(rnorm(n_calm, 0.001, 0.0005)),
residence_pressure = as.numeric(rpois(n_calm, 1L)),
ruin_proxy = rbeta(n_calm, 1, 20)
)
X_mixed_test <- rbind(X_stress_test, X_calm_test)
y_mixed_test <- factor(
c(rep("Stress", n_stress), rep("Calm", n_calm)),
levels = c("Calm", "Steady", "Stress")
)The naivebayes package (Majka 2023) fits a Gaussian NBC when given a numeric matrix. We use the same training data as before.
# Student-t NBC predictions
st_pred <- predict(model, newdata = X_mixed_test, type = "class")
st_acc <- mean(st_pred == y_mixed_test)
# Gaussian NBC predictions
gnb_pred <- predict(gnb, newdata = X_mixed_test, type = "class")
gnb_acc <- mean(gnb_pred == y_mixed_test)
cat(sprintf("Student-t NBC overall accuracy: %.1f%%\n", 100 * st_acc))
#> Student-t NBC overall accuracy: 96.7%
cat(sprintf("Gaussian NBC overall accuracy: %.1f%%\n", 100 * gnb_acc))
#> Gaussian NBC overall accuracy: 96.7%cat("Student-t NBC:\n")
#> Student-t NBC:
table(Actual = y_mixed_test, Predicted = st_pred)
#> Predicted
#> Actual Calm Steady Stress
#> Calm 18 2 0
#> Steady 0 0 0
#> Stress 0 0 40
cat("\nGaussian NBC:\n")
#>
#> Gaussian NBC:
table(Actual = y_mixed_test, Predicted = gnb_pred)
#> Predicted
#> Actual Calm Steady Stress
#> Calm 18 2 0
#> Steady 0 0 0
#> Stress 0 0 40For risk management the critical metric is Stress recall: the fraction of true Stress observations correctly identified as such.
st_stress_recall <- mean(st_pred[y_mixed_test == "Stress"] == "Stress")
gnb_stress_recall <- mean(gnb_pred[y_mixed_test == "Stress"] == "Stress")
cat(sprintf("Student-t NBC Stress recall: %.1f%%\n", 100 * st_stress_recall))
#> Student-t NBC Stress recall: 100.0%
cat(sprintf("Gaussian NBC Stress recall: %.1f%%\n", 100 * gnb_stress_recall))
#> Gaussian NBC Stress recall: 100.0%To illustrate the mechanism, we compare the log-density each model
assigns to a single extreme observation: the 95th percentile of \(|x|\) for log_return in the
Stress training data.
x_extreme <- quantile(
abs(x_train[y_train == "Stress", "log_return"]),
probs = 0.95
)
cat("Extreme log_return value:", round(x_extreme, 5), "\n")
#> Extreme log_return value: 0.02806
# Student-t log-density (Stress class parameters)
mu_st <- model$params$mu["Stress", "log_return"]
sd_st <- model$params$sd["Stress", "log_return"]
nu_st <- model$params$nu["Stress", "log_return"]
ld_st <- stats::dt((x_extreme - mu_st) / sd_st, df = nu_st, log = TRUE) - log(sd_st)
# Gaussian log-density (same location/scale, normal tails)
ld_gn <- stats::dnorm(x_extreme, mean = mu_st, sd = sd_st, log = TRUE)
cat(sprintf("Student-t log-density at extreme point: %+.2f (nu = %g)\n",
ld_st, nu_st))
#> Student-t log-density at extreme point: +1.42 (nu = 9)
cat(sprintf("Gaussian log-density at extreme point: %+.2f\n", ld_gn))
#> Gaussian log-density at extreme point: +1.28
cat(sprintf("Difference (Student-t advantage): %+.2f log-units\n",
ld_st - ld_gn))
#> Difference (Student-t advantage): +0.14 log-unitsThe positive difference shows that the Student-\(t\) model assigns substantially higher (less negative) log-density to the extreme observation. Summed over \(p = 6\) features this gap compounds, shifting the MAP decision toward Stress under the Student-\(t\) model and away from it under the Gaussian model.
The default prior is the empirical class frequency. A domain expert may wish to override this—for example, to reflect the fact that Stress regimes are rare in calendar time (say, 10%) even though they are over-represented in the training corpus.
# Stress is rare in real time: 10% prior weight
model_informed <- student_t_naive_bayes(
x = x_train,
y = y_train,
prior = c(Calm = 0.50, Steady = 0.40, Stress = 0.10)
)
cat("Priors (custom):\n")
#> Priors (custom):
print(model_informed$prior)
#> Calm Steady Stress
#> 0.5 0.4 0.1
# The classifier becomes more conservative about Stress alerts
prob_informed <- predict(model_informed, newdata = x_test, type = "prob")
cat("\nMean posterior P(Stress) — uniform prior: ",
round(mean(pred_prob[, "Stress"]), 4), "\n")
#>
#> Mean posterior P(Stress) — uniform prior: 0.2833
cat("Mean posterior P(Stress) — informed prior: ",
round(mean(prob_informed[, "Stress"]), 4), "\n")
#> Mean posterior P(Stress) — informed prior: 0.2833The default grid c(3:30, 40, 60, 100) is designed to be
broad enough for most applications. Two adjustments are useful:
Restricting to heavy tails only — If the analyst knows that all features in all classes are fat-tailed (e.g., intraday tick data during a crisis), a smaller grid reduces fitting time and prevents the model from selecting spuriously high \(\nu\) on small samples:
model_heavy <- student_t_naive_bayes(
x = x_train,
y = y_train,
nu_grid = c(3, 4, 5, 6, 7, 8, 10)
)
model_heavy$params$nu
#> log_return rolling_volatility drawdown transition_stress
#> Calm 10 10 10 10
#> Steady 10 10 10 10
#> Stress 10 10 10 10
#> residence_pressure ruin_proxy
#> Calm 10 6
#> Steady 10 10
#> Stress 10 10Extending to near-Gaussian — Conversely, if the data contain a benign asset class where Gaussian behaviour is plausible even in crisis, adding \(\nu = 200\) or \(\nu = 500\) gives the grid search a near-Gaussian option:
All fitted parameters are stored in $params as named
\(K \times p\) matrices. This makes it
easy to derive secondary quantities programmatically.
# Regime-mean volatility (mu of rolling_volatility by class)
model$params$mu[, "rolling_volatility"]
#> Calm Steady Stress
#> 0.003954591 0.007661608 0.021741486
# Tail-thickness ranking across regimes for each feature
apply(model$params$nu, 2, which.min) # index of class with lowest nu
#> log_return rolling_volatility drawdown transition_stress
#> 3 2 2 3
#> residence_pressure ruin_proxy
#> 1 1predict() uses only the intersection of
colnames(newdata) and the training features, so prediction
on a reduced feature set works without modification:
# Predict using only the two most important risk features
x_reduced <- x_test[, c("log_return", "drawdown")]
pred_reduced <- predict(model, newdata = x_reduced, type = "class")
cat("Accuracy on full feature set: ",
round(mean(pred_class == y_test), 4), "\n")
#> Accuracy on full feature set: 1
cat("Accuracy on 2-feature subset: ",
round(mean(pred_reduced == y_test), 4), "\n")
#> Accuracy on 2-feature subset: 0.8833kronxNBC was designed as the classification component of the Clock of Regimes (COR) framework for intraday equity futures. This section describes the empirical pipeline so that users can adapt it to their own data.
The COR pipeline requires two inputs:
timestamp, close, and ret
(log-return) at hourly frequency.state column produced by a Hidden Markov Model fitted to
the same series (Hamilton 1989).Six features are computed over a rolling 24-hour window. Each captures a distinct aspect of market microstructure.
library(zoo)
es_data <- read.csv("data.csv", stringsAsFactors = FALSE)
decoded <- read.csv("decoded_states.csv", stringsAsFactors = FALSE)
es_data <- es_data[!is.na(es_data$ret), ]
n_roll <- 24L
# 1. Rolling volatility (SD of log-returns)
rolling_vol <- rollapply(es_data$ret, n_roll, sd, fill = NA, align = "right")
rolling_vol <- pmax(rolling_vol, 0.0001)
# 2. Drawdown from rolling peak
rolling_max <- rollapply(es_data$close, n_roll, max, fill = NA, align = "right")
drawdown <- (es_data$close - rolling_max) / rolling_max
# 3. Downside semi-deviation (left-tail sensitivity)
downside_dev <- function(x) {
neg_x <- x[x < 0]
if (length(neg_x) == 0L) return(0)
sqrt(mean(neg_x^2))
}
trans_stress <- rollapply(es_data$ret, n_roll, downside_dev, fill = NA, align = "right")
# 4. Residence pressure (consecutive hours in drawdown)
is_dd <- as.integer(drawdown < -0.005)
is_dd[is.na(is_dd)] <- 0L
res_pr <- ave(is_dd, cumsum(is_dd == 0L), FUN = cumsum)
# 5. Ruin proxy: P(return <= -2%) under rolling distribution
rolling_mean <- rollapply(es_data$ret, n_roll, mean, fill = NA, align = "right")
ruin_proxy <- pnorm(-0.02, mean = rolling_mean, sd = rolling_vol)
# Assemble and attach HMM labels
state_labels <- c("1" = "Stress", "2" = "Calm", "3" = "Steady")
cor_data <- data.frame(
timestamp = es_data$timestamp,
log_return = es_data$ret,
rolling_volatility = rolling_vol,
drawdown = drawdown,
transition_stress = trans_stress,
residence_pressure = pmax(res_pr, 0.0001),
ruin_proxy = pmax(ruin_proxy, 0.0001),
regime = factor(
state_labels[as.character(decoded$state)],
levels = c("Calm", "Steady", "Stress")
)
)
cor_data <- cor_data[complete.cases(cor_data), ]Financial regime data exhibit strong temporal clustering: a Stress episode may span 48–200 consecutive hours. A chronological 80/20 split therefore risks placing an entire Stress cluster exclusively in the test set, leaving the training corpus with zero Stress observations and guaranteeing classification collapse.
A random 80/20 split breaks the temporal adjacency of observations, ensuring every regime class is represented in both partitions regardless of when in calendar time each episode occurred.
Caveat: random sampling allows observations from the same cluster to appear in both train and test sets, introducing distributional leakage across the split boundary. For a production backtesting framework a purged, embargo-based cross-validation scheme (e.g.,
mlr3+PurgedCV) is preferred. For the diagnostic regime classifier, the random split is the correct choice because the primary concern is class-balance coverage, not look-ahead bias.
features <- c("log_return", "rolling_volatility", "drawdown",
"transition_stress", "residence_pressure", "ruin_proxy")
set.seed(123L)
tr_idx <- sample(nrow(cor_data), floor(0.80 * nrow(cor_data)))
x_train <- as.matrix(cor_data[ tr_idx, features])
y_train <- cor_data$regime[ tr_idx]
x_test <- as.matrix(cor_data[-tr_idx, features])
y_test <- cor_data$regime[-tr_idx]
model <- student_t_naive_bayes(x_train, y_train)On real COR data fitted to hourly E-mini S&P 500 returns (2015–2023), the profile grid search consistently finds:
| Feature | Calm \(\hat\nu\) | Steady \(\hat\nu\) | Stress \(\hat\nu\) |
|---|---|---|---|
log_return |
20–30 | 15–25 | 3–6 |
drawdown |
15–25 | 12–20 | 3–8 |
rolling_volatility |
20–30 | 15–25 | 10–20 |
ruin_proxy |
25–30 | 20–28 | 4–10 |
The consistently low \(\hat\nu\) for
Stress on log_return, drawdown, and
ruin_proxy is an empirical validation of the fat-tail
hypothesis: crisis returns are not merely high-variance Gaussian draws
but draws from a genuinely different, heavy-tailed distribution.
kronxNBC provides a robust alternative to Gaussian Naive Bayes for classification problems where some class-conditional distributions exhibit heavy tails. The key contributions are:
Student-\(t\) class-conditional likelihood: naturally accommodates fat-tailed returns without the tail truncation implicit in Gaussian models.
Profile grid search for \(\nu\): replaces numerically fragile continuous optimisation with a fast, stable discrete search. The IRLS update step at each grid point provides a good initialisation at minimal computational cost.
Per-(class, feature) \(\nu\) selection: allows the model to be simultaneously fat-tailed for Stress features and near-Gaussian for Calm features—a flexibility that a single global \(\nu\) cannot provide.
Log-density floor: prevents classification collapse when a single feature produces an extreme log-density, complementing the heavy-tail robustness of the Student-\(t\) kernel.
naivebayes compatibility: the
tables() generic extension means
student_t_naive_bayes objects integrate seamlessly with
existing naivebayes workflows.
The package is available on CRAN as kronxNBC and requires R >= 4.0.0.
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Sequoia 15.7.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Riga
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] naivebayes_1.0.0 kronxNBC_0.1.1
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.57
#> [5] cachem_1.1.0 knitr_1.51 htmltools_0.5.9 rmarkdown_2.31
#> [9] lifecycle_1.0.5 cli_3.6.6 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.6.0 tools_4.6.0 evaluate_1.0.5 bslib_0.11.0
#> [17] yaml_2.3.12 otel_0.2.0 rlang_1.2.0 jsonlite_2.0.0