BayesChange Tutorial

Here we provide a brief tutorial of the BayesChange package. The BayesChange package contains two main functions: one that performs change points detection on time series and epidemic diffusions and one that perform clustering of time series and epidemic diffusions with common change points. Here we briefly show how to implement these.

Detecting change points

The function detect_cp provide a method for detecting change points, it is based on the work Martínez and Mena (2014) and on Corradin et al. (2024)

Depending on the structure of the data, detect_cp might perform change points detection on univariate time series or multivariate time series. We import dataset eu_inflation that contains the standardized monthly inflation rates from the Harmonized Index of Consumer Prices (HICP) for the 12 COICOP expenditure categories across European Union countries. The data span the period from February~1997 to December~2024 resulting in a matrix of \(12\) rows and \(355\) columns.

data("eu_inflation")

Now we can run the function detect_cp, as arguments of the function we need to specify the number of iterations, the number of burn-in steps and a list with the the autoregressive coefficient phi for the likelihood of the data, the parameters a, b, c for the priors and the probability q of performing a split at each step. Since we deal with time series we need also to specify kernel = "ts".

out <- detect_cp(data = eu_inflation[1,],
                 n_iterations = 2000, n_burnin = 500, q = 0.5,
                 params = list(prior_var_phi = 0.1, prior_delta_c = 1, prior_delta_d = 1), kernel = "ts")
#> Completed:   200/2000 - in 0.056375 sec
#> Completed:   400/2000 - in 0.108558 sec
#> Completed:   600/2000 - in 0.160803 sec
#> Completed:   800/2000 - in 0.216577 sec
#> Completed:   1000/2000 - in 0.267866 sec
#> Completed:   1200/2000 - in 0.317435 sec
#> Completed:   1400/2000 - in 0.367638 sec
#> Completed:   1600/2000 - in 0.416679 sec
#> Completed:   1800/2000 - in 0.469027 sec
#> Completed:   2000/2000 - in 0.519497 sec

With the methods print and summary we can get information about the algorithm.

print(out)
#> DetectCpObj object
#> Type: change points detection on univariate time series
summary(out)
#> DetectCpObj object
#> Change point detection summary:
#>  - Data: univariate time series
#>  - Burn-in iterations: 500 
#>  - MCMC iterations: 1500 
#>  - Average number of detected change points: 8.9 
#>  - Computational time: 0.52 seconds
#>  
#> Use plot() for a detailed visualization or posterior_estimate() to analyze the detected change points.

In order to get a point estimate of the change points we can use the method posterior_estimate that uses the method salso by David B. Dahl and Müller (2022) to get the final latent order and then detect the change points.

cp_est <- posterior_estimate(out, loss = "binder")
cumsum(table(cp_est))[-length(table(cp_est))] + 1
#>   1   2   3   4   5   6   7   8 
#>  49  64 130 148 301 307 319 325

The package also provides a method for plotting the change points.

plot(out, loss = "binder")

We can assess convergence of the latent order posterior chain, for example, by inspecting the traceplot of its log-likelihood with coda::traceplot.

coda::traceplot(out$lkl_MCMC, ylab = "Log-Likelihood")

If we instead consider a matrix of data, detect_cp automatically performs a multivariate change points detection method. We define the parameters.

params_multi <- list(m_0 = rep(0,3),
                     k_0 = 1,
                     nu_0 = 10,
                     S_0 = diag(0.1,3,3),
                     prior_var_phi = 0.1,
                     prior_delta_c = 1,
                     prior_delta_d = 1)

Arguments k_0, nu_0, phi_0, m_0, prior_delta_c, prior_delta_d and prior_var_phi correspond to the parameters of the prior distributions for the multivariate likelihood.

out <- detect_cp(data = eu_inflation[1:3,], n_iterations = 2000,
          n_burnin = 500, q = 0.5, params = params_multi, kernel = "ts")
#> Completed:   200/2000 - in 0.03948 sec
#> Completed:   400/2000 - in 0.078843 sec
#> Completed:   600/2000 - in 0.119545 sec
#> Completed:   800/2000 - in 0.160387 sec
#> Completed:   1000/2000 - in 0.202748 sec
#> Completed:   1200/2000 - in 0.245142 sec
#> Completed:   1400/2000 - in 0.287592 sec
#> Completed:   1600/2000 - in 0.33003 sec
#> Completed:   1800/2000 - in 0.372545 sec
#> Completed:   2000/2000 - in 0.415026 sec

table(posterior_estimate(out, loss = "binder"))
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.
#> 
#>   1   2   3   4   5 
#>  42 134 102  24  34

plot(out, loss = "binder", plot_freq = TRUE)
#> Warning in salso::salso(mcmc_chain, loss = "binder", maxNClusters =
#> maxNClusters, : The number of clusters equals the default maximum possible
#> number of clusters.

Function detect_cp can also be used to detect change points on survival functions. We consider the synthetic dataset epi_synthetic

data("epi_synthetic")

To run detect_cp on epidemiological data we need to set kernel = "epi". Moreover, besides the usual parameters, we need to set the number of Monte Carlo replications M for the approximation of the integrated likelihood and the recovery rate xi. a0 and b0 are optional and correspond to the parameters of the gamma distribution for the integration of the likelihood.

params_epi <- list(M = 250, xi = 1/8, a0 = 4, b0 = 10, I0_var = 0.1)

out <- detect_cp(data = epi_synthetic, n_iterations = 2000, n_burnin = 500,
                 q = 0.25, params = params_epi, kernel = "epi")
#> Completed:   200/2000 - in 1.90273 sec
#> Completed:   400/2000 - in 3.70433 sec
#> Completed:   600/2000 - in 5.50799 sec
#> Completed:   800/2000 - in 7.29868 sec
#> Completed:   1000/2000 - in 9.09697 sec
#> Completed:   1200/2000 - in 10.8842 sec
#> Completed:   1400/2000 - in 12.7115 sec
#> Completed:   1600/2000 - in 14.5795 sec
#> Completed:   1800/2000 - in 16.3786 sec
#> Completed:   2000/2000 - in 18.163 sec

print(out)
#> DetectCpObj object
#> Type: change points detection on an epidemic diffusion

Also here, with function plot we can plot the survival function and the position of the change points.

plot(out)

Clustering time dependent data with common change points

BayesChange contains another function, clust_cp, that cluster respectively univariate and multivariate time series and survival functions with common change points. Details about this methods can be found in Corradin et al. (2026)

In clust_cp the argument kernel must be specified, if data are time series then kernel = "ts" must be set. Then the algorithm automatically detects if data are univariate or multivariate.

We consider for this example dataset stock_uni that contains the daily mean stock prices for the 50 largest companies (by market capitalization) in the Standard&Poor’s 500 Index from January 1, 2020 to January 1, 2022.

data("stock_uni")

Arguments that need to be specified in clust_cp are the number of iterations n_iterations, the number of elements in the normalisation constant B, the split-and-merge step L performed when a new partition is proposed and a list with the parameters of the algorithm, the likelihood and the priors..

params_uni <- list(a = 1,
                   b = 1,
                   c = 1,
                   phi = 0.1)

out <- clust_cp(data = stock_uni[1:5,], n_iterations = 2000, n_burnin = 500,
                L = 1, q = 0.5, B = 1000, params = params_uni, kernel = "ts")
#> Normalization constant - completed:  100/1000 - in 0.049852 sec
#> Normalization constant - completed:  200/1000 - in 0.100371 sec
#> Normalization constant - completed:  300/1000 - in 0.15172 sec
#> Normalization constant - completed:  400/1000 - in 0.195965 sec
#> Normalization constant - completed:  500/1000 - in 0.24171 sec
#> Normalization constant - completed:  600/1000 - in 0.289233 sec
#> Normalization constant - completed:  700/1000 - in 0.338294 sec
#> Normalization constant - completed:  800/1000 - in 0.391026 sec
#> Normalization constant - completed:  900/1000 - in 0.440859 sec
#> Normalization constant - completed:  1000/1000 - in 0.500346 sec
#> 
#> ------ MAIN LOOP ------
#> 
#> Completed:   200/2000 - in 0.84642 sec
#> Completed:   400/2000 - in 1.56314 sec
#> Completed:   600/2000 - in 2.2361 sec
#> Completed:   800/2000 - in 2.80847 sec
#> Completed:   1000/2000 - in 3.29048 sec
#> Completed:   1200/2000 - in 3.79205 sec
#> Completed:   1400/2000 - in 4.49684 sec
#> Completed:   1600/2000 - in 5.10813 sec
#> Completed:   1800/2000 - in 5.73338 sec
#> Completed:   2000/2000 - in 6.35254 sec

posterior_estimate(out, loss = "binder")
#> [1] 1 2 3 1 1

Method plot for clustering univariate time series represents the data colored according to the assigned cluster.

plot(out, loss = "binder")

Method plot_psm shows the posterior similarity matrix of the clustering. Selecting reorder = TRUE we can choose to order the matrix depending on the clustering obtained.

plot_psm(out, reorder = TRUE)

If time series are multivariate, data must be an array, where each element is a multivariate time series represented by a matrix. Each row of the matrix is a component of the time series. Here we use dataset stock_multi that contains for each company the daily opening and closing stock prices.

data("stock_multi")

params_multi <- list(m_0 = rep(0,2),
                     k_0 = 1,
                     nu_0 = 10,
                     S_0 = diag(1,2,2),
                     phi = 0.1)

out <- clust_cp(data = stock_multi[,,1:5], n_iterations = 2500, n_burnin = 500,
                L = 1, B = 1000, params = params_multi, kernel = "ts")
#> Normalization constant - completed:  100/1000 - in 0.012113 sec
#> Normalization constant - completed:  200/1000 - in 0.02422 sec
#> Normalization constant - completed:  300/1000 - in 0.036333 sec
#> Normalization constant - completed:  400/1000 - in 0.048294 sec
#> Normalization constant - completed:  500/1000 - in 0.060272 sec
#> Normalization constant - completed:  600/1000 - in 0.072271 sec
#> Normalization constant - completed:  700/1000 - in 0.084283 sec
#> Normalization constant - completed:  800/1000 - in 0.096163 sec
#> Normalization constant - completed:  900/1000 - in 0.108171 sec
#> Normalization constant - completed:  1000/1000 - in 0.120137 sec
#> 
#> ------ MAIN LOOP ------
#> 
#> Completed:   250/2500 - in 0.292949 sec
#> Completed:   500/2500 - in 0.59638 sec
#> Completed:   750/2500 - in 0.90365 sec
#> Completed:   1000/2500 - in 1.21082 sec
#> Completed:   1250/2500 - in 1.52256 sec
#> Completed:   1500/2500 - in 1.8484 sec
#> Completed:   1750/2500 - in 2.15655 sec
#> Completed:   2000/2500 - in 2.46296 sec
#> Completed:   2250/2500 - in 2.76752 sec
#> Completed:   2500/2500 - in 3.08023 sec

posterior_estimate(out, loss = "binder")
#> [1] 1 2 3 1 1

plot(out, loss = "binder")

Finally, if we set kernel = "epi", clust_cp cluster survival functions with common change points. Also here details can be found in Corradin et al. (2026)

Data are a matrix where each row is the number of infected at each time. Inside this package is included the dataset epi_synthetic_multi with multivariate synthetic epidemic diffusions.

data("epi_synthetic_multi")

params_epi <- list(M = 100, xi = 1/8,
                   alpha_SM = 1,
                   a0 = 4,
                   b0 = 10,
                   I0_var = 0.1,
                   avg_blk = 2)

out <- clust_cp(epi_synthetic_multi[,10:150], n_iterations = 2000, n_burnin = 500,
                L = 1, B = 1000, params = params_epi, kernel = "epi")

posterior_estimate(out, loss = "binder")
plot(out, loss = "binder")

Corradin, Riccardo, Luca Danese, Wasiur R. KhudaBukhsh, and Andrea Ongaro. 2024. “Model-Based Clustering of Time-Dependent Observations with Common Structural Changes.” https://arxiv.org/abs/2410.09552.

———. 2026. “Model-Based Clustering of Time-Dependent Observations with Common Structural Changes.” Statistics and Computing 36 (1): 7. https://doi.org/10.1007/s11222-025-10756-x.

David B. Dahl, Devin J. Johnson, and Peter Müller. 2022. “Search Algorithms and Loss Functions for Bayesian Clustering.” Journal of Computational and Graphical Statistics 31 (4): 1189–1201. https://doi.org/10.1080/10618600.2022.2069779.

Martínez, Asael Fabian, and Ramsés H. Mena. 2014. “On a Nonparametric Change Point Detection Model in Markovian Regimes.” Bayesian Analysis 9 (4): 823–58. https://doi.org/10.1214/14-BA878.