Sparse Gaussian MRF Mixtures for Anomaly Detection

Koji Makiyama (@hoxo_m)

2018-04-16

1 Overview

The sGMRFmix package provides an implementation of the algorithm presented by Ide et al. (2016). It is a novel anomaly detection method for multivariate noisy sensor data. It can automatically handle multiple operational modes. And it can also compute variable-wise anomaly scores.

First, we prepare a multivariate training data that contains no anomaly observations. Here we generate a data that consists of four parts and repeats two operational modes.

library(sGMRFmix)
set.seed(314)
train_data <- generate_train_data()
plot_multivariate_data(train_data)

Second, we prepare a multivariate test data that contains some anomaly values. Here we generate a data that consists of 500 normal observations and 500 anomaly observations. The normal part in the test data consists of two operational modes that also have seen in the training data. Note that the variable x5 has no difference in the normal and anomaly part.

test_data <- generate_test_data()
plot_multivariate_data(test_data)

We input the training data to the sGMRFmix() function to learn the two operational modes. Then we compute variable-wise anomaly scores for test data using the result.

fit <- sGMRFmix(train_data, K = 7, rho = 0.8, verbose = FALSE)
anomaly_score <- compute_anomaly_score(fit, test_data)
plot_multivariate_data(anomaly_score, fix_scale = TRUE) + ylim(NA, 50)

You can see that high anomaly scores appear in the latter part of the test data. And you can also see the variable x5 has no high anomaly scores.

2 Installation

You can install the sGMRFmix package from CRAN.

install.packages("sGMRFmix")

You can also install the package from GitHub.

install.packages("devtools") # if you have not installed "devtools" package
devtools::install_github("hoxo-m/sGMRFmix")

The source code for sGMRFmix package is available on GitHub at

https://github.com/hoxo-m/sGMRFmix.

3 Details

3.1 Basics

The sGMRFmix package mainly provides two functions as follows:

There are two hyperparameters as below.

You only need to set K a large enough number because the algorithm identifies major dependency patterns from the data via the sparse mixture model.

On the other hand, you should determine rho an optimal value to maximize the performance of anomaly detection.

3.2 Data

To fit the model, you must prepare two kinds of data as follows:

The package provides several functions to generate synthetic data.

Also, the package provides a function to visualize these data.

3.3 Fitting Model

The package provides a function sGMRFmix() to fit the model named Sparse Gaussian Markov Random Field Mixtures (Ide et al., 2016). It can automatically handle multiple operational modes and allows to compute variable-wise anomaly scores.

#> 
#> Call:
#> sGMRFmix(x = train_data, K = 7, rho = 0.8, verbose = FALSE)
#> 
#> Data: 1000 x 5 
#> Parameters:
#>   K:    7 
#>   rho:  0.8 
#> Estimated:
#>   Kest: 2 
#>   pi:   0.502 0.498 
#>   m, A, theta, H, mode

As the result of we set K to 7 and fit the model, the number of mixtures Kest = 2 has been obtained by the sparse estimation. The estimated weights of the mixtures are displayed as pi. They are near 0.5. The result contains other estimated parameters such as m, A, theta.

3.4 Tuning Hyperparameter rho

You can fit the model without labeled data, but you should prepare a labeled test data to tell the model what the anomalies are. To do it, you have no choice but to tune the hyperparameter rho. To avoid overfitting, you can use cross-validation. We measure the performance of the anomaly detection by ROC-AUC.

Optimal rho value that has the best performance to detect the anomalies is 0.774.

3.5 Anomaly Detection

We have obtained optimal hyperparameter rho. Next, let us investigate optimal threshold value for anomaly scores. We measure the performance of the anomaly detection by F-measure.

We found optimal threshold is 1.71.

We can use the value for anomaly detection.

3.6 Moving Average of Anomaly Scores

In the above example, you might think the F-measure is low. There is another way if you are interested in whether anomalies occur densely rather than individual one.

You can calculate moving average (or called rolling mean) for the anomaly scores by passing window size.

You can see that we obtained an anomaly detector with high performance.

3.7 Structure of Operational Modes

In the above, the model has identified that the synthetic data consists of two operational modes. We can see it as follows:

We can also see the weight indicating how often these modes appear.

Furthermore, we can also see the structure of each mode. fit$mode is the assigned mode to each observation in the training data.

Using it, you can see the correlation structure for each mode.

In reality, true structure are unknown. You should check estimated structure and consider whether it is reasonable.

You can also see the structure of the anomaly state. To compare it with the normal operating modes will be helpful to investigate what caused the anomalies.

Reference