Most time-series anomaly detection models don't need labels for training. So why should we need labels to select good models?
TL;DR: We introduce `tsadams` for unsupervised time-series anomaly detection model selection!
Hundreds of models for anomaly detection in time-series are available to practitioners, but no method exists to select the best model and its hyperparameters for a given dataset when labels are not available. We construct three classes of surrogate metrics which we show to be correlated with common supervised anomaly detection accuracy metrics such as the F1 score. The three classes of metrics are prediction accuracy, centrality, and performance on injected synthetic anomalies. We show that some of the surrogate metrics are useful for unsupervised model selection but not sufficient by themselves. To this end, we treat metric combinations as a rank aggregation problem and propose a robust rank aggregation approach. Large scale experiments on multiple real-world datasets demonstrate that our proposed unsupervised aggregation approach is as effective as selecting the best model based on collecting anomaly labels.
Figure 1: The Model Selection Workflow. We identify three classes of surrogate metrics of model quality, and propose a novel robust rank aggregation framework to combine multiple rankings from metrics.
If you use this code, please consider citing our work:
Unsupervised Model Selection for Time-series Anomaly Detection
Mononito Goswami, Cristian Ignacio Challu, Laurent Callot, Lenon Minorics, Andrey Kan
International Conference on Learning Representations (ICLR), 2023
We carry out experiments on two popular and widely used real-world collections with diverse time-series and anomalies: (1) UCR Anomaly Archive (UCR) (Wu & Keogh, 2021), and (2) Server Machine Dataset (SMD) (Su et al., 2019).
These datasets can be downloaded using the download_data.py
script in the scripts
directory and loaded using the tsadams.datasets.load.load_data(...)
function.
To load the UCR dataset:
from tsadams.datasets.load import load_data
# Load the data
ENTITY = 'anomaly_archive' # 'anomaly_archive' OR 'smd'
DATASET = '028_UCR_Anomaly_DISTORTEDInternalBleeding17' # Name of timeseries in UCR or machine in SMD
train_data = load_data(dataset=DATASET,
group='train',
entities=[ENTITY],
downsampling=None,
root_dir='/path_to_dataset_dir',
normalize=True,
verbose=True)
test_data = load_data(dataset=DATASET,
group='test',
entities=[ENTITY],
downsampling=None,
root_dir='/path_to_dataset_dir',
normalize=True,
verbose=True)
We recommend installing Ananconda to run our code. To install Anaconda, review the installation instructions here.
To setup the environment using conda
(recommended, but optional), run the following commands:
# To create environment from environment_explicit.yml file
foo@bar:~$ conda env create -f environment_explicit.yml
# To activate the environment
foo@bar:~$ conda activate modelselect
# To verify if the new environment was installed correctly
foo@bar:~$ conda env list
For an editable installation of our code from source, run the following commands:
foo@bar:~$ git clone https://github.com/mononitogoswami/tsad-model-selection.git
foo@bar:~$ cd tsad-model-selection/src/
foo@bar:~$ pip install -e .
To reproduce the results presented in the paper, please follow these steps in the specified order. You can find all the necessary scripts in the src > scripts
directory of this repository:
- Run
download_data.py
to download the Server Machine datasets and the UCR Anomaly Detection archive. - Train multiple anomaly detection models for each dataset using the
train_all_models.py
. You can track the progress of trained models using thecheck_number_of_trained_models.py
. After this stage, for each dataset in SMD and the UCR anomaly archive, we should have trained anomaly detection models. Please note that, in some cases, certain models may not have completed training due to potential errors. - Next, get predictions (i.e. use each model to reconstruct time-series in each dataset) for all models and datasets using the
evaluate_all_models.py
. The progress for this step can be tracked usingcheck_number_of_evaluated_models.py
. - In the paper, we pool the predictions of a particular model on multiple related datasets. This gives us a more robust measure of performance. To pool predictions of multiple models, run the compute_pooled_results.py`. With this we should be all set to perform model selection!
- Perform model selection for each pooled datasets and evaluate it using the
results.ipynb
notebook.
If you use our code please cite our paper:
@article{
goswami2023unsupervised,
title={Unsupervised Model Selection for Time-series Anomaly Detection},
author={Goswami, Mononito and Challu, Cristian and Callot, Laurent and Minorics, Lenon and Kan, Andrey},
journal={International Conference on Learning Representations.},
year={2023},
}