Official PyTorch implementation for Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction (NPSR).
A major difficulty for time series anomaly detection arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose NPSR, an algorithm that utilizes point-based and sequence-based reconstruction models. The point-based model quantifies point anomalies, and the sequence-based model quantifies both point and contextual anomalies. We formulate the observed time point as a two-stage deviated value from a nominal time point.
Under this formulation, we link the reconstruction errors with the deviations (anomalies) and introduce a nominality score. An induced anomaly score is then derived by further integrating the nominality score and the original anomaly score, which can theoretically outperform the original anomaly score under certain conditions.
Figure 1. (a) Performer-based autoencoder as the point-based reconstruction model; (b) Performer-based stacked encoder as the sequence-based reconstruction model.
We evaluate the performance of NPSR against 14 baselines over 7 datasets using the best F1 score.
Note: due to reliability concerns, we did not use the point-adjusted best F1 score as the main metric.
Table 1. Best F1 score of NPSR and the 14 baselines over the 7 datasets.
Installation (to install PyTorch, cf. https://pytorch.org/get-started/locally/):
```
conda create -n npsr python=3.11
conda activate npsr
pip install torch torchvision torchaudio
pip install -r requirements.txt
```
`config.txt` contains all the settings. See Appendix C in the paper for the parameter settings.

Usage:
```
python main.py config.txt
```
The algorithm will do an evaluation every epoch.
Figure 2. A screenshot of an evaluation step for the test data of trimSyn.
After training/testing, it is possible to use `parse_results.ipynb` to visualize the training/testing results.
Dataset folders should be put under `./datasets/DATASET`. After downloading and putting the files/directories in the correct folder (the folder name should match the dataset name), execute `python make_pk.py`. A file named `DATASET.pk` will appear in the same folder. The main program will import `preprocess_DATASET.py` and load `DATASET.pk` for preprocessing.
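If you want to verify the generated file, it can be inspected directly. Below is a minimal sketch, assuming `DATASET.pk` is a standard Python pickle of a dictionary (SWaT is used as an example path; the dictionary structure is described in the custom-dataset notes further below):

```python
import pickle

# Sanity check: open the generated .pk file and inspect its contents.
# Assumes the file is a standard pickle of a Python dictionary.
with open('datasets/SWaT/SWaT.pk', 'rb') as f:
    dat = pickle.load(f)

for key, value in dat.items():
    # Multi-entity datasets store a list of arrays per key.
    shape = [v.shape for v in value] if isinstance(value, list) else value.shape
    print(key, shape)
```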
You can get the SWaT and WADI datasets by filling out the form at: https://docs.google.com/forms/d/1GOLYXa7TX0KlayqugUOOPMvbcwSQiGNMOjHuNqKcieA/viewform?edit_requested=true

This work uses the data from `SWaT.A1 & A2_Dec 2015`. The files `SWaT_Dataset_Attack_v0.csv` and `SWaT_Dataset_Normal_v1.csv` should be in the same directory as `make_pk.py` (please convert them from the .xlsx files first; a conversion sketch is given below).
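The conversion can be done in a spreadsheet program or with pandas. Below is a minimal sketch, assuming the downloaded workbooks carry the same base names as the expected .csv files and that the data is in the first sheet (you may need to adjust header handling to match the sheet layout):

```python
import pandas as pd

# Convert the SWaT .xlsx files to the .csv files expected by make_pk.py.
# Adjust file names and header handling to match what you downloaded.
for name in ['SWaT_Dataset_Normal_v1', 'SWaT_Dataset_Attack_v0']:
    df = pd.read_excel(f'{name}.xlsx')  # requires openpyxl
    df.to_csv(f'{name}.csv', index=False)
```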
You can get the SWaT and WADI datasets by filling out the form at: https://docs.google.com/forms/d/1GOLYXa7TX0KlayqugUOOPMvbcwSQiGNMOjHuNqKcieA/viewform?edit_requested=true

This work uses the 2017 data. The files `WADI_14days.csv` and `WADI_attackdata.csv` should be in the same directory as `make_pk.py`.
The dataset is downloadable at: https://github.com/eBay/RANSynCoders/tree/main/data

The files `train.csv`, `test.csv`, and `test_label.csv` should be in the same directory as `make_pk.py`.
You can get the MSL and SMAP datasets using:
```
wget https://s3-us-west-2.amazonaws.com/telemanom/data.zip
unzip data.zip
rm data.zip
cd data
wget https://raw.githubusercontent.com/khundman/telemanom/master/labeled_anomalies.csv
```
The folders `train` and `test`, both containing the .npy files for each entity, and the file `labeled_anomalies.csv` should be in the same directory as `make_pk.py`.
The dataset is downloadable at: https://github.com/NetManAIOps/OmniAnomaly/tree/master/ServerMachineDataset

The folders `train`, `test`, and `test_label`, all containing the .txt files for each entity, should be in the same directory as `make_pk.py`.
The dataset is downloadable at: https://drive.google.com/drive/folders/1y5nIA5ame0RvNAuRmnA5ScW8PL1LP-Oq

The files `MSCRED.csv` and `MSCRED_GT.csv` should be in the same directory as `make_pk.py`.
This is the univariate Mackey-Glass Anomaly Benchmark dataset from https://github.com/MarkusThill/MGAB. Please clone the GitHub repository into the same directory as `make_pk.py`.
- Edit `utils/datasets.py` and insert some code into the function `get_dataset_processed`, like this:

```python
elif params.name == 'DATASET':
    data_path = 'datasets/DATASET/'
    if data_path not in sys.path:
        sys.path.append(data_path)
    from preprocess_DATASET import DATASET_Dataset
    # keep only one of the following two lines, depending on whether the dataset has multiple entities
    dataset = DATASET_Dataset(dataset_pth = data_path + params.name + '.pk')  # single entity dataset
    dataset = DATASET_Dataset(dataset_pth = data_path + params.name + '.pk', entities = params.entities)  # multi entity dataset
```

The program will try to import `DATASET_Dataset` from `datasets/DATASET/preprocess_DATASET.py`.
- Construct the file `datasets/DATASET/preprocess_DATASET.py` and define certain properties (e.g. `dims`, `num_entity`, other dataset-specific preprocessing). A single-entity example (SWaT) can be found at `datasets/SWaT/preprocess_SWaT.py`, and a multi-entity example (MSL) can be found at `datasets/MSL/preprocess_MSL.py`. A rough, hypothetical sketch is also given at the end of this section.
- Construct `DATASET.pk`. As we can see in the above code block, the program looks for the .pk file `datasets/DATASET/DATASET.pk`.
Each `make_pk.py` script that we have provided outputs a .pk file that contains a Python dictionary; say we name it `dat`. `dat` contains three key/value pairs: `dat['x_trn']`, `dat['x_tst']`, and `dat['lab_tst']`. For single-entity datasets, `dat['x_trn']` and `dat['x_tst']` are 2D numpy arrays with shape (time points, dims), and `dat['lab_tst']` is a 1D numpy array with shape (time points,). For multi-entity datasets, all of them are lists that contain multiple numpy arrays; for each entity, the structure is the same as for a single-entity dataset.
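For reference, here is a minimal sketch of producing such a file for a custom single-entity dataset, assuming a standard `pickle` dump and using placeholder arrays in place of real data:

```python
import pickle
import numpy as np

# Build DATASET.pk for a hypothetical single-entity dataset.
# Replace the placeholder arrays with your real data.
x_trn = np.random.rand(10000, 25).astype(np.float32)  # (time points, dims) training data
x_tst = np.random.rand(5000, 25).astype(np.float32)   # (time points, dims) test data
lab_tst = np.zeros(5000, dtype=int)                    # (time points,) test labels, 1 = anomaly

dat = {'x_trn': x_trn, 'x_tst': x_tst, 'lab_tst': lab_tst}

# For a multi-entity dataset, each value would instead be a list of arrays,
# one per entity, with the same per-entity structure as above.
with open('datasets/DATASET/DATASET.pk', 'wb') as f:
    pickle.dump(dat, f)
```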
You do not need a `make_pk.py`, but you should have a corresponding `DATASET.pk` that can be loaded by `preprocess_DATASET.py`.
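The authoritative interface for `preprocess_DATASET.py` is defined by the provided examples (`datasets/SWaT/preprocess_SWaT.py` and `datasets/MSL/preprocess_MSL.py`). The sketch below is only a rough, hypothetical single-entity loader illustrating the idea of reading `DATASET.pk` and exposing dataset properties such as `dims` and `num_entity`:

```python
import pickle

# Hypothetical single-entity loader; check preprocess_SWaT.py / preprocess_MSL.py
# for the attributes and methods the main program actually expects.
class DATASET_Dataset:
    def __init__(self, dataset_pth):
        with open(dataset_pth, 'rb') as f:
            dat = pickle.load(f)
        self.x_trn = dat['x_trn']        # (time points, dims) training data
        self.x_tst = dat['x_tst']        # (time points, dims) test data
        self.lab_tst = dat['lab_tst']    # (time points,) test labels
        self.num_entity = 1              # single-entity dataset
        self.dims = self.x_trn.shape[1]  # feature dimension
        # dataset-specific preprocessing (e.g. normalization) would go here
```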
If you find this repo useful, please cite our paper. The citation might be updated after the NeurIPS 2023 conference.
```
@article{lai2023nominality,
  title={Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction},
  author={Lai, Chih-Yu and Sun, Fan-Keng and Gao, Zhengqi and Lang, Jeffrey H and Boning, Duane S},
  journal={arXiv preprint arXiv:2310.15416},
  year={2023}
}
```
Email: [email protected]
Andrew Lai (Chih-Yu Lai), Ph.D. Student, https://chihyulai.com/
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology