Outlier Detection (also known as Anomaly Detection) is a fascinating and useful technique to identify outlying data objects. It has proven critical in many fields, such as credit card fraud analytics and mechanical unit defect detection.
In this repository, you will find:
- Books & Academic Papers
- Learning Materials, e.g., online courses and videos
- Outlier Datasets
- Open-source Libraries & Demo Codes
- Paper Downloader: a Python 3 script to download the open access papers listed in this repository (under development).
I will continue adding more items to the repository. Please feel free to suggest critical materials by opening an issue, submitting a pull request, or sending me an email ([email protected]). Enjoy reading!
- 1. Books & Tutorials
- 2. Courses/Seminars/Videos
- 3. Toolbox & Datasets
- 4. Papers
- 4.1. Overview & Survey Papers
- 4.2. Key Algorithms
- 4.3. Graph & Network Outlier Detection
- 4.4. Time Series Outlier Detection
- 4.5. Feature Selection in Outlier Detection
- 4.6. High-dimensional & Subspace Outliers
- 4.7. Outlier Ensembles
- 4.8. Outlier Detection in Evolving Data
- 4.9. Representation Learning in Outlier Detection
- 4.10. Interpretability
- 4.11. Social Media Anomaly Detection
- 4.12. Outlier Detection in Other Fields
- 4.13. Outlier Detection Applications
- 5. Key Conferences/Workshops/Journals
Outlier Analysis by Charu Aggarwal: Classical textbook covering most of the outlier analysis techniques. A must-read for people in the field of outlier detection. [Preview.pdf]
Outlier Ensembles: An Introduction by Charu Aggarwal and Saket Sathe: Great intro book for ensemble learning in outlier analysis.
Data Mining: Concepts and Techniques (3rd ed.) by Jiawei Han, Micheline Kamber, and Jian Pei: Chapter 12 discusses outlier detection with many fundamental points. [Google Search]
Tutorial Title | Year | Ref | Materials |
---|---|---|---|
Outlier detection techniques (2010 ACM SIGKDD) | 2010 | [11] | [PDF] |
Anomaly Detection: A Tutorial (ICDM 2011) | 2011 | [8] | [PDF] |
Data mining for anomaly detection (PKDD 2008) | 2008 | [12] | [See Video] |
Coursera Introduction to Anomaly Detection (by IBM): https://www.coursera.org/learn/ai/lecture/ASPv0/introduction-to-anomaly-detection
Coursera Real-Time Cyber Threat Detection and Mitigation partly covers the topic: https://www.coursera.org/learn/real-time-cyber-threat-detection
Coursera Machine Learning by Andrew Ng also partly covers the topic:
Udemy Outlier Detection Algorithms in Data Mining and Data Science: https://www.udemy.com/outlier-detection-techniques/
Stanford Data Mining for Cyber Security also covers part of anomaly detection techniques. http://web.stanford.edu/class/cs259d/
Scikit-learn Novelty and Outlier Detection: It supports some popular algorithms such as LOF, Isolation Forest, and One-class SVM.
Python Outlier Detection (PyOD): PyOD is a comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data. It includes more than 20 detection algorithms, including deep learning and outlier ensembles.
Anomaly Detection Toolbox - Beta: A collection of popular outlier detection algorithms in Matlab.
ELKI: Environment for Developing KDD-Applications Supported by Index-Structures: ELKI is an open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection.
RapidMiner Anomaly Detection Extension: The Anomaly Detection Extension for RapidMiner comprises the best-known unsupervised anomaly detection algorithms, assigning individual anomaly scores to data rows of example sets. It allows you to find data that differ significantly from the norm, without the need for the data to be labeled.
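As a quick illustration of the scikit-learn detectors named above (LOF, Isolation Forest, One-class SVM), the following is a minimal sketch rather than a recommended configuration. The toy data (a small grid of inliers plus one distant point) and every parameter value are assumptions chosen only for illustration. All three estimators expose `fit_predict`, which returns `1` for inliers and `-1` for outliers.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

# Toy data (assumed for illustration): a 5x5 grid of inliers
# plus one point placed far away from the grid.
grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
X = np.vstack([grid, [[20.0, 20.0]]])

# Parameter values below are illustrative, not tuned.
detectors = {
    "LOF": LocalOutlierFactor(n_neighbors=5),
    "Isolation Forest": IsolationForest(n_estimators=100, random_state=0),
    "One-class SVM": OneClassSVM(nu=0.05, gamma="scale"),
}

labels = {}
for name, detector in detectors.items():
    # fit_predict returns 1 for inliers and -1 for outliers.
    labels[name] = detector.fit_predict(X)
    flagged = np.where(labels[name] == -1)[0]
    print(f"{name} flags rows: {flagged.tolist()}")
```

On a data set this small the detectors need not agree with each other, so the printed rows are best read as a way to compare behavior, not as ground truth.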
ELKI Outlier Datasets: https://elki-project.github.io/datasets/outlier
Outlier Detection DataSets (ODDS): http://odds.cs.stonybrook.edu/#table1
Unsupervised Anomaly Detection Dataverse: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/OPQMVF
Anomaly Detection Meta-Analysis Benchmarks: https://ir.library.oregonstate.edu/concern/datasets/47429f155
Paper Title | Year | Ref | Materials |
---|---|---|---|
Anomaly detection: A survey | 2009 | [7] | [PDF] |
A survey of outlier detection methodologies | 2004 | [10] | [PDF] |
On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study | 2016 | [5] | [HTML], [SLIDES] |
Outlier detection: applications and techniques | 2012 | [17] | [PDF] |
A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data | 2016 | [9] | [PDF] |
Abbreviation | Paper Title | Year | Ref | Materials |
---|---|---|---|---|
kNN | Efficient algorithms for mining outliers from large data sets | 2000 | [15] | [PDF] |
kNN | Fast outlier detection in high dimensional spaces | 2002 | [3] | [HTML] |
LOF | LOF: identifying density-based local outliers | 2000 | [4] | [PDF] |
IForest | Isolation forest | 2008 | [13] | [PDF] |
OCSVM | Time-series novelty detection using one-class support vector machines | 2003 | [14] | [PDF] |
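To make the kNN entry in the table above concrete, here is a minimal sketch of distance-based outlier scoring: each point is scored by the distance to its k-th nearest neighbor, and the highest-scoring points are reported as outliers. This is a brute-force illustration only; the function names (`knn_outlier_scores`, `top_n_outliers`) and the toy points are hypothetical, and the pruning and partitioning techniques that make the published algorithms efficient are not reproduced.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, sorted ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points, k, n):
    """Return the indices of the n points with the largest k-th neighbor distance."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Toy data (assumed for illustration): a tight cluster and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
print(top_n_outliers(points, k=2, n=1))  # prints [5], the distant point
```

The brute-force loop costs O(n²) distance computations, which is fine for a demonstration but is exactly the cost the papers above set out to reduce.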
Paper Title | Year | Ref | Materials |
---|---|---|---|
Graph based anomaly detection and description: a survey | 2015 | [2] | [PDF] |
Anomaly detection in dynamic networks: a survey | 2015 | [16] | [PDF] |
Gupta, M., Gao, J., Aggarwal, C.C. and Han, J., 2014. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), pp.2250-2267. [PDF]
Pang, G., Cao, L., Chen, L. and Liu, H., 2016, December. Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings. In Data Mining (ICDM), 2016 IEEE 16th International Conference on (pp. 410-419). IEEE. [PDF]
Pang, G., Cao, L., Chen, L. and Liu, H., 2017, August. Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 2585-2591). AAAI Press. [PDF]
Zimek, A., Schubert, E. and Kriegel, H.P., 2012. A survey on unsupervised outlier detection in high‐dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), pp.363-387. [Downloadable Link]
Pang, G., Cao, L., Chen, L. and Liu, H., 2018. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018. [PDF]
Paper Title | Year | Ref | Materials |
---|---|---|---|
Outlier ensembles: position paper | 2013 | [1] | [PDF] |
Ensembles for unsupervised outlier detection: challenges and research questions a position paper | 2014 | [18] | [PDF] |
An Unsupervised Boosting Strategy for Outlier Detection Ensembles | 2018 | [6] | [HTML] |
Salehi, Mahsa & Rashidi, Lida. (2018). A Survey on Anomaly detection in Evolving Data: [with Application to Forest Fire Risk Prediction]. ACM SIGKDD Explorations Newsletter. 20. 13-23. [PDF]
Emaad Manzoor, Hemank Lamba, Leman Akoglu. Outlier Detection in Feature-Evolving Data Streams. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018. [PDF] [Github]
Pang, G., Cao, L., Chen, L. and Liu, H., 2018. Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection. In 24th ACM SIGKDD International Conference on Knowledge Discovery and Data mining (KDD). 2018. [PDF]
Micenková, B., McWilliams, B. and Assent, I., 2015. Learning representations for outlier detection on a budget. arXiv preprint arXiv:1507.08104. [PDF]
Zhao, Y. and Hryniewicki, M.K., 2018. XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning. International Joint Conference on Neural Networks. [PDF]
Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, Christos Faloutsos. Beyond Outlier Detection: LookOut for Pictorial Explanation. ECML PKDD 2018. [PDF]
Liu, N., Shin, D. and Hu, X., 2017. Contextual outlier interpretation. arXiv preprint arXiv:1711.10589. [PDF]
Tang, G., Pei, J., Bailey, J. and Dong, G., 2015. Mining multidimensional contextual outliers from categorical relational data. Intelligent Data Analysis, 19(5), pp.1171-1192. [PDF]
Dang, X.H., Assent, I., Ng, R.T., Zimek, A. and Schubert, E., 2014, March. Discriminative features for identifying and interpreting outliers. In International Conference on Data Engineering (ICDE). IEEE. [PDF]
Yu, R., Qiu, H., Wen, Z., Lin, C. and Liu, Y., 2016. A survey on social media anomaly detection. ACM SIGKDD Explorations Newsletter, 18(1), pp.1-14. [PDF]
Yu, R., He, X. and Liu, Y., 2015. Glad: group anomaly detection in social media analysis. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(2), p.18. [PDF]
Kannan, R., Woo, H., Aggarwal, C.C. and Park, H., 2017, June. Outlier detection for text data. In Proceedings of the 2017 SIAM International Conference on Data Mining (pp. 489-497). Society for Industrial and Applied Mathematics. [PDF]
Security:
- Weller-Fahy, D.J., Borghetti, B.J. and Sodemann, A.A., 2015. A survey of distance and similarity measures used within network intrusion anomaly detection. IEEE Communications Surveys & Tutorials, 17(1), pp.70-91. [PDF]
- Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G. and Vázquez, E., 2009. Anomaly-based network intrusion detection: Techniques, systems and challenges. computers & security, 28(1-2), pp.18-28. [PDF]
Finance:
- Ahmed, M., Mahmood, A.N. and Islam, M.R., 2016. A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems, 55, pp.278-288. [PDF]
ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD). Note: SIGKDD usually has an Outlier Detection Workshop (ODD), see ODD 2018.
ACM International Conference on Management of Data (SIGMOD)
IEEE International Conference on Data Mining (ICDM)
SIAM International Conference on Data Mining (SDM)
IEEE International Conference on Data Engineering (ICDE)
ACM International Conference on Information and Knowledge Management (CIKM)
ACM International Conference on Web Search and Data Mining (WSDM)
The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)
ACM Transactions on Knowledge Discovery from Data (TKDD)
IEEE Transactions on Knowledge and Data Engineering (TKDE)
ACM SIGKDD Explorations Newsletter
Data Mining and Knowledge Discovery
Knowledge and Information Systems (KAIS)
[1] | Aggarwal, C.C., 2013. Outlier ensembles: position paper. ACM SIGKDD Explorations Newsletter, 14(2), pp.49-58. |
[2] | Akoglu, L., Tong, H. and Koutra, D., 2015. Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3), pp.626-688. |
[3] | Angiulli, F. and Pizzuti, C., 2002, August. Fast outlier detection in high dimensional spaces. In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 15-27). |
[4] | Breunig, M.M., Kriegel, H.P., Ng, R.T. and Sander, J., 2000, May. LOF: identifying density-based local outliers. ACM Sigmod Record, 29(2), pp. 93-104. |
[5] | Campos, G.O., Zimek, A., Sander, J., Campello, R.J., Micenková, B., Schubert, E., Assent, I. and Houle, M.E., 2016. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Mining and Knowledge Discovery, 30(4), pp.891-927. |
[6] | Campos, G.O., Zimek, A. and Meira, W., 2018, June. An Unsupervised Boosting Strategy for Outlier Detection Ensembles. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 564-576). Springer, Cham. |
[7] | Chandola, V., Banerjee, A. and Kumar, V., 2009. Anomaly detection: A survey. ACM Computing Surveys, 41(3), p.15. |
[8] | Chawla, S. and Chandola, V., 2011. Anomaly Detection: A Tutorial. Tutorial at ICDM 2011. |
[9] | Goldstein, M. and Uchida, S., 2016. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PloS one, 11(4), p.e0152173. |
[10] | Hodge, V. and Austin, J., 2004. A survey of outlier detection methodologies. Artificial intelligence review, 22(2), pp.85-126. |
[11] | Kriegel, H.P., Kröger, P. and Zimek, A., 2010. Outlier detection techniques. Tutorial at ACM SIGKDD 2010. |
[12] | Lazarevic, A., Banerjee, A., Chandola, V., Kumar, V. and Srivastava, J., 2008, September. Data mining for anomaly detection. Tutorial at ECML PKDD 2008. |
[13] | Liu, F.T., Ting, K.M. and Zhou, Z.H., 2008, December. Isolation forest. In International Conference on Data Mining, pp. 413-422. IEEE. |
[14] | Ma, J. and Perkins, S., 2003, July. Time-series novelty detection using one-class support vector machines. In IJCNN' 03, pp. 1741-1745. IEEE. |
[15] | Ramaswamy, S., Rastogi, R. and Shim, K., 2000, May. Efficient algorithms for mining outliers from large data sets. ACM Sigmod Record, 29(2), pp. 427-438. |
[16] | Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C. and Samatova, N.F., 2015. Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics, 7(3), pp.223-247. |
[17] | Singh, K. and Upadhyaya, S., 2012. Outlier detection: applications and techniques. International Journal of Computer Science Issues (IJCSI), 9(1), p.307. |
[18] | Zimek, A., Campello, R.J. and Sander, J., 2014. Ensembles for unsupervised outlier detection: challenges and research questions a position paper. ACM Sigkdd Explorations Newsletter, 15(1), pp.11-22. |