This repository contains the final submission for the VMWare Campus Ambassador Hackathon. We used unsupervised machine learning algorithms to detect anomalies in web server access logs.
A publicly available access log from a web server was used. After parsing the log file, features such as the mean number of requests per day and the mean time between successive requests were extracted for each IP address. The final dataset is in finaldata.csv. The code used for data processing can be found in data_preparation.ipynb.
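As a rough illustration of the per-IP feature extraction described above, the sketch below computes the two named features from a handful of log lines. It is not the code from data_preparation.ipynb: the Common Log Format layout, the regular expression, the `extract_features` helper, and the sample lines are all assumptions made for this example.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed layout (Common Log Format): IP, identity, user, [timestamp], ...
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def extract_features(lines):
    """Return a dict mapping each IP address to its two summary features."""
    timestamps = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        ip, raw_time = match.groups()
        timestamps[ip].append(datetime.strptime(raw_time, TIME_FORMAT))

    features = {}
    for ip, times in timestamps.items():
        times.sort()
        # Count requests on each calendar day the IP was active.
        per_day = defaultdict(int)
        for t in times:
            per_day[t.date()] += 1
        # Gaps in seconds between consecutive requests from the same IP.
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[ip] = {
            "mean_requests_per_day": mean(list(per_day.values())),
            # An IP with a single request has no gap to average.
            "mean_seconds_between_requests": mean(gaps) if gaps else None,
        }
    return features


# Hypothetical sample lines, invented for this example.
sample = [
    '10.0.0.1 - - [01/Jan/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2020:10:00:30 +0000] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [02/Jan/2020:10:00:30 +0000] "GET /b HTTP/1.1" 404 0',
    '10.0.0.2 - - [01/Jan/2020:11:00:00 +0000] "GET / HTTP/1.1" 200 512',
]
features = extract_features(sample)
```

Note that the mean here is taken over the days on which an IP was active, which is only one possible reading of "mean number of requests per day"; averaging over the full span of the log would give different values.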
A more detailed discussion of our approach and the proposed improvements can be found in Prototype_Submission.pdf.
The following unsupervised learning algorithms were used to detect anomalous requests present in the dataset:
- K-Means Clustering
- Isolation Forest
- One-Class SVM
Code, data visualisations, and anomaly detection results can be found in Unsupervised Learning.ipynb.
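For readers who want a quick feel for how the three algorithms listed above can be applied, here is a minimal sketch using scikit-learn. It is not taken from the notebook: the synthetic data stands in for the per-IP features, and the cluster count, percentile threshold, `contamination`, and `nu` values are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for per-IP features (requests per day, seconds between requests).
normal = rng.normal(loc=[20.0, 300.0], scale=[5.0, 60.0], size=(200, 2))
unusual = np.array([[500.0, 1.0], [450.0, 2.0], [600.0, 0.5]])
X = StandardScaler().fit_transform(np.vstack([normal, unusual]))

# K-Means has no built-in anomaly label, so the distance of each point
# to its assigned centroid is used as an anomaly score here (an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
distances = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)
kmeans_flags = distances > np.percentile(distances, 98)

# Isolation Forest: fit_predict returns -1 for anomalies and 1 for inliers.
iso_flags = IsolationForest(contamination=0.02, random_state=0).fit_predict(X) == -1

# One-Class SVM uses the same -1 / 1 convention.
svm_flags = OneClassSVM(kernel="rbf", gamma="scale", nu=0.02).fit_predict(X) == -1

print("K-Means flagged:", int(kmeans_flags.sum()))
print("Isolation Forest flagged:", int(iso_flags.sum()))
print("One-Class SVM flagged:", int(svm_flags.sum()))
```

The three methods need not flag the same points. In particular, a distance-to-centroid score can miss anomalies that are numerous enough to form their own cluster, so the thresholds and the number of clusters would need tuning on the real dataset.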
A link to the video can be found in video.md.