ML-IDS is a machine learning-based intrusion detection system that uses the CICIDS2017 dataset for training and evaluation. It implements three machine learning models (Random Forest, Isolation Forest, and a Neural Network) to detect various types of network intrusions and cyber attacks.
The above graph shows the performance comparison of the three implemented models across different metrics.
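The comparison above reports several metrics per model. As a minimal sketch (not the project's own code), the hypothetical helper `compare_models` below computes accuracy, precision, recall, and F1 with scikit-learn. It assumes binary labels where 0 is benign and 1 is an attack, and the metrics used in the actual graph may differ. Isolation Forest predictions (-1 for anomalies, 1 for inliers) would need to be mapped to this 0/1 encoding first.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def compare_models(y_true, predictions):
    """Return one row of binary classification metrics per model.

    `predictions` maps a model name to its predicted labels
    (assumed encoding: 0 = benign, 1 = attack).
    """
    rows = []
    for name, y_pred in predictions.items():
        rows.append({
            "model": name,
            "accuracy": accuracy_score(y_true, y_pred),
            # zero_division=0 avoids warnings when a model predicts no attacks at all.
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        })
    return pd.DataFrame(rows).set_index("model")
```

The resulting DataFrame can be passed straight to `DataFrame.plot.bar()` to draw a grouped comparison chart.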
This project uses the CICIDS2017 dataset, a comprehensive intrusion detection dataset created by the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick (UNB). The dataset contains benign traffic alongside the most up-to-date common attacks, and it resembles true real-world data (PCAPs).
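Each flow in the CICIDS2017 CSV files is labeled either `BENIGN` or with an attack name such as `DDoS` or `PortScan`. For a benign-versus-attack detector, one common approach is to collapse these into a binary target. The sketch below assumes the label column is named `Label` (after stripping whitespace); the helper `add_binary_label` and the `is_attack` column are hypothetical names, not part of this project.

```python
import pandas as pd


def add_binary_label(df, label_col="Label"):
    """Return a copy of `df` with an `is_attack` column: 0 for BENIGN, 1 for any attack class."""
    # Normalise stray whitespace and case before comparing (assumed to be harmless for this dataset).
    labels = df[label_col].astype(str).str.strip().str.upper()
    out = df.copy()
    out["is_attack"] = (labels != "BENIGN").astype(int)
    return out
```

Keeping the original `Label` column alongside `is_attack` still allows a per-attack breakdown of results later.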
For more information about the dataset, visit: CICIDS2017 Dataset
- Data preprocessing and feature engineering
- Implementation of three machine learning models:
  - Random Forest
  - Isolation Forest
  - Neural Network
- Model evaluation and performance comparison
- Visualization of results (confusion matrices, feature importance)
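The preprocessing step listed above typically involves cleaning invalid values, splitting the data, and scaling the features. The following is a minimal sketch under stated assumptions, not the project's actual pipeline: it assumes a binary `is_attack` target column, keeps only numeric columns as features, treats infinite values (the CICIDS2017 rate features such as flow bytes per second are known to contain them) as missing, and fits a `StandardScaler` on the training split only. The function name `preprocess` and its defaults are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df, label_col="is_attack", test_size=0.2, seed=42):
    """Clean, split, and scale a flow DataFrame (hypothetical helper)."""
    # Use numeric columns only; string columns such as the raw label are dropped here.
    features = df.drop(columns=[label_col]).select_dtypes(include=[np.number])
    # Treat +/-inf as missing, then drop any row with a missing feature value.
    features = features.replace([np.inf, -np.inf], np.nan)
    mask = features.notna().all(axis=1)
    X = features[mask].to_numpy(dtype=float)
    y = df.loc[mask, label_col].to_numpy()
    # Stratify so the (usually rare) attack class keeps its proportion in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    # Fit the scaler on training data only to avoid leaking test statistics.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    return X_train, X_test, y_train, y_test, scaler
```

Since imbalanced-learn is among the requirements, resampling (for example oversampling the attack class) could be applied to `X_train`/`y_train` after this step; that is left out here to keep the sketch dependency-light.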
- Python 3.7+
- pandas
- numpy
- scikit-learn
- tensorflow
- matplotlib
- seaborn
- imbalanced-learn
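To check that the packages above are available before running the main script, a small standard-library sketch like the one below can help. It is not part of the repository; note that the import names differ from the pip names for scikit-learn (`sklearn`) and imbalanced-learn (`imblearn`). `find_spec` only locates a package without importing it, so the check stays fast even for TensorFlow.

```python
import importlib.util
import sys

# Import names for the listed requirements (pip name -> import name differs for two of them).
REQUIRED = ["pandas", "numpy", "sklearn", "tensorflow", "matplotlib", "seaborn", "imblearn"]


def missing_packages(names=REQUIRED):
    """Return the top-level import names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7+ is required.")
    missing = missing_packages()
    print("All requirements found." if not missing else f"Missing: {', '.join(missing)}")
```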
- Clone this repository:
  git clone https://github.com/tks98/ml-ids.git
  cd ml-ids
- Install the required packages:
  pip install -r requirements.txt
- Download the CICIDS2017 dataset and place the MachineLearningCVE CSV files in the data/MachineLearningCVE directory.
- Run the main script:
  python main.py
- The script will process the data, train the models, and generate results in the results directory.
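With the CSV files in `data/MachineLearningCVE`, they can be combined into a single DataFrame. The sketch below is an assumption about how such loading might look, not the code in `main.py`: `load_cicids2017` is a hypothetical name, and stripping column names reflects that the original headers carry leading spaces (for example ` Label`).

```python
from pathlib import Path

import pandas as pd


def load_cicids2017(data_dir="data/MachineLearningCVE"):
    """Concatenate every CSV in `data_dir` into one DataFrame (hypothetical helper)."""
    paths = sorted(Path(data_dir).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"No CSV files found in {data_dir}")
    # low_memory=False lets pandas infer each column's dtype from the whole file.
    frames = [pd.read_csv(path, low_memory=False) for path in paths]
    df = pd.concat(frames, ignore_index=True)
    # Normalise header whitespace so columns can be referenced as "Label", etc.
    df.columns = df.columns.str.strip()
    return df
```

The full dataset is large, so loading it once and caching intermediate results is worthwhile on repeated runs.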
- main.py: The main script that orchestrates the entire process
- data/: Directory to store the CICIDS2017 dataset
- model_cache/: Directory to store cached models for faster subsequent runs
- results/: Directory to store output results and visualizations
The script generates several output files in the results directory:
- Confusion matrices for each model
- Feature importance plot for the Random Forest model
- Overall performance comparison of all models
- CSV file containing anomalies detected by the Isolation Forest model
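The outputs above can be produced with standard scikit-learn calls. The following sketch shows one possible approach, not the project's implementation: `random_forest_report` and `export_isolation_forest_anomalies` are hypothetical helpers, and the hyperparameters (`n_estimators=100`, `contamination="auto"`, the seed) are assumptions. Plotting is omitted; the confusion matrix and the importance Series can be passed to seaborn or matplotlib afterwards.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import confusion_matrix


def random_forest_report(X_train, y_train, X_test, y_test, feature_names, top_k=10, seed=42):
    """Fit a Random Forest; return its test confusion matrix and the top-k feature importances."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed, n_jobs=-1)
    rf.fit(X_train, y_train)
    cm = confusion_matrix(y_test, rf.predict(X_test))
    # Impurity-based importances, one per feature, sorted from most to least important.
    importances = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
    return cm, importances.head(top_k)


def export_isolation_forest_anomalies(X, rows, out_path, contamination="auto", seed=42):
    """Fit an Isolation Forest on X and write the rows it flags as anomalous to a CSV file."""
    iso = IsolationForest(contamination=contamination, random_state=seed)
    # fit_predict returns -1 for anomalies and 1 for inliers.
    flags = iso.fit_predict(X)
    anomalies = rows[flags == -1].copy()
    # Lower score_samples values indicate more abnormal points.
    anomalies["anomaly_score"] = iso.score_samples(X)[flags == -1]
    anomalies.to_csv(out_path, index=False)
    return anomalies
```

Because Isolation Forest is unsupervised, it is fitted without labels; the labels are only needed afterwards to judge how many flagged rows are actual attacks.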
If you use this project or the CICIDS2017 dataset in your research, please cite the following paper:
@inproceedings{sharafaldin2018toward,
title={Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization},
author={Sharafaldin, Iman and Habibi Lashkari, Arash and Ghorbani, Ali A},
booktitle={4th International Conference on Information Systems Security and Privacy (ICISSP)},
pages={268--282},
year={2018},
organization={IEEE}
}
This project is licensed under the MIT License - see the LICENSE file for details.
- Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick (UNB) for providing the CICIDS2017 dataset.