ML-IDS: Machine Learning-based Intrusion Detection System

Overview

ML-IDS is a machine learning-based intrusion detection system that uses the CICIDS2017 dataset for training and evaluation. It implements three models (Random Forest, Isolation Forest, and a Neural Network) to detect various types of network intrusions and cyber attacks.

Performance Overview

The performance comparison graph (saved to the results directory) compares the three implemented models across different metrics.

Dataset

This project uses the CICIDS2017 dataset, a comprehensive intrusion detection dataset created by the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick (UNB). The dataset contains benign traffic and up-to-date common attacks, and is designed to resemble real-world network data (PCAPs).

For more information about the dataset, see the CICIDS2017 dataset page on the CIC website.
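
The CSV release of the dataset is commonly reported to contain column names with stray whitespace and infinite or missing values in some rate features. The following is a minimal cleaning sketch under that assumption, not the code used by main.py. The function name `clean_flows`, the derived `is_attack` column, and the sample column names are illustrative assumptions; the sample rows are synthetic.

```python
import numpy as np
import pandas as pd


def clean_flows(df: pd.DataFrame, label_col: str = "Label") -> pd.DataFrame:
    """Return a cleaned copy of a flow-feature table (hypothetical helper)."""
    out = df.copy()
    # Strip stray whitespace from column names (e.g. " Label" -> "Label").
    out.columns = out.columns.str.strip()
    # Treat infinite values as missing, then drop incomplete rows.
    out = out.replace([np.inf, -np.inf], np.nan).dropna()
    # Binary target: 0 for benign traffic, 1 for any attack class.
    # Assumes benign rows are labelled "BENIGN".
    out["is_attack"] = (out[label_col] != "BENIGN").astype(int)
    return out.reset_index(drop=True)


# Tiny synthetic table; real rows would come from pd.read_csv on the dataset files.
sample = pd.DataFrame({
    " Flow Duration": [10, 20, 30, 40],
    "Flow Bytes/s": [1.5, np.inf, 2.5, np.nan],
    " Label": ["BENIGN", "DDoS", "PortScan", "BENIGN"],
})
cleaned = clean_flows(sample)
print(cleaned)
```

Dropping rows is the simplest option; imputing the missing values instead may be preferable if many rows are affected.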

Features

- Data preprocessing and feature engineering
- Implementation of three machine learning models:
  - Random Forest
  - Isolation Forest
  - Neural Network
- Model evaluation and performance comparison
- Visualization of results (confusion matrices, feature importance)
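
To illustrate how the supervised and unsupervised models differ, here is a small sketch that fits a Random Forest and an Isolation Forest with scikit-learn. It runs on synthetic data rather than CICIDS2017, and the hyperparameters (`n_estimators=50`, `contamination=0.05`) are assumptions chosen for the example, not values taken from main.py. The Neural Network is left out to keep the sketch light.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in: 950 "benign" points near the origin, 50 "attack" points far away.
benign = rng.normal(0.0, 1.0, size=(950, 4))
attack = rng.normal(6.0, 1.0, size=(50, 4))
X = np.vstack([benign, attack])
y = np.array([0] * 950 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Supervised: the Random Forest learns from labelled examples.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)

# Unsupervised: the Isolation Forest is fitted without labels.
# predict() returns -1 for anomalies and 1 for inliers, so map -1 to "attack".
iso = IsolationForest(contamination=0.05, random_state=0)
iso.fit(X_train)
iso_pred = (iso.predict(X_test) == -1).astype(int)

rf_acc = accuracy_score(y_test, rf_pred)
iso_acc = accuracy_score(y_test, iso_pred)
print(f"Random Forest accuracy:    {rf_acc:.3f}")
print(f"Isolation Forest accuracy: {iso_acc:.3f}")
```

On real traffic the classes are far less cleanly separated than in this toy data, so these numbers say nothing about performance on CICIDS2017.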

Requirements

- Python 3.7+
- pandas
- numpy
- scikit-learn
- tensorflow
- matplotlib
- seaborn
- imbalanced-learn
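
Two of these packages are imported under a different name than the one used to install them (scikit-learn as `sklearn`, imbalanced-learn as `imblearn`). The following sketch checks which requirements are importable in the current environment. The helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of the project.

```python
from importlib.util import find_spec

# pip package name -> import name (they differ for two of the packages).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "imbalanced-learn": "imblearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```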

Installation

Clone this repository:

git clone https://github.com/tks98/ml-ids.git
cd ml-ids

Install the required packages:
```
pip install -r requirements.txt
```

Usage

Download the CICIDS2017 dataset and place the MachineLearningCVE CSV files in the data/MachineLearningCVE directory.
Run the main script:
```
python main.py
```
The script will process the data, train the models, and generate results in the results directory.
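
Before running the main script, it can help to confirm that the CSV files are where the script expects them. This is a small sketch using only the standard library; the helper name `find_dataset_files` is hypothetical, and the assumption that the files end in `.csv` and sit directly in the directory is based on the instructions above.

```python
from pathlib import Path


def find_dataset_files(data_dir="data/MachineLearningCVE"):
    """Return the sorted CSV paths in data_dir, or raise if there are none."""
    root = Path(data_dir)
    files = sorted(root.glob("*.csv"))
    if not files:
        raise FileNotFoundError(
            f"No CSV files found in {root}; download CICIDS2017 first."
        )
    return files


try:
    for path in find_dataset_files():
        print(path.name)
except FileNotFoundError as err:
    print(err)
```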

Project Structure

- main.py: The main script that orchestrates the entire process
- data/: Directory to store the CICIDS2017 dataset
- model_cache/: Directory to store cached models for faster subsequent runs
- results/: Directory to store output results and visualizations
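
The idea behind model_cache/ can be sketched as a load-or-train helper: reuse a saved model if one exists, otherwise train and save it. How main.py actually caches models is not described here, so the function `load_or_train`, the use of pickle, and the file name `random_forest.pkl` are all assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_train(cache_path, train_fn):
    """Load a pickled model from cache_path, or train and cache a new one."""
    path = Path(cache_path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    model = train_fn()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    return model


calls = []


def fake_train():
    """Stand-in for an expensive training step; records each invocation."""
    calls.append(1)
    return {"weights": [1, 2, 3]}


# Use a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "model_cache" / "random_forest.pkl"
    first = load_or_train(cache_file, fake_train)   # trains and writes the cache
    second = load_or_train(cache_file, fake_train)  # reads the cache instead

print("training runs:", len(calls))
```

Pickle files should only be loaded from trusted sources, and a cached model needs to be deleted whenever the training data or preprocessing changes.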

Results

The script generates several output files in the results directory:

- Confusion matrices for each model
- Feature importance plot for the Random Forest model
- Overall performance comparison of all models
- CSV file containing anomalies detected by the Isolation Forest model
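
As an aid to reading the confusion matrices, the sketch below derives common metrics from the four cells of a binary confusion matrix. Which metrics the comparison plot reports is not stated here, so the choice of accuracy, precision, recall, and F1 is an assumption, and the counts in the example are made up.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts: 100 benign flows (10 false alarms), 50 attacks (5 missed).
metrics = binary_metrics(tn=90, fp=10, fn=5, tp=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For intrusion detection, recall on the attack class is often the figure to watch, since a missed attack is usually costlier than a false alarm.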

Citation

If you use this project or the CICIDS2017 dataset in your research, please cite the following paper:

@inproceedings{sharafaldin2018toward,
  title={Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization},
  author={Sharafaldin, Iman and Habibi Lashkari, Arash and Ghorbani, Ali A},
  booktitle={4th International Conference on Information Systems Security and Privacy (ICISSP)},
  pages={108--116},
  year={2018},
  organization={SciTePress}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick (UNB) for providing the CICIDS2017 dataset.
