Ransomware Detection Mechanism (RDM) is a tool that both uses machine learning to detect ransomware within a network and collects, visualizes, and analyzes IOCs for Emotet. This is a 2020 University of Ottawa undergraduate honours project. For a more detailed summary of our work, check out our report `Report-Ransomware_Detection_using_Supervised Learning.pdf`.
Professor Miguel A. Garzón, Ph.D., P.Eng.
Faculty Member, School of Electrical Engineering and Computer Science
├── LICENSE <- Currently NO License
├── Makefile <- Makefile with commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third party sources.
│ │ ├── binetflow <- Bidirectional netflow files for training set.
│ │ └── validation <- Bidirectional netflow files for validation.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── preprocessed <- Clean data set.
│ ├── processed <- The final, canonical data sets for modeling.
│ ├── raw <- The original, immutable data dump.
│ ├── trained <- The final data set after training and testing.
│ └── validation <- Results of validating model with validation data.
│
├── docs <- A default Sphinx project; see sphinx-doc.org for details
│
├── metric <- Generated files for training and validation metrics
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
│ `1.0-jqp-initial-data-exploration`.
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
└── src <- Source code for use in this project.
├── __init__.py <- Makes src a Python module
│
├── data <- Scripts to download or generate data
│ └── make_dataset.py
│
├── features <- Scripts to turn raw data into features for modeling
│ └── build_features.py
│
├── kibana <- Scripts to collect IOCs for Kibana.
│
├── models <- Scripts to train models and then use trained models to make
│ │ predictions
│ ├── predict_model.py
│ └── train_model.py
│
└── visualization <- Scripts to create exploratory and results-oriented visualizations
└── visualize.py
Project based on the cookiecutter data science project template. #cookiecutterdatascience
Follow these instructions to set up a development environment on your local machine. This is intended for development or testing purposes only.
You must install and set up the following to be able to run the project.
First, install Python 3 and add it to your PATH.
Check if Python is correctly installed.
> C:\Users\alan1>python -V
Python 3.8.1
Next, create a Python 3 virtual environment (venv) for the project. In CMD or a Linux terminal, choose a directory for your venv. Once you have chosen a directory, create the venv.
> C:\>python -m venv RDM-env (use python3 -m venv RDM-env on macOS/Linux)
This will create the venv in a folder called RDM-env. To activate the venv, run the activation script (this must be done for every new terminal or CMD session).
> C:\>RDM-env\Scripts\activate.bat (Windows)
> $ source RDM-env/bin/activate (macOS/Linux)
Once the venv is activated, Python runs from the venv and all libraries are installed into it rather than system-wide. Now install the required Python libraries.
> (RDM-env) C:\Ransomware-Detection-Mechanism>pip install -r requirements.txt
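Optionally, sanity-check that the pinned libraries resolved correctly inside the venv. The snippet below is only a quick, illustrative check; the package names are assumptions about what requirements.txt pins (pandas, numpy, and scikit-learn are typical for a project like this), so adjust them to match the actual file.

```python
# check_env.py -- quick sanity check that the venv resolved the core libraries.
# The package names are assumptions; edit the tuple to match requirements.txt.
import importlib

for pkg in ("numpy", "pandas", "sklearn"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg} {getattr(mod, '__version__', 'unknown')} OK")
    except ImportError as exc:
        print(f"{pkg} MISSING: {exc}")
```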
You are now ready to run scripts. Proceed to Kibana or Training for more information.
- Download and unzip from https://www.elastic.co/downloads/elasticsearch
- Run bin/elasticsearch (or bin\elasticsearch.bat on Windows)
- Run curl http://localhost:9200/ or Invoke-RestMethod http://localhost:9200 with PowerShell
- Download and unzip from https://www.elastic.co/downloads/kibana
- Open config/kibana.yml in an editor
- Set elasticsearch.hosts to point at your Elasticsearch instance (for a default local install this is `elasticsearch.hosts: ["http://localhost:9200"]`)
- Run bin/kibana (or bin\kibana.bat on Windows)
- Point your browser at http://localhost:5601 (or use the scripted check sketched below)
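If you prefer a scripted check over curl and the browser, the sketch below polls both services from Python. It assumes the default ports (9200 for Elasticsearch, 5601 for Kibana) and the `requests` library, which is not part of the stack itself; adjust the URLs if you changed the configuration.

```python
# check_stack.py -- minimal sketch verifying Elasticsearch and Kibana are reachable.
# Assumes the default local ports and the `requests` library (pip install requests).
import requests

# The Elasticsearch root endpoint returns cluster and version info as JSON.
es = requests.get("http://localhost:9200/", timeout=5)
es.raise_for_status()
print("Elasticsearch", es.json()["version"]["number"], "is up")

# Kibana serves its UI on port 5601; any response at all means it is reachable.
kb = requests.get("http://localhost:5601/", timeout=5)
print("Kibana responded with HTTP", kb.status_code)
```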
For information on how to set up the RDM Kibana Environment with the IOCS, see the Setting Up Kibana Environment page.
For information on how to use BulkAPI JSON scripts, see the How to Use Bulk JSON Scripts page.
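For orientation, the Elasticsearch Bulk API expects newline-delimited JSON: an action line, then the document, each on its own line, with a trailing newline at the end. The sketch below is a minimal, hypothetical upload; the index name `rdm-iocs` and the document fields are placeholders rather than the project's actual schema (see the wiki pages above for the real scripts).

```python
# bulk_sketch.py -- illustrative Bulk API upload; the project's real scripts are on the wiki.
# The index name and document fields below are placeholders.
import json
import requests

BULK_URL = "http://localhost:9200/_bulk"  # default local Elasticsearch

docs = [
    {"ioc_type": "ip", "value": "198.51.100.7", "source": "example-feed"},
    {"ioc_type": "domain", "value": "malicious.example", "source": "example-feed"},
]

# Build the NDJSON payload: one action line, then the document itself.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "rdm-iocs"}}))
    lines.append(json.dumps(doc))
payload = "\n".join(lines) + "\n"  # the trailing newline is required

resp = requests.post(BULK_URL, data=payload,
                     headers={"Content-Type": "application/x-ndjson"})
resp.raise_for_status()
print("bulk errors:", resp.json()["errors"])
```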
├── 1. Preparing Data (1.3 mil rows) <- /src/data> python make_dataset.py
│ ├── 1. Download data sets (8.5 min)
│ ├── 2. Create raw data (8.5 sec)
│ ├── 3. Create interim data (8 sec)
│ └── 4. Create preprocessed data (30 sec)
├── 2. Build Features (1.3 mil rows) (2.28 hours) <- /src/features> python build_features.py
├── 3. Train Model (1.3 mil rows) (6.38 hours) <- /src/models> python train_model.py (sketched after this list)
│ ├── One Class SVM (5.03 hours)
│ ├── Confidence Score (36.4 min)
│ ├── Save OC Features CSV (4.4 min)
│ ├── Linear Regression (22.2 min)
│ └── Save LR Features CSV (4.6 min)
└── 4. Predict Model (24.7 min) <- /src/models> python predict_model.py
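For context on step 3, the sketch below shows the general One-Class SVM pattern with scikit-learn: fit the model on the processed features, read decision_function as a confidence score, and serialize the result. The file paths, column handling, and hyperparameters are illustrative placeholders; the project's actual logic lives in src/models/train_model.py.

```python
# train_sketch.py -- illustrative only; the real pipeline is src/models/train_model.py.
# Assumes a processed feature CSV with numeric columns (placeholder paths and columns).
import joblib
import pandas as pd
from sklearn.svm import OneClassSVM

features = pd.read_csv("data/processed/processed.csv")
X = features.select_dtypes("number")  # keep only numeric feature columns

# Fit the one-class model; nu and gamma here are illustrative, not the tuned values.
model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
model.fit(X)

# decision_function yields a signed score usable as a confidence value;
# predict returns +1 for inliers and -1 for outliers (suspected ransomware flows).
scores = model.decision_function(X)
labels = model.predict(X)

joblib.dump(model, "models/one_class_svm.joblib")  # placeholder output path
```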
Prerequisite: download and extract the CSV files (see CSV Files) and place them in the proper paths, or run make_dataset.py and build_features.py. The files are distributed as compressed archives because GitHub does not allow uploads larger than 100 MB without Git LFS.
To extract files:
tar -xzvf processed.tar.gz
tar -xzvf val_processed.tar.gz
Place processed.csv in /data/processed
$ docker build -f DockerTrain/Dockerfile -t train_model:latest .
$ docker run -d train_model:latest
Place processed.csv in /data/processed/
Place val_processed.csv in /data/processed/
$ docker build -f DockerPredict/Dockerfile -t predict_model:latest .
$ docker run -d predict_model:latest
$ docker ps -a
$ docker exec -it (container id) bash
Follow these instructions to run training or prediction right away, without rebuilding the data sets from scratch.
$ cd project/src/models
$ python train_model.py
$ python predict_model.py
Follow these instructions to train or test models from scratch.
$ cd project/src/data
$ python make_dataset.py
$ cd ../features
$ python build_features.py
$ cd ../models
$ python train_model.py
$ python predict_model.py
Visit the GitHub Wiki for more documentation and research on the project.