Malware Classification

Team - Gregory

Project Description: The objective of this project is to classify documents into 9 categories using the large, uncompressed Microsoft Malware Classification Challenge dataset. The 9 malware categories are as follows:

Ramnit
Gatak
Lollipop
Kelihos_ver3
Simda
Tracur
Kelihos_ver1
Vundo
Obfuscator.ACY

The files in the dataset contain only hexadecimal codes. The challenge is to design and develop a classification model that can assign around 2721 test documents to the 9 malware categories listed above.
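Because the files consist only of hexadecimal codes, each byte value can be treated as a word and each file as a document. The plain-Python sketch below illustrates the parsing, stopword-removal and TF-IDF steps under stated assumptions: it assumes every line begins with an address column followed by two-character byte tokens (e.g. `00401000 56 8D 44 24`), and it uses a hypothetical stopword set containing only `??` (an unreadable byte). The TF-IDF variant shown (term count divided by document length, idf = log(N/df)) is one common choice, not necessarily the one the project used.

```python
import math
from collections import Counter

# Hypothetical stopword set: "??" is assumed to mark bytes that could not be read.
STOPWORDS = {"??"}


def tokenize(text):
    """Split a hex dump into byte tokens, dropping the leading address column.

    Assumes each line looks like '00401000 56 8D 44 24 ...'.
    """
    tokens = []
    for line in text.splitlines():
        parts = line.split()
        # parts[0] is assumed to be the address; the rest are byte values.
        tokens.extend(t for t in parts[1:] if t not in STOPWORDS)
    return tokens


def tf_idf(documents):
    """Return one {token: weight} dict per document.

    tf = count / document length, idf = log(N / df).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each token appears in.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            token: (count / total) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return vectors
```

Each document becomes a sparse `{token: weight}` dict; ordering the weights by a fixed vocabulary (for instance the 256 byte values `00` to `FF`) yields the numeric rows a classifier needs. A token present in every document gets weight 0, since log(N/N) = 0.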

Installation

Dask
Google Cloud Platform (or, alternatively, Coiled)

Approach

Create a Dask cluster with a configuration sized to your dataset volume
Connect to it through the web interface or SSH and open a Jupyter Notebook
Parse each file and extract all the words
Remove stopwords and punctuation
Calculate TF-IDF values and create a dataframe
Split the dataset into training and testing sets
Separate the training set by target value (class)
Calculate statistics such as the mean and standard deviation of each feature
Summarize the data by class
Calculate the Gaussian probability density function
Estimate the class probabilities
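The last five steps describe a Gaussian Naive Bayes classifier. A minimal from-scratch sketch in plain Python follows, assuming each training row is a list of numeric features (for example TF-IDF weights) and each label is a category name. It sums log densities instead of multiplying raw probabilities, to avoid floating-point underflow over many features, and adds a small `eps` (a hypothetical smoothing parameter) to each standard deviation so constant features do not cause division by zero.

```python
import math
from collections import defaultdict


def separate_by_class(X, y):
    """Group feature rows by their target label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def mean(values):
    return sum(values) / len(values)


def stdev(values, eps=1e-9):
    """Sample standard deviation plus eps (hypothetical smoothing), always > 0."""
    if len(values) < 2:
        return eps
    mu = mean(values)
    variance = sum((v - mu) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance) + eps


def summarize_by_class(X, y, eps=1e-9):
    """For each class: (row count, [(mean, stdev) per feature])."""
    summaries = {}
    for label, rows in separate_by_class(X, y).items():
        columns = list(zip(*rows))
        summaries[label] = (len(rows), [(mean(c), stdev(c, eps)) for c in columns])
    return summaries


def log_gaussian_pdf(x, mu, sigma):
    """Log of the Gaussian density N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def class_log_probabilities(summaries, row):
    """Unnormalized log P(class) + sum over features of log p(x_i | class)."""
    total = sum(count for count, _ in summaries.values())
    scores = {}
    for label, (count, stats) in summaries.items():
        score = math.log(count / total)  # log prior from class frequency
        for x, (mu, sigma) in zip(row, stats):
            score += log_gaussian_pdf(x, mu, sigma)
        scores[label] = score
    return scores


def predict(summaries, row):
    """Return the class with the highest posterior score."""
    scores = class_log_probabilities(summaries, row)
    return max(scores, key=scores.get)
```

With sparse TF-IDF features, many columns are nearly constant within a class, so in practice a larger `eps` (variance smoothing) may be needed to keep single features from dominating the score.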

Improvements

We evaluated the trained Naive Bayes classifier on the test documents and obtained an accuracy of around 66%. Switching the classifier to logistic regression raised the accuracy to almost 84%, a clear improvement over the Naive Bayes baseline.
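For comparison with the Naive Bayes sketch, the code below fits a multinomial (softmax) logistic regression by full-batch gradient descent on the cross-entropy loss, again in plain Python, and measures accuracy as the fraction of correct predictions. The `lr` and `epochs` defaults are illustrative assumptions, not the settings behind the 84% figure; at the dataset's scale a library implementation such as scikit-learn's `LogisticRegression` would be the more practical choice.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def _logits(W, b, row):
    """One linear score per class: W[k] . row + b[k]."""
    return [sum(w * x for w, x in zip(W_k, row)) + b_k for W_k, b_k in zip(W, b)]


def train_softmax_regression(X, y, classes, lr=0.1, epochs=200):
    """Fit weights W[k][j] and biases b[k] by full-batch gradient descent."""
    n, n_features = len(X), len(X[0])
    index = {c: k for k, c in enumerate(classes)}
    W = [[0.0] * n_features for _ in classes]
    b = [0.0] * len(classes)
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in classes]
        grad_b = [0.0] * len(classes)
        for row, label in zip(X, y):
            probs = softmax(_logits(W, b, row))
            for k in range(len(classes)):
                # d(cross-entropy)/d(logit_k) = p_k - 1[k is the true class]
                err = probs[k] - (1.0 if k == index[label] else 0.0)
                grad_b[k] += err
                for j, x in enumerate(row):
                    grad_W[k][j] += err * x
        # Average the gradient over the batch and take one descent step.
        for k in range(len(classes)):
            b[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict_label(W, b, classes, row):
    logits = _logits(W, b, row)
    return classes[max(range(len(classes)), key=logits.__getitem__)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Unlike Naive Bayes, this model does not assume the features are independent given the class, which is one plausible reason a discriminative model can do better on correlated byte-frequency features.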

Contributions

See the CONTRIBUTORS file for details.

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details.
