malware

BODMAS: An Open Dataset for Learning based Temporal Analysis of PE Malware

Dataset info: https://github.com/whyisyoung/BODMAS

Publication: https://liminyang.web.illinois.edu/data/DLS21_BODMAS.pdf

Summary

The BODMAS dataset includes 57,293 malware samples and 77,142 benign samples (134,435 in total).
The malware samples were randomly sampled each month from a security company’s internal malware database.
 - The malware samples were collected from August 29, 2019, to September 30, 2020.
 - The benign samples were collected from January 1, 2007, to September 30, 2020.
 - The dataset covers 581 malware families.
 - The malware samples span a diverse set of 14 malware categories.
 - The most prevalent categories are Trojan (29,972 samples), Worm (16,697 samples), Backdoor (7,331 samples), Downloader (1,031 samples), and Ransomware (821 samples).

Because the full dataset is large, we provide only a subset of 500 samples for this tutorial.

Tutorial

Perform local robustness verification for a malware classifier trained on the BODMAS dataset.

Verification examples:

1. Adversarial perturbations of continuous variables.
2. Adversarial perturbations of discrete variables.
3. Adversarial perturbations of continuous & discrete variables.
4. Adversarial perturbations of all variables.

Verification examples from Formalise 2024 and AiSOLA 2023