Dataset info: https://github.com/whyisyoung/BODMAS
Publication: https://liminyang.web.illinois.edu/data/DLS21_BODMAS.pdf
The BODMAS dataset includes 57,293 malware samples and 77,142 benign samples (134,435 in total).
The malware samples are randomly sampled each month from a security company’s internal malware database.
- The malware samples were collected from August 29, 2019, to September 30, 2020.
- The benign samples were collected from January 1, 2007, to September 30, 2020.
- The dataset covers 581 malware families.
- These malware samples are from a diverse set of malware categories (14 categories in total).
- The most prevalent categories are Trojan (29,972 samples), Worm (16,697 samples), Backdoor (7,331 samples), Downloader (1,031 samples), and Ransomware (821 samples).
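As a quick sanity check on the figures above, the following Python sketch recomputes the total, the class balance, and the share of malware covered by the five listed categories. The variable names are illustrative and are not part of the BODMAS tooling; the counts are copied from the description above.

```python
# Counts copied from the dataset description above.
malware = 57_293
benign = 77_142
total = malware + benign

# The five most prevalent malware categories and their sample counts.
top_categories = {
    "Trojan": 29_972,
    "Worm": 16_697,
    "Backdoor": 7_331,
    "Downloader": 1_031,
    "Ransomware": 821,
}

# Fraction of the whole dataset that is malware.
malware_share = malware / total

# Fraction of all malware samples that fall in the five listed categories.
top5_total = sum(top_categories.values())
top5_share = top5_total / malware

print(f"total samples: {total}")
print(f"malware share of dataset: {malware_share:.1%}")
print(f"top-5 categories cover {top5_share:.1%} of malware samples")
```

The remaining nine categories therefore account for only a small fraction of the malware samples, which is worth keeping in mind when interpreting per-category results.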
Because the full dataset is large, we provide only a subset of 500 samples for this tutorial.
Perform local robustness verification of a malware classifier trained on the BODMAS dataset.
Verification examples:
1. Adversarial perturbations of continuous variables.
2. Adversarial perturbations of discrete variables.
3. Adversarial perturbations of continuous and discrete variables.
4. Adversarial perturbations of all variables.
Verification examples are from FormaliSE 2024 and AISoLA 2023.
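To illustrate how the perturbation types listed above differ, here is a minimal Python sketch that builds a perturbation box around one sample and checks local robustness of a toy linear classifier with interval arithmetic. Everything in it is an assumption made for illustration: the weights, the sample, the feature index sets `continuous_idx` and `discrete_idx`, and the perturbation sizes are hypothetical, and the tutorial's actual model and verifier are not shown here. For a linear model the interval bounds are exact; for a deep network a dedicated verifier is needed.

```python
import numpy as np

def perturbation_box(x, indices, eps):
    """Return (lower, upper) bounds that perturb only `indices` of x by +/- eps."""
    lower = x.copy()
    upper = x.copy()
    lower[indices] -= eps
    upper[indices] += eps
    return lower, upper

def affine_bounds(W, b, lower, upper):
    """Exact output bounds of W @ x + b for x in the box [lower, upper]."""
    W_pos = np.clip(W, 0, None)   # positive weights take the matching bound
    W_neg = np.clip(W, None, 0)   # negative weights take the opposite bound
    out_lower = W_pos @ lower + W_neg @ upper + b
    out_upper = W_pos @ upper + W_neg @ lower + b
    return out_lower, out_upper

def is_locally_robust(W, b, x, lower, upper):
    """True if the predicted class of x cannot change anywhere inside the box."""
    label = int(np.argmax(W @ x + b))
    for other in range(W.shape[0]):
        if other == label:
            continue
        # The margin logit[label] - logit[other] is itself affine in x.
        w_diff = (W[label] - W[other])[None, :]
        b_diff = np.array([b[label] - b[other]])
        min_margin, _ = affine_bounds(w_diff, b_diff, lower, upper)
        if min_margin[0] <= 0:
            return False
    return True

# Hypothetical two-class linear model over four features.
W = np.array([[1.0, 0.5, -0.5, 0.0],
              [-1.0, 0.0, 0.5, 1.0]])
b = np.zeros(2)

# Hypothetical sample; features 0-1 are treated as continuous, 2-3 as discrete.
x = np.array([0.5, 2.0, 1.0, 0.0])
continuous_idx = [0, 1]
discrete_idx = [2, 3]
all_idx = continuous_idx + discrete_idx

# 1. Small perturbation of the continuous features only.
robust_continuous = is_locally_robust(W, b, x, *perturbation_box(x, continuous_idx, 0.1))

# 2. Unit-step perturbation of the discrete features only
#    (the box is a relaxation of the discrete set {x - 1, x, x + 1}).
robust_discrete = is_locally_robust(W, b, x, *perturbation_box(x, discrete_idx, 1.0))

# 3. Small perturbation of every feature.
robust_all = is_locally_robust(W, b, x, *perturbation_box(x, all_idx, 0.1))

print(robust_continuous, robust_discrete, robust_all)
```

The design choice to perturb only a chosen index set is what separates the example types: the same sample can be robust under one set of perturbed variables and not under another.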