Repository for the paper "Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants".
In particular, you will find code to reproduce the paper's experiments as well as an implementation of our new and efficient strategy for your projects.
If you want to reproduce our paper experiments:
- the notebooks here and here reproduce the experiments
- this code contains the implementation of the protocols used for the numerical experiments in our article.
In order to use our MGS strategy:
The data sets used in our article should be downloaded into the data/externals folder. They are available at the following addresses:
- Pima
- Phoneme : https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv
- Abalone : https://archive.ics.uci.edu/dataset/1/abalone
- Wine : https://archive.ics.uci.edu/dataset/186/wine+quality
- Haberman : https://archive.ics.uci.edu/dataset/43/haberman+s+survival
- Yeast : https://archive.ics.uci.edu/dataset/110/yeast
- Vehicle : https://archive.ics.uci.edu/dataset/149/statlog+vehicle+silhouettes
- Ionosphere : https://archive.ics.uci.edu/dataset/52/ionosphere
- Breast cancer Wisconsin : https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original
- CreditCard : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- MagicTel : https://www.openml.org/d/44125
- California : https://www.openml.org/d/44090
- House_16H : https://openml.org/d/821
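As an illustration of the download step, here is a minimal sketch that fetches one of the files listed above into data/externals. It is not part of the repository: the helper names (`to_raw_url`, `target_path`, `download`) are hypothetical, and the assumption that the file should keep its original name (`phoneme.csv`) may not match what the experiment code expects. Only the Phoneme link points directly at a file; the UCI, Kaggle and OpenML pages need to be downloaded through their own interfaces.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# Folder named in this README; everything else below is an illustrative assumption.
DATA_DIR = Path("data/externals")
PHONEME_URL = "https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv"


def to_raw_url(url: str) -> str:
    """Turn a GitHub 'blob' page URL into a raw-file URL; leave other URLs unchanged."""
    parts = urlparse(url)
    if parts.netloc == "github.com" and "/blob/" in parts.path:
        user_repo, _, rest = parts.path.partition("/blob/")
        return f"https://raw.githubusercontent.com{user_repo}/{rest}"
    return url


def target_path(url: str, data_dir: Path = DATA_DIR) -> Path:
    """Local destination: the data folder plus the file name taken from the URL."""
    return data_dir / Path(urlparse(url).path).name


def download(url: str, data_dir: Path = DATA_DIR) -> Path:
    """Download the file once; skip the request if it is already on disk."""
    dest = target_path(url, data_dir)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        urlretrieve(to_raw_url(url), dest)
    return dest


# Usage (performs a network request):
# download(PHONEME_URL)
```

Converting the "blob" link to its raw counterpart matters because the former returns an HTML page rather than the CSV itself.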
Table 2 from the paper:
Strategy | None | CW | RUS | ROS | NM1 | BS1 | BS2 | SMOTE | CV SMOTE | MGS (d+1) |
---|---|---|---|---|---|---|---|---|---|---|
CreditCard (0.2%) | 0.970 | |||||||||
Abalone (1%) | 0.802 | |||||||||
Phoneme (1%) | 0.924 | |||||||||
Yeast (1%) | 0.955 | |||||||||
Wine (4%) | 0.941 | |||||||||
Pima (20%) | 0.808 | |||||||||
Haberman (10%) | 0.744 | |||||||||
MagicTel (20%) | 0.922 | |||||||||
California (1%) | 0.923 | |||||||||
This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.
If you find the code useful, please consider citing us:
@article{sakho2024theoretical,
title={Theoretical and experimental study of SMOTE: limitations and comparisons of rebalancing strategies},
author={Sakho, Abdoulaye and Scornet, Erwan and Malherbe, Emmanuel},
journal={arXiv preprint arXiv:2402.03819},
year={2024}
}