RDS

Implementation of Reinforced Data Sampling (RDS) for Model Diversification (arXiv:2006.07100).

Requirements

  • numpy
  • torch
  • scikit-learn
  • pandas
  • tqdm
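
The dependencies can be installed with pip, for example:

pip install numpy torch scikit-learn pandas tqdm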

Machine Learning Tasks

This repository supports multiple machine learning tasks on multivariate, textual, and visual data; a short sketch of the corresponding evaluation metrics follows this list:

  • Binary Classification
  • Multi-Class Classification
  • Regression
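
In the experiments below, these tasks are evaluated with AUC, micro-F1, and R^2, respectively. The following is a minimal illustrative sketch of those metrics using scikit-learn; it is not part of the RDS code itself and uses made-up labels purely for demonstration:

# Illustrative only: the evaluation measures used in the experiments below.
from sklearn.metrics import roc_auc_score, f1_score, r2_score

y_true_bin, y_score = [0, 1, 1, 0], [0.2, 0.8, 0.6, 0.4]   # toy binary labels and scores
y_true_mc, y_pred_mc = [0, 1, 2, 2], [0, 2, 2, 1]          # toy multi-class labels
y_true_reg, y_pred_reg = [3.1, 2.0, 5.5], [2.9, 2.4, 5.0]  # toy regression targets

print(roc_auc_score(y_true_bin, y_score))               # AUC: binary classification (MADELON, KLP)
print(f1_score(y_true_mc, y_pred_mc, average="micro"))  # Micro-F1: multi-class classification (MNIST)
print(r2_score(y_true_reg, y_pred_reg))                 # R^2: regression (DR, Boston Housing)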

Real-World Use Cases

Please contact us if you would like your real-world competition or use case to be listed here.

Experiment Results

Experiments were conducted on the following four datasets:

| Dataset | Task | Challenge | Size of Data | Evaluation | Year |
|---------|------|-----------|--------------|------------|------|
| MADELON | Binary Classification | NIPS 2003 Feature Selection Challenge | 2,600 x 500 (multivariate) | AUC | 2003 |
| DR | Regression | Drug Reviews (Kaggle Hackathon) | 215,063 x 6 (multivariate, text) | R^2 | 2018 |
| MNIST | Multi-class Classification | Handwritten Digit Recognition | 70,000 x 28 x 28 (image) | Micro-F1 | 1998 |
| KLP | Binary Classification | Kalapa Credit Scoring Challenge | 50,000 x 64 (multivariate, text) | AUC | 2020 |

MADELON - Results

| Sampling | #Train | #Test | Class Ratio (Train) | Class Ratio (Test) | LR | RF | MLP | Ensemble | Public |
|----------|--------|-------|---------------------|--------------------|-------|-------|-------|----------|--------|
| Preset | 2000 | 600 | 1.0000 | 1.0000 | .6019 | .8106 | .5590 | .6783 | .9063 |
| Random | 2000 | 600 | .9920 | 1.0270 | .5742 | .7729 | .5774 | .6453 | .9002 |
| Stratified | 2000 | 600 | 1.0000 | 1.0000 | .5673 | .7470 | .6153 | .6360 | .8828 |
| RDS^{DET} | 2001 | 599 | 1.0375 | .9137 | .6192 | .8050 | .6228 | .6973 | .8915 |
| RDS^{STO} | 2021 | 579 | 1.0010 | .9966 | .6192 | .8050 | .6050 | .6947 | .9106 |

DR - Results

| Sampling | #Train | #Test | Ridge | MLP | CNN | Ensemble | Public |
|----------|--------|-------|-------|-----|-----|----------|--------|
| Preset | 161,297 | 53,766 | .4580 | .5787 | .7282 | .6660 | .7637 |
| Random | 161,297 | 53,766 | .4597 | .4179 | .7353 | .6485 | .7503 |
| RDS^{DET} | 162,070 | 52,993 | .4646 | .5776 | .7355 | .6692 | .7649 |
| RDS^{STO} | 161,944 | 53,119 | .4647 | .5370 | .7509 | .6562 | .7600 |

MNIST - Results

| Sampling | #Train | #Test | Class Ratio (Train) | Class Ratio (Test) | LR | RF | CNN | Ensemble | Public |
|----------|--------|-------|---------------------|--------------------|-------|-------|-------|----------|--------|
| Preset | 60000 | 10000 | .8571 | .1429 | .9647 | .9524 | .9824 | .9819 | .9917 |
| Random | 59500 | 10500 | .8500 | .1500 | .9603 | .9465 | .9779 | .9768 | .9914 |
| Stratified | 59500 | 10500 | .8500 | .1500 | .9625 | .9510 | .9795 | .9792 | .9901 |
| RDS^{DET} | 59938 | 10062 | .8562 | .1438 | .9495 | .9382 | .9757 | .9769 | .9927 |
| RDS^{STO} | 59496 | 10504 | .8499 | .1501 | .9583 | .9486 | .9851 | .9830 | .9931 |

KLP - Results

| Sampling | #Train | #Test | Class Ratio (Train) | Class Ratio (Test) | LR | RF | MLP | Ensemble | Public |
|----------|--------|-------|---------------------|--------------------|-------|-------|-------|----------|--------|
| Preset | 30000 | 20000 | .0165 | .0186 | .5799 | .5517 | .5635 | .5723 | .5953 |
| Simple | 30000 | 20000 | .0169 | .0179 | .5886 | .5374 | .5914 | .5856 | .6042 |
| Stratified | 30000 | 20000 | .0173 | .0173 | .5952 | .5608 | .5780 | .5983 | .6014 |
| RDS^{DET} | 29999 | 20001 | .0180 | .0163 | .6045 | .5350 | .5802 | .6057 | .5362 |
| RDS^{STO} | 30031 | 19969 | .0172 | .0174 | .5997 | .5491 | .6354 | .6072 | .6096 |

Demos

MADELON - Binary Classification

Binary Classification with Deterministic Ensemble

python rds.py --data datasets/madelon.csv --target 0 -id MDL_DET --learning deterministic --sampling-ratio 0.7695 --envs models.MDL_RF models.MDL_MLP models.MDL_LR

Binary Classification with Stochastic Choice

python rds.py --data datasets/madelon.csv --target 0 -id MDL_STO --learning stochastic --sampling-ratio 0.7695 --envs models.MDL_RF models.MDL_MLP models.MDL_LR

Evaluating with Public Benchmarking

python evaluator.py --data datasets/madelon.csv --target 0 --sample outputs/MDL_DET.npy --task classification --measure auc --envs models.MDL_PS
python evaluator.py --data datasets/madelon.csv --target 0 --sample outputs/MDL_STO.npy --task classification --measure auc --envs models.MDL_PS
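
The sampled split saved by rds.py can also be inspected directly. The sketch below is only an illustration under the assumption that the .npy file holds one train/test indicator per row of the dataset; verify the exact format against your own outputs.

import numpy as np
import pandas as pd

# Assumption: outputs/MDL_DET.npy stores a 0/1 (test/train) indicator per row of madelon.csv.
sample = np.load("outputs/MDL_DET.npy")
data = pd.read_csv("datasets/madelon.csv")

print(sample.shape, np.unique(sample, return_counts=True))  # split sizes under this assumption
train_df = data[sample == 1]  # rows treated as training data
test_df = data[sample == 0]   # rows treated as held-out test data
print(len(train_df), len(test_df))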

Boston Housing - Regression

Regression with Deterministic Ensemble

python rds.py --data datasets/boston.csv --target 0 -id BOS_DET --task regression --learning deterministic --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM

Regression with Stochastic Choice

python rds.py --data datasets/boston.csv --target 0 -id BOS_STO --task regression --learning stochastic --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM

Evaluating with Ensemble Benchmarking

python evaluator.py --data datasets/boston.csv --target 0 --sample outputs/BOS_DET.npy --task regression --measure auc --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM
python evaluator.py --data datasets/boston.csv --target 0 --sample outputs/BOS_STO.npy --task regression --measure auc --envs models.BOS_MLP models.BOS_Ridge models.BOS_SVM

MNIST - Multi-class Classification

Multi-class Classification with Deterministic Ensemble

python rds.py --data-loader datasets.MNIST -id MNIST_DET --task classification --learning deterministic --sampling-ratio 0.8572 --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR

Multi-class Classification with Stochastic Choice

python rds.py --data-loader datasets.MNIST -id MNIST_STO --task classification --learning stochastic --sampling-ratio 0.8572 --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR

Evaluating with Ensemble Benchmarking

python evaluator.py --data-loader datasets.MNIST --sample outputs/MNIST_DET.npy --task classification --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR
python evaluator.py --data-loader datasets.MNIST --sample outputs/MNIST_STO.npy --task classification --measure f1_micro --envs models.MNIST_CNN models.MNIST_RF models.MNIST_LR

Citing this work

Please consider citing us if this work is useful in your research:

@misc{nguyen2020reinforced,
    title={Reinforced Data Sampling for Model Diversification},
    author={Hoang D. Nguyen and Xuan-Son Vu and Quoc-Tuan Truong and Duc-Trong Le},
    year={2020},
    eprint={2006.07100},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

References

  • Lee, S., Prakash, S.P.S., Cogswell, M., Ranjan, V., Crandall, D. and Batra, D., 2016. Stochastic multiple choice learning for training diverse deep ensembles. In Advances in Neural Information Processing Systems (pp. 2119-2127).
  • Peng, M., Zhang, Q., Xing, X., Gui, T., Huang, X., Jiang, Y.G., Ding, K. and Chen, Z., 2019, July. Trainable undersampling for class-imbalance learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, pp. 4707-4714).
  • Gong, Z., Zhong, P. and Hu, W., 2019. Diversity in machine learning. IEEE Access, 7, pp.64323-64350.
