This repo is a collection of awesome things about mixed sample data augmentation, including papers, code, etc.
Below, we introduce the basic usage of mixed sample data augmentation, which was first proposed in mixup: Beyond Empirical Risk Minimization [ICLR2018] [code].
In mixup, the virtual training feature-target samples are produced as,
$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j$$
$$\tilde{y} = \lambda y_i + (1 - \lambda)\, y_j$$
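where $(x_i, y_i)$ and $(x_j, y_j)$ are two feature-target samples drawn at random from the training data (with $y_i$, $y_j$ one-hot label vectors), and $\lambda \in [0, 1]$ is sampled as $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$. The mixup hyper-parameter $\alpha > 0$ controls the strength of interpolation between feature-target pairs.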
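As a minimal sketch of the two equations above (not code from the mixup authors), the snippet below mixes a single pair of plain-Python feature and one-hot label vectors using the standard-library `random.betavariate`. The helper names `mixup_pair` and `fraction_near_edges` are hypothetical. The second helper illustrates the role of α: with small α most draws of λ fall close to 0 or 1, so the virtual sample stays close to one of the two originals, while α = 1 makes λ uniform on [0, 1].

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Build one virtual sample (x~, y~) from two feature-target pairs.

    Hypothetical helper: x_* are feature vectors and y_* one-hot label
    vectors, both given as plain Python lists of equal length.
    """
    lam = rng.betavariate(alpha, alpha)  # λ ~ Beta(α, α), so λ ∈ [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam


def fraction_near_edges(alpha, n=10000, eps=0.1, seed=0):
    """Hypothetical helper: share of λ draws within eps of 0 or 1.

    A larger value means weaker mixing (λ usually near 0 or 1).
    """
    r = random.Random(seed)
    draws = [r.betavariate(alpha, alpha) for _ in range(n)]
    return sum(d < eps or d > 1.0 - eps for d in draws) / n
```

For example, `fraction_near_edges(1.0)` is close to 0.2 (uniform λ), while `fraction_near_edges(0.2)` is much larger, reflecting mostly mild interpolation.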
The basic training pipeline is shown in the following figure:
Only a few lines of code are needed to implement mixup training in PyTorch:
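```python
import numpy as np

# loader1/loader2 shuffle the training set independently; y1, y2 are one-hot.
# net, optimizer, alpha and a loss accepting soft targets are defined elsewhere.
for (x1, y1), (x2, y2) in zip(loader1, loader2):
    lam = np.random.beta(alpha, alpha)  # λ ~ Beta(α, α)
    x = lam * x1 + (1. - lam) * x2      # mixed inputs (Variable is no longer needed)
    y = lam * y1 + (1. - lam) * y2      # mixed soft targets
    optimizer.zero_grad()
    loss(net(x), y).backward()
    optimizer.step()
```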
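Many implementations instead mix each mini-batch with a shuffled copy of itself, which avoids a second data loader, and mix the loss rather than the targets. Because cross-entropy is linear in the target distribution, λ·CE(p, y_i) + (1 − λ)·CE(p, y_j) equals the cross-entropy against the mixed target λy_i + (1 − λ)y_j, so both forms yield the same loss. The sketch below uses NumPy rather than PyTorch so that it runs stand-alone; `mixup_batch`, `cross_entropy` and `mixup_loss` are hypothetical helper names, and integer class labels are assumed.

```python
import numpy as np


def mixup_batch(x, y, alpha, rng):
    """Mix a batch with a shuffled copy of itself (hypothetical helper).

    x: (batch, ...) features; y: (batch,) integer class labels.
    Returns the mixed inputs, both label sets and λ, so the loss can be mixed.
    """
    lam = rng.beta(alpha, alpha)        # one λ ~ Beta(α, α) per batch
    perm = rng.permutation(len(x))      # partner index j for each sample i
    x_mix = lam * x + (1.0 - lam) * x[perm]
    return x_mix, y, y[perm], lam


def cross_entropy(logits, target_probs):
    """Batch mean of -sum_k t_k * log softmax(logits)_k (numerically stable)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(target_probs * log_p).sum(axis=1).mean()


def mixup_loss(logits, y_a, y_b, lam, num_classes):
    """λ·CE(y_a) + (1 − λ)·CE(y_b), using one-hot targets built from labels."""
    eye = np.eye(num_classes)
    return (lam * cross_entropy(logits, eye[y_a])
            + (1.0 - lam) * cross_entropy(logits, eye[y_b]))
```

In a PyTorch loop the same pattern would typically draw the permutation with `torch.randperm`; the trade-off is that samples are only ever paired with others from the same mini-batch.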
- AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty [ICLR2020] [code]
- SuperMix: Supervising the Mixing Data Augmentation [Arxiv2020] [code]
- Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification [AAAI2020]
- Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization [Arxiv2020]
- Attribute Mix: Semantic Data Augmentation for Fine-grained Recognition [Arxiv2020]
- Understanding and Enhancing Mixed Sample Data Augmentation [Arxiv2020] [code]
- Attentive CutMix: An Enhanced Data Augmentation Approach for Deep Learning Based Image Classification [ICASSP2020]
- Mixup-breakdown: a consistency training method for improving generalization of speech separation models [ICASSP2020]
- CutMix: Regularization strategy to train strong classifiers with localizable features [ICCV2019] [code]
- Improved Mixed-Example Data Augmentation [WACV2019] [code]
- Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy [Arxiv2019]
- Target-Directed MixUp for Labeling Tangut Characters [ICDAR2019]
- Manifold Mixup improves text recognition with CTC loss [Arxiv2019]
- Manifold Mixup: Better Representations by Interpolating Hidden States [ICML2019] [code]
- Data augmentation using random image cropping and patching for deep CNNs [TCSVT2019] [code]
- MixUp as Locally Linear Out-Of-Manifold Regularization [AAAI2019]
- On Adversarial Mixup Resynthesis [NeurIPS2019] [code]
- On mixup training: Improved calibration and predictive uncertainty for deep neural networks [NeurIPS2019]
- mixup: Beyond Empirical Risk Minimization [ICLR2018] [code]
- Learning from between-class examples for deep sound recognition [ICLR2018] [code]
- Between-class Learning for Image Classification [CVPR2018] [code]
- Data Augmentation by Pairing Samples for Images Classification [Arxiv2018]
- Rare Sound Event Detection Using Deep Learning and Data Augmentation [Interspeech2019]
- Mixup Learning Strategies for Text-independent Speaker Verification [Interspeech2019]
- Acoustic Scene Classification with Mismatched Devices Using CliqueNets and Mixup Data Augmentation [Interspeech2019]
- Deep Convolutional Neural Network with Mixup for Environmental Sound Classification [PRCV2018]
- Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition [Interspeech2018]
- An investigation of mixup training strategies for acoustic models in ASR [Interspeech2018] [code]
- Understanding Mixup Training Methods [IEEE ACCESS 2018]
- Rethinking Image Mixture for Unsupervised Visual Representation Learning [Arxiv2020]
- FocalMix: Semi-Supervised Learning for 3D Medical Image Detection [Arxiv2020]
- ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring [ICLR2020] [code]
- DivideMix: Learning with Noisy Labels as Semi-supervised Learning [ICLR2020] [code]
- OpenMix: Reviving Known Knowledge for Discovering Novel Visual Categories in An Open World [Arxiv2020]
- MixPUL: Consistency-based Augmentation for Positive and Unlabeled Learning [Arxiv2020]
- ROAM: Random Layer Mixup for Semi-Supervised Learning in Medical Imaging [Arxiv2020]
- Interpolation Consistency Training for Semi-Supervised Learning [IJCAI2019] [code]
- RealMix: Towards Realistic Semi-Supervised Deep Learning Algorithms [Arxiv2019] [code]
- Unifying semi-supervised and robust learning by mixup [ICLR Workshop 2019]
- On Adversarial Mixup Resynthesis [NeurIPS2019] [code]
- MixMatch: A holistic approach to semi-supervised learning [NeurIPS2019] [code]
- Semi-Supervised and Task-Driven Data Augmentation [IPMI2019]
- Mixup Regularization for Region Proposal based Object Detectors [Arxiv2020]
- FocalMix: Semi-Supervised Learning for 3D Medical Image Detection [Arxiv2020]
- CutMix: Regularization strategy to train strong classifiers with localizable features [ICCV2019] [code]
- On mixup training: Improved calibration and predictive uncertainty for deep neural networks [NeurIPS2019]
- ROAM: Random Layer Mixup for Semi-Supervised Learning in Medical Imaging [Arxiv2020]
- Improving Robustness of Deep Learning Based Knee MRI Segmentation: Mixup and Adversarial Domain Adaptation [Arxiv2019]
- Improving Data Augmentation for Medical Image Segmentation [MIDL2018]
- Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy [Arxiv2020]
- Multi-class Novelty Detection Using Mix-up Technique [WACV2020]
- A U-Net Based Discriminator for Generative Adversarial Networks [CVPR2020]
- Mixed batches and symmetric discriminators for GAN training [ICML2018]
- mixup: Beyond Empirical Risk Minimization [ICLR2018] [code]
- Improve Unsupervised Domain Adaptation with Mixup Training [Arxiv2020]
- Adversarial Domain Adaptation with Domain Mixup [AAAI2020]
- Charting the Right Manifold: Manifold Mixup for Few-shot Learning [WACV2020] [code]
- An Experimental Evaluation of Mixup Regression Forests [EXPERT SYST APPL 2020]
- Data Augmentation Revisited: Rethinking the Distribution Gap between Clean and Augmented Data [Arxiv2019]