This repo contains the code for our paper "The Effect of Using Masked Language Models in Random Textual Data Augmentation".

Masked-EDA: The Effect of Using Masked Language Models in Random Textual Data Augmentation

Code for our paper accepted at CSICC 2021.

@inproceedings{rashid2021effect,
  title={The Effect of Using Masked Language Models in Random Textual Data Augmentation},
  author={Rashid, Mohammad Amin and Amirkhani, Hossein},
  booktitle={2021 26th International Computer Conference, Computer Society of Iran (CSICC)},
  pages={1--5},
  year={2021},
  organization={IEEE}
}

The original Easy Data Augmentation (EDA) code is shared here: https://github.com/jasonwei20/eda_nlp

Masked-EDA Usage

This code can be run on any text classification dataset (see the example file 'data/tr_2000.tsv' for the expected data format).
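Judging by the example file, the data appears to follow the original EDA layout: one example per line, with the class label and the sentence separated by a tab. A minimal sketch of parsing that layout (the two sample rows are hypothetical, not from the repo):

```python
import csv
import io

# Hypothetical two-row sample in the assumed label<TAB>sentence layout.
sample = "1\tthe movie was surprisingly good\n0\tdull and far too long\n"

# Parse with the csv module so embedded quotes are handled consistently.
rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
labels = [label for label, text in rows]
texts = [text for label, text in rows]

print(labels)  # → ['1', '0']
```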

First, install HuggingFace Transformers: pip install transformers

Second, install a deep learning framework of your choice: pip install torch (PyTorch) or pip install tensorflow

Run

All configurations are the same as in the original EDA (https://github.com/jasonwei20/eda_nlp), except for the type of Masked Language Model you wish to use.

  • Note that the mask_model parameter must be one of the listed names. If no model name is provided, it defaults to DistilBERT.

Available models: {'bert', 'roberta', 'distilbert'}
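Conceptually, the model choice determines which mask token is inserted into the sentence and which MLM fills it in: a word is replaced by the model's mask token and the MLM predicts a substitute. A minimal sketch of the masking step (the mask-token strings match the HuggingFace tokenizers for these models; the fill step itself, via transformers' fill-mask pipeline, is omitted so the sketch runs offline, and the helper name is ours, not from this repo):

```python
# Mask tokens used by the HuggingFace tokenizers for the supported models.
MASK_TOKENS = {"bert": "[MASK]", "distilbert": "[MASK]", "roberta": "<mask>"}

def mask_word(sentence: str, index: int, model: str = "distilbert") -> str:
    """Replace the word at `index` with the chosen model's mask token."""
    words = sentence.split()
    words[index] = MASK_TOKENS[model]
    return " ".join(words)

print(mask_word("the movie was good", 3, "roberta"))
# → the movie was <mask>
```

The masked sentence would then be passed to the chosen MLM, and a predicted token replaces the mask.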

python ./augment.py --input='data/tr_2000.tsv' --output='data/tr2000_aug16.tsv' --mask_model='roberta' --num_aug=16 --alpha_sr=0.1 --alpha_rd=0.1 --alpha_ri=0.4 --alpha_rs=0.1
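In the original EDA implementation, each alpha parameter controls roughly what fraction of a sentence's words the corresponding operation touches, with at least one word always modified. A sketch of that calculation (the helper name is ours, for illustration only):

```python
def num_words_to_change(sentence: str, alpha: float) -> int:
    """Number of words an EDA operation modifies: n = max(1, int(alpha * length))."""
    return max(1, int(alpha * len(sentence.split())))

# With alpha_sr=0.1, a 10-word sentence gets 1 word synonym-replaced;
# with alpha_ri=0.4, 4 new words would be inserted.
demo = "a ten word sentence used purely for this small demo"
print(num_words_to_change(demo, 0.1))  # → 1
print(num_words_to_change(demo, 0.4))  # → 4
```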
