word_level_augmentation #101
Comments
Hi Stella, thank you!
I was hoping that word_level_augmentation would generate minimally abrasive sentences, but when I used unif-0.9 and tf_idf-0.9 on the IMDB dataset I got completely gibberish examples. tf_idf is much better, but still nowhere close to being a good match to the original example.
@spolavar I saved the augmented texts to check the results. If you are just trying to do augmentation, maybe nlpaug will help? I eventually reproduced the results on IMDB and DBPedia with both tf_idf and unif augmentation. In my understanding, the value 0.7 is the probability of a token being replaced, so 0.9 will lead to more tokens being replaced than 0.7. For tf-idf this value is a scaler rather than an exact probability, but the larger it is, the more tokens get replaced. For unif the best value was 0.3 according to the paper (that was indeed the case in my experiments). The reason may be that tf-idf replaces unimportant words, so it is more tolerant of replacement. (Authors, please correct me if I'm wrong.)
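For intuition, here is a minimal sketch (hypothetical helpers, not the repo's actual implementation) of how a per-token setting like 0.3 or 0.9 behaves for uniform vs. tf-idf-weighted replacement; `vocab` is a placeholder word list and `tf_idf` a placeholder token-to-score dict:

```python
import random

def unif_replace(tokens, vocab, token_prob=0.3):
    """Uniform replacement: each token is swapped for a random vocab word
    with probability token_prob, so 0.9 replaces far more tokens than 0.3."""
    return [random.choice(vocab) if random.random() < token_prob else t
            for t in tokens]

def tf_idf_replace(tokens, vocab, tf_idf, token_prob=0.7):
    """TF-IDF-weighted replacement: low-score (uninformative) tokens are more
    likely to be replaced; token_prob scales the overall replacement rate."""
    max_score = max(tf_idf.values())  # assumes a non-empty score dict
    out = []
    for t in tokens:
        # Informative tokens (high tf-idf) get a lower replacement probability.
        p = token_prob * (1.0 - tf_idf.get(t, 0.0) / max_score)
        out.append(random.choice(vocab) if random.random() < p else t)
    return out
```

Under this reading, raising the value from 0.7 to 0.9 increases the expected number of replaced tokens in both schemes, which would match the behavior you are seeing.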
@stellaywu thank you for the clarification. I agree with you: when I lowered the probability, the augmented examples were comparable. However, I still haven't managed to test the performance of UDA against the built-in BERT model. My processed data is ready, but I ran into TF compatibility issues. I am running the code on TF 2.3.1 and Python 3.6 and have decided to use tf.compat.v1 to bridge. That allowed me to run the data processing part, but I am still having issues with the modeling part. Can you share a bit more about how you tackled the code porting issues and reproduced the results? Thank you again!
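For reference, the usual bridge when running TF 1.x-style code on TF 2.x is the compat shim shown below; whether it is sufficient for the modeling code in this repo is uncertain, since some TF 1.x APIs have no TF 2.x equivalent:

```python
# Standard TF1-on-TF2 shim: import the v1 API surface and disable v2
# behaviors (eager execution, v2 control flow, resource variables).
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# After this, graph/session-style code (tf.placeholder, tf.Session,
# tf.estimator) can generally run, but anything that depends on
# tf.contrib.* was removed in TF 2.x and will still fail.
```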
To be specific, when I run the command
this is triggered by the training input function trying to read the TensorFlow data files from
Any ideas how to handle the
@spolavar what is your TF version? The code only supports TF 1.
Hey, I'm running tf_idf augmentation on the IMDB dataset and noticed that tf_idf-0.7 or tf_idf-0.9 leads to more than half of the tokens in a sentence getting replaced. Is that the desired outcome?
Thanks!
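A quick way to check how aggressive a setting is: compare each augmented example against its source. This is a hypothetical diagnostic helper (not part of the repo) that assumes the augmentation only substitutes tokens in place and that examples are whitespace-tokenizable:

```python
def replaced_fraction(original, augmented):
    """Rough diagnostic: fraction of aligned positions whose token changed."""
    orig_toks, aug_toks = original.split(), augmented.split()
    n = min(len(orig_toks), len(aug_toks))
    if n == 0:
        return 0.0
    changed = sum(o != a for o, a in zip(orig_toks[:n], aug_toks[:n]))
    return changed / n

# Example: spot-check a few sentences to see whether tf_idf-0.9 really
# rewrites more than half of the tokens.
print(replaced_fraction("the movie was great", "the film was great"))  # 0.25
```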