Poor results on Squad 1.0 #4
Here are some suggestions that might help you:
Hi, 1) we trained the autoencoder on the 2M corpus you provide, replacing the "train_file" in this script: https://github.com/dayihengliu/CRQDA/blob/master/crqda/run_train.sh. Would it be possible for you to release your trained autoencoder? Thanks!
This work was done during my internship at Microsoft, but I have since left Microsoft. So far, I can only find the augmented unanswerable questions and the well-trained RoBERTa SQuAD 2.0 MRC model. Regarding the autoencoder, you can refer to https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT#quick-start-guide to download and preprocess the Wikipedia dataset.
@rishabhjoshi Hi, did you solve this problem? I want to build on CRQDA, but judging from your description the code may be hard to run. If you solved it, could you share the augmented SQuAD 1.1 (answerable) data with me? Thanks!
@TingFree I was not able to reproduce the results on the SQuAD 1.0 dataset. I was hoping to get the authors' MRC model and autoencoder (although the MRC model and autoencoder I trained myself are quite good). I tried multiple hyperparameter settings but could never match the results the authors report for SQuAD 2.0.
@rishabhjoshi Hi, have you reproduced CRQDA on SQuAD 2.0? I mean the same results as in the paper.
Hi, I wanted to augment the SQuAD 1.0 dataset (i.e., answerable questions, not unanswerable ones). I trained a standard RoBERTa MRC model using the transformers library, which achieved 86.16 Exact Match and 92.31 F1 on the validation data.
I also trained an autoencoder for 100 epochs as described, and the loss came down to about 0.04 with near-perfect reconstruction.
I then ran CRQDA to augment 30,000 samples. I removed the "NEG" parameter and added "SPAN = True" and "para", keeping the same hyperparameters (epsilon) that you used. Out of the 30,000 samples, only about 1,800 produced generated questions that passed the selection filter (Jaccard similarity >= 0.3).
After manual inspection, I see that most of the generated questions are gibberish (especially those with Jaccard similarity <= 0.8).
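For reference, the selection filter described above amounts to a token-overlap Jaccard similarity between the original and generated questions. A minimal sketch, assuming simple whitespace tokenization (the repo's actual tokenization may differ):

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: |A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Keep a generated question only if it overlaps enough with the original
# (threshold 0.3, as in the issue) but is not an exact copy.
def keep(original: str, generated: str, threshold: float = 0.3) -> bool:
    sim = jaccard(original, generated)
    return threshold <= sim < 1.0
```

With this filter, a generated question sharing 4 of 6 unique tokens with the original (similarity ≈ 0.67) would be kept, while an unrelated gibberish question would fall below 0.3 and be dropped.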
Can you share some insight into what might be going wrong, and why the results are so poor given that both the MRC model and the autoencoder trained well?
Any help would be greatly appreciated!
Thanks!