Reproducibility on GLUE #9
Hi,
I am currently working on reproducing the results from the paper on the GLUE benchmark. However, my current results are very far from those in the paper. Have you already conducted experiments in this direction, or could you reproduce the scores?
I have a running implementation compatible with Huggingface if you want to try it out:
https://github.com/paul-grundmann/transformers/blob/fnet/src/transformers/models/fnet/modeling_fnet.py
In my case, it seems that the model steadily learns on the masked language modeling task but does not improve on downstream tasks at all, even after 200k pre-training steps.
No, I have not yet evaluated on downstream tasks, but it's definitely in the pipeline. Maybe I can get some runs going this weekend. But I did some fine-tuning on private tasks and it did pretty well, so I don't expect many problems. What implementation are you using for fine-tuning on GLUE? PS: Since you are a fellow Berliner working on FNet, maybe we can connect outside of GitHub some time :)
Also, are you planning to contribute FNet to HuggingFace? I think that would be really cool.
Yes, sure :) My plan was to use the model for some downstream tasks with long documents. I just thought it would be easier to implement everything in the Huggingface ecosystem to leverage the existing implementations of GLUE and such. But yes, if everything goes well, it would of course be a great idea to contribute the model and source code to Huggingface. For evaluation, I used the run_glue.py script from the examples, with parameters along the following lines.
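Roughly like this, where the checkpoint path and the hyperparameter values are illustrative placeholders rather than the exact settings used:

```bash
# Illustrative invocation of the Huggingface examples script; the checkpoint
# path and hyperparameters below are placeholders, not the original settings.
python run_glue.py \
  --model_name_or_path ./fnet-base \
  --task_name sst2 \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./fnet-sst2
```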
I tested SST2, CoLA and QNLI, but the model did not improve on those tasks, neither with my custom pre-training scripts nor with the Huggingface run_mlm.py script. But of course I cannot rule out that it is due to my implementation...
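For reference, a from-scratch MLM pre-training run with the Huggingface script might look roughly like this; the config path, tokenizer, dataset, and hyperparameters are placeholders rather than the actual setup, and a custom architecture like this FNet fork would additionally need to be loadable through its modeling code:

```bash
# Rough sketch of an MLM pre-training run; config path, dataset and
# hyperparameters are placeholders rather than the actual configuration.
python run_mlm.py \
  --config_name ./fnet-config \
  --tokenizer_name bert-base-uncased \
  --dataset_name wikitext \
  --dataset_config_name wikitext-103-raw-v1 \
  --do_train \
  --do_eval \
  --max_seq_length 512 \
  --per_device_train_batch_size 16 \
  --learning_rate 1e-4 \
  --max_steps 200000 \
  --output_dir ./fnet-mlm
```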
I guess you are not using the official checkpoint converted to fit your Huggingface model? Because you also seem to use a different tokenizer. I conclude that you ran the pre-training from scratch. On what dataset? For how long? What was the MLM score? Maybe the model is just not trained enough to handle fine-tuning.
I just ran SST2 from the FNet base checkpoint converted to PyTorch, and it learned pretty smoothly. Epochs: 3
Hi Erik, I just ran some internal benchmarks on a custom pre-trained FNet base (12 layers). In doing so, I realized that I had forgotten the attention mask in my implementation and that I had padded inputs in both my training data and my downstream tasks. So I adjusted my implementation to simply multiply the attention mask with the embeddings in the Fourier layer (sketched below). This seems to work. GLUE is still significantly worse than with a normal BERT base, but the results are no longer purely random (~84% accuracy on SST2, ~11% Matthews correlation on CoLA).
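A minimal sketch of that adjustment, assuming a PyTorch implementation where the mixing layer applies the 2D DFT from the paper and keeps the real part (the function and argument names here are illustrative, not the fork's actual API):

```python
import torch

def fourier_mixing_with_mask(hidden_states: torch.Tensor,
                             attention_mask: torch.Tensor) -> torch.Tensor:
    """Apply FNet token mixing with padded positions zeroed out.

    hidden_states: (batch, seq_len, hidden_dim)
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    # Broadcast the mask over the hidden dimension and zero the padded
    # embeddings, so padding cannot leak into the global Fourier mixing.
    hidden_states = hidden_states * attention_mask.unsqueeze(-1).to(hidden_states.dtype)

    # FNet mixing: DFT over the hidden dimension, then over the sequence
    # dimension, keeping only the real part of the result.
    return torch.fft.fft(torch.fft.fft(hidden_states, dim=-1), dim=-2).real
```

Unlike attention, the DFT has no mask argument and mixes all positions globally, so zeroing the padded embeddings before the transform is the simplest way to keep padding tokens from influencing every output position.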