Cannot reproduce the performance on the Visual Entailment dataset #3
Comments
Hi, again.
@youngfly11 I think they use both the image premise and the text premise, which causes information leakage and results in performance far exceeding other models.
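(For context, a minimal sketch of the suspected leakage, assuming the standard SNLI-VE formulation where the premise is a Flickr30k image and the hypothesis is text; all names here are illustrative, not the repository's actual code. If the original caption, i.e. the text premise, is also fed to the model, the task partly reduces to text-only entailment and the image can be ignored.)

```python
# Hypothetical sketch of the two input constructions for SNLI-VE.
# Standard setup: classify (image premise, text hypothesis).
# Leaky setup: the original caption (text premise) is fed in as well,
# letting the model exploit text-text entailment cues.

def build_ve_inputs(img_feats, hypothesis_ids, premise_ids=None):
    """Standard VE uses only (img_feats, hypothesis_ids); passing
    premise_ids as well is the suspected source of leakage."""
    inputs = {"visual": img_feats, "text": hypothesis_ids}
    if premise_ids is not None:  # leaky variant
        inputs["text_premise"] = premise_ids
    return inputs
```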
Hello, did you ever figure out the reason, and is there a problem with the VE implementation here? Thanks.
Hi,
I ran the pretraining with ResNet18 + a 3-layer transformer on in-domain data (without the MVM loss).
I get a similar result on the VQA downstream task, around 66.5 accuracy.
But the performance on Visual Entailment is noticeably lower than reported in the paper: I only get 74 accuracy (~82 is reported in the paper).
I am also wondering why ResNet18 + a 3-layer transformer would outperform UNITER Base.
Are there any training strategies specialized for this downstream task?
Thanks
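(For reference, a minimal sketch of the kind of setup described above, assuming ResNet18 grid features are projected and concatenated with text embeddings before a 3-layer transformer encoder; all module names, dimensions, and the vocabulary size are illustrative assumptions, not the repository's actual code.)

```python
import torch
import torch.nn as nn
import torchvision.models as tvm

class GridVLEncoder(nn.Module):
    """Illustrative ResNet18 grid features + 3-layer transformer encoder."""
    def __init__(self, vocab_size: int = 30522, hidden: int = 768,
                 layers: int = 3, heads: int = 12):
        super().__init__()
        backbone = tvm.resnet18(weights=None)
        # Keep everything up to the final conv stage; output is (B, 512, H/32, W/32).
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.visual_proj = nn.Linear(512, hidden)  # project grid features
        self.text_emb = nn.Embedding(vocab_size, hidden)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor):
        grid = self.cnn(images)                 # (B, 512, h, w)
        grid = grid.flatten(2).transpose(1, 2)  # (B, h*w, 512)
        vis = self.visual_proj(grid)            # (B, h*w, hidden)
        txt = self.text_emb(token_ids)          # (B, T, hidden)
        fused = torch.cat([vis, txt], dim=1)    # joint image-text sequence
        return self.encoder(fused)              # contextualized features
```

A task head (e.g., a linear classifier over a pooled token) would then be fine-tuned per downstream task; nothing in this sketch explains the VE gap by itself, which is why the question about task-specific training strategies matters.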