
Cannot reproduce the performance on the Visual Entailment dataset #3

Open

youngfly11 opened this issue Jun 28, 2021 · 4 comments

@youngfly11

Hi,
I ran the pretraining with ResNet-18 + a 3-layer transformer on the in-domain data (without the MVM loss).

I get a similar result on the VQA downstream task, around 66.5 accuracy.
But the performance on Visual Entailment is much lower than reported in the paper: I only get 74 accuracy (~82% is reported in the paper).
I am also wondering why ResNet-18 + 3 layers would outperform UNITER-base.
Are there any training strategies specific to this downstream task?

Thanks

@youngfly11 (Author)

Hi again,
Is there any plan to release the Visual Entailment code or the processed dataset? I still cannot reproduce the performance. Everything works except the VE task, which is very strange.
Thanks

@alice-cool


Dear scholar, is the warning shown in the screenshot normal, or did I miss some files?
[image]

@mactavish91

mactavish91 commented Feb 17, 2023


@youngfly11 I think they use both the image premise and the text premise, causing information leakage and resulting in performance far exceeding other models.
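To make the suspected leakage concrete, here is a minimal sketch of the two input constructions for SNLI-VE. In the standard setup the model sees only the image premise and the text hypothesis; in the leaky setup suspected above, the text premise (the image's Flickr30k caption) is also fed in. Since the SNLI labels were originally annotated against that text premise, including it leaks label information. All function and key names below are illustrative assumptions, not from the repository under discussion.

```python
def build_ve_input_standard(image_feats, hypothesis_tokens,
                            cls="[CLS]", sep="[SEP]"):
    """Standard SNLI-VE setup: image premise + text hypothesis only."""
    return {
        "visual": image_feats,
        "text": [cls] + hypothesis_tokens + [sep],
    }


def build_ve_input_leaky(image_feats, premise_tokens, hypothesis_tokens,
                         cls="[CLS]", sep="[SEP]"):
    """Suspected leaky setup: the text premise (the image's caption) is
    concatenated in as well. Because SNLI labels were created from the
    text premise alone, a model can solve many examples from text only,
    inflating accuracy relative to image-premise-only models."""
    return {
        "visual": image_feats,
        "text": [cls] + premise_tokens + [sep] + hypothesis_tokens + [sep],
    }
```

A text-only ablation (dropping `visual` and keeping the leaky `text` sequence) would be one way to test this hypothesis: if accuracy stays near the reported number, the text premise is doing the work.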

@Gavin001201


Hello, I want to know whether you finally figured out the reason, and whether there is a problem with the implementation of VE here. Thanks
