Performance of Zero-shot Classification and Zero-shot Cross-Modal Retrieval #11

Open
hamigualisingl opened this issue Mar 1, 2024 · 9 comments

Comments

@hamigualisingl

How does this fine-tuned model perform on zero-shot classification and zero-shot cross-modal retrieval?

@wusize
Owner

wusize commented Mar 2, 2024

Hi! Thanks for your question! The self-distillation does hurt performance on image recognition tasks. Adding a loss to align the image representations of the student and teacher models can alleviate this degradation, but we did not include it in our paper as we focused on dense prediction tasks.
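
A minimal sketch of such an alignment term, assuming a PyTorch setup with a frozen teacher and a trainable student image encoder; the function name, encoder interfaces, and loss weight below are illustrative assumptions, not part of the CLIPSelf codebase:

```python
# Illustrative sketch (not from the CLIPSelf repo): a global alignment loss
# between the frozen teacher's and the fine-tuned student's image embeddings,
# added on top of the dense distillation objective.
import torch
import torch.nn.functional as F

def global_alignment_loss(student_encoder, teacher_encoder, images):
    """Returns 1 - cosine similarity between student and teacher global embeddings."""
    with torch.no_grad():
        teacher_emb = F.normalize(teacher_encoder(images), dim=-1)  # (B, D), frozen
    student_emb = F.normalize(student_encoder(images), dim=-1)      # (B, D)
    return (1.0 - (student_emb * teacher_emb).sum(dim=-1)).mean()

# total_loss = dense_distill_loss + lambda_align * global_alignment_loss(...)
# with lambda_align a small hypothetical weight (e.g. 0.1-1.0) tuned on held-out data.
```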

@hamigualisingl
Author

hamigualisingl commented Mar 2, 2024 via email

I'd like to ask: for the fine-tuned model, what are the specific numbers on those two tasks I mentioned?

@wusize
Owner

wusize commented Mar 3, 2024

I remember there is a 3-4 point decrease in top-1 classification accuracy on ImageNet.
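
For context, zero-shot top-1 on ImageNet is typically measured by ranking class-prompt text embeddings against each image embedding. A rough sketch with OpenCLIP-style APIs; the checkpoint tag and class list are placeholders, not the exact CLIPSelf evaluation setup:

```python
# Rough sketch of zero-shot classification with an OpenCLIP-style model;
# the checkpoint tag and class list are placeholders, not the CLIPSelf setup.
import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16', pretrained='openai')  # swap in the fine-tuned checkpoint to compare
tokenizer = open_clip.get_tokenizer('ViT-B-16')
model.eval()

class_names = ['goldfish', 'tabby cat', 'golden retriever']  # stand-in for the ImageNet classes
text = tokenizer([f'a photo of a {c}' for c in class_names])

with torch.no_grad():
    text_feat = F.normalize(model.encode_text(text), dim=-1)        # (C, D)

def zero_shot_top1(image_batch):
    """image_batch: preprocessed images of shape (B, 3, H, W); returns class indices."""
    with torch.no_grad():
        img_feat = F.normalize(model.encode_image(image_batch), dim=-1)  # (B, D)
    return (img_feat @ text_feat.T).argmax(dim=-1)
```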

@hamigualisingl
Author

hamigualisingl commented Mar 3, 2024 via email

Thanks for your reply. I am currently working on CLIP pretraining, and I also hope to give CLIP both capabilities, global and local. Following the scheme in your paper, I trained a ViT-32 model on YFCC15M and found that its image-text retrieval performance is worse than the baseline, so I wanted to ask for your advice.

@wusize
Owner

wusize commented Mar 3, 2024

Since you are doing pretraining, I guess you could take a look at this paper. Like PACL, you could try replacing the CLS-token pooling with global pooling at the last layer, so that the feature extraction for regional and global representations is the same.
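
A minimal sketch of that pooling swap, assuming a ViT whose last layer outputs a CLS token followed by N patch tokens; this illustrates the idea rather than reproducing the PACL implementation:

```python
# Illustrative pooling swap (not the PACL code): derive the global image
# feature by mean-pooling the last-layer patch tokens instead of taking
# the CLS token, so global and regional features share one feature map.
import torch

def pool_global_feature(tokens: torch.Tensor, use_cls: bool = False) -> torch.Tensor:
    """tokens: (B, 1 + N, D) last-layer ViT output with the CLS token first."""
    if use_cls:
        return tokens[:, 0]              # standard CLIP: CLS token as the global feature
    return tokens[:, 1:].mean(dim=1)     # global average pooling over the N patch tokens
```

With mean pooling, the same last-layer patch-token map that produces region features also produces the image-level feature, which is the consistency being suggested here.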

@hamigualisingl
Author

hamigualisingl commented Mar 3, 2024 via email
