Performance of Zero-shot Classification and Zero-shot Cross-Modal Retrieval #11

Open
hamigualisingl opened this issue Mar 1, 2024 · 9 comments

Comments

@hamigualisingl

How does this fine-tuned model perform on zero-shot classification and zero-shot cross-modal retrieval?

@wusize
Owner

wusize commented Mar 2, 2024

Hi! Thanks for your question! The self-distillation does hurt performance on image recognition tasks. Adding a loss to align the image representations of the student and teacher models can alleviate this degradation, but we did not include it in our paper as we focused on dense prediction tasks.
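
A minimal sketch of such an alignment term, assuming a PyTorch setup with a frozen teacher and a trainable student image encoder; the function name, encoder interfaces, and loss weight below are illustrative assumptions, not part of the CLIPSelf codebase:

```python
# Illustrative sketch (not from the CLIPSelf repo): a global alignment loss
# between the frozen teacher's and the fine-tuned student's image embeddings,
# added on top of the dense distillation objective.
import torch
import torch.nn.functional as F

def global_alignment_loss(student_encoder, teacher_encoder, images):
    """Returns 1 - cosine similarity between student and teacher global embeddings."""
    with torch.no_grad():
        teacher_emb = F.normalize(teacher_encoder(images), dim=-1)  # (B, D), frozen
    student_emb = F.normalize(student_encoder(images), dim=-1)      # (B, D)
    return (1.0 - (student_emb * teacher_emb).sum(dim=-1)).mean()

# total_loss = dense_distill_loss + lambda_align * global_alignment_loss(...)
# with lambda_align a small hypothetical weight (e.g. 0.1-1.0) tuned on held-out data.
```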

@hamigualisingl
Author

hamigualisingl commented Mar 2, 2024 via email

I'd like to ask: for the fine-tuned model, what are the specific numbers on those two tasks I mentioned?

@wusize
Owner

wusize commented Mar 3, 2024

I remember there is a 3-4 point decrease in top-1 classification accuracy on ImageNet.
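
For context, zero-shot top-1 on ImageNet is typically measured by ranking class-prompt text embeddings against each image embedding. A rough sketch with OpenCLIP-style APIs; the checkpoint tag and class list are placeholders, not the exact CLIPSelf evaluation setup:

```python
# Rough sketch of zero-shot classification with an OpenCLIP-style model;
# the checkpoint tag and class list are placeholders, not the CLIPSelf setup.
import torch
import torch.nn.functional as F
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-16', pretrained='openai')  # swap in the fine-tuned checkpoint to compare
tokenizer = open_clip.get_tokenizer('ViT-B-16')
model.eval()

class_names = ['goldfish', 'tabby cat', 'golden retriever']  # stand-in for the ImageNet classes
text = tokenizer([f'a photo of a {c}' for c in class_names])

with torch.no_grad():
    text_feat = F.normalize(model.encode_text(text), dim=-1)        # (C, D)

def zero_shot_top1(image_batch):
    """image_batch: preprocessed images of shape (B, 3, H, W); returns class indices."""
    with torch.no_grad():
        img_feat = F.normalize(model.encode_image(image_batch), dim=-1)  # (B, D)
    return (img_feat @ text_feat.T).argmax(dim=-1)
```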

@hamigualisingl
Author

hamigualisingl commented Mar 3, 2024 via email

Thanks for your reply. I am currently working on CLIP pretraining, and I also hope to give CLIP both capabilities, global and local. Following the scheme in your paper, I trained a ViT-32 model on YFCC15M and found that its image-text retrieval performance is worse than the baseline, so I wanted to ask for your advice.

@wusize
Owner

wusize commented Mar 3, 2024

Since you are doing pretraining, I guess you could take a look at this paper. Like PACL, you could try replacing the CLS-token pooling with global pooling at the last layer, so that the feature extraction for regional and global representations is the same.
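
A minimal sketch of that pooling swap, assuming a ViT whose last layer outputs a CLS token followed by N patch tokens; this illustrates the idea rather than reproducing the PACL implementation:

```python
# Illustrative pooling swap (not the PACL code): derive the global image
# feature by mean-pooling the last-layer patch tokens instead of taking
# the CLS token, so global and regional features share one feature map.
import torch

def pool_global_feature(tokens: torch.Tensor, use_cls: bool = False) -> torch.Tensor:
    """tokens: (B, 1 + N, D) last-layer ViT output with the CLS token first."""
    if use_cls:
        return tokens[:, 0]              # standard CLIP: CLS token as the global feature
    return tokens[:, 1:].mean(dim=1)     # global average pooling over the N patch tokens
```

With mean pooling, the same last-layer patch-token map that produces region features also produces the image-level feature, which is the consistency being suggested here.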

@hamigualisingl
Author

hamigualisingl commented Mar 3, 2024 via email
