
Question about CCR-CLIP experiment and code #92

Open
ZZXF11 opened this issue Mar 21, 2024 · 4 comments


ZZXF11 commented Mar 21, 2024

Looking forward to hearing from you.


ZZXF11 commented Mar 21, 2024

Thanks for your great work on Chinese text recognition at ICCV 2023. However, I have some questions about it.

  1. It seems that there is no image contrastive loss in the provided code; could you provide reference code for it?
  2. In the zero-shot character experiment (the top row of Table 1), it appears that a large amount of printed artistic data (e.g., the '2755-CZS' dataset) was used to pre-train CLIP, which was then further trained with real data. This seems unfair to the other methods, as it amounts to training with a lot of extra data. I'm not sure whether I'm misinterpreting this, so please give me some suggestions.
    Thanks!

@ZZXF11 ZZXF11 changed the title Question about CCR-CLIP Question about CCR-CLIP experiment and code Mar 21, 2024

Tom98714 commented May 1, 2024

Hi, have you managed to reproduce the code for the image contrastive loss?
I reproduced it based on my own understanding, but at epoch 0, around step 6000+, the loss became NaN.

loss_image = nn.BCEWithLogitsLoss()
logits_image = image_features @ image_features.t()
# Target matrix: 1 where two samples in the batch share the same character label
image_ground = torch.zeros(len(label), len(label), dtype=torch.float32).cuda()
for index, char in enumerate(label):
    for i in range(len(label)):
        if char == label[i]:
            image_ground[index][i] = 1

img_img_loss = loss_image(logits_image, image_ground)
# ...
total_loss = (loss_img(logits_per_image, ground_truth) + loss_txt(logits_per_text, ground_truth)) / 2 + img_img_loss


ZZXF11 commented May 2, 2024

Hi, I haven't reproduced their image contrastive loss.
However, I ran the code you provided and did not get NaN; maybe you could try adding gradient clipping?
Also, I ran the original source code as-is and it worked quite well on my task, but after adding this loss the results actually got worse.
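For reference, a minimal sketch of the gradient clipping suggested above; the model and optimizer here are placeholders, not the CCR-CLIP training code:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; substitute the actual CCR-CLIP modules.
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 8)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their global L2 norm is at most max_norm,
# which guards against the exploding gradients that can produce NaN losses.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```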

gyao19 commented Jun 1, 2024

> Hi, have you managed to reproduce the code for the image contrastive loss? I reproduced it based on my own understanding, but at epoch 0, around step 6000+, the loss became NaN. […]

I tried modifying it; the test result on IC13 can reach 97.14%.

# Soft targets: distribute probability mass uniformly over samples sharing a label
equal_mask = torch.eq(ground_truth.unsqueeze(1), ground_truth.unsqueeze(0)).float()
targets = equal_mask / equal_mask.sum(1, keepdim=True)
# Symmetric image-text contrastive loss with soft targets
loss_i = - torch.sum(nn.functional.log_softmax(logits_per_image, dim=1) * targets, dim=1).mean()
loss_t = - torch.sum(nn.functional.log_softmax(logits_per_text, dim=1) * targets, dim=1).mean()
total_loss = (loss_i + loss_t) / 2
# Image-to-image contrastive term reusing the same soft targets
logits_im2im = logit_scale * image_features @ image_features.t()
im2im_mask = targets
loss_im2im = - torch.sum(nn.functional.log_softmax(logits_im2im, dim=1) * im2im_mask, dim=1).mean()
total_loss = total_loss + loss_im2im
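For what it's worth, this style of loss runs end-to-end on random features without producing NaN. The sketch below is a self-contained check; the shapes and the `logit_scale` value are assumptions following CLIP conventions, not taken from the repo:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, D = 8, 16  # batch size and embedding dim, arbitrary for this check
image_features = F.normalize(torch.randn(B, D), dim=1)
text_features = F.normalize(torch.randn(B, D), dim=1)
ground_truth = torch.randint(0, 4, (B,))  # character class ids
logit_scale = torch.tensor(100.0)         # CLIP-style inverse temperature

logits_per_image = logit_scale * image_features @ text_features.t()
logits_per_text = logits_per_image.t()

# Soft targets: uniform over the batch samples that share a character label
equal_mask = torch.eq(ground_truth.unsqueeze(1), ground_truth.unsqueeze(0)).float()
targets = equal_mask / equal_mask.sum(1, keepdim=True)

loss_i = -torch.sum(F.log_softmax(logits_per_image, dim=1) * targets, dim=1).mean()
loss_t = -torch.sum(F.log_softmax(logits_per_text, dim=1) * targets, dim=1).mean()
# Image-to-image term with the same soft targets
logits_im2im = logit_scale * image_features @ image_features.t()
loss_im2im = -torch.sum(F.log_softmax(logits_im2im, dim=1) * targets, dim=1).mean()
total_loss = (loss_i + loss_t) / 2 + loss_im2im
```

Unlike the BCE variant above, the log-softmax terms here stay numerically stable even with large logit scales.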
