
Question about CCR-CLIP experiment and code #92

Open
ZZXF11 opened this issue Mar 21, 2024 · 4 comments


ZZXF11 commented Mar 21, 2024

Looking forward to hearing from you.


ZZXF11 commented Mar 21, 2024

Thanks for your great work on Chinese text recognition at ICCV 2023. However, I have some questions about it.

  1. It seems that there is no image contrastive loss in the provided code; could you provide reference code for it?
  2. In the zero-shot character experiment (the top row of Table 1), it appears that a large amount of printed artistic data (e.g., the '2755-CZS' dataset) was used to pre-train CLIP, which was then further trained with real data. This seems unfair to the other methods, as it amounts to training with a lot of extra data. I'm not sure whether I'm misinterpreting this, so please give me some suggestions.
    Thanks!

@ZZXF11 ZZXF11 changed the title Question about CCR-CLIP Question about CCR-CLIP experiment and code Mar 21, 2024

Tom98714 commented May 1, 2024

Hi, have you managed to reproduce the code for the image contrastive loss?
I reproduced it based on my own understanding, but at epoch 0, around step 6000+, the loss became NaN.

loss_image = nn.BCEWithLogitsLoss()
logits_image = image_features @ image_features.t()
# Target matrix: 1 where two samples in the batch share the same character label
image_ground = torch.zeros(len(label), len(label), dtype=torch.float32).cuda()
for index, char in enumerate(label):
    for i in range(len(label)):
        if char == label[i]:
            image_ground[index][i] = 1

img_img_loss = loss_image(logits_image, image_ground)
# ...
total_loss = (loss_img(logits_per_image, ground_truth) + loss_txt(logits_per_text, ground_truth)) / 2 + img_img_loss


ZZXF11 commented May 2, 2024

Hi, I haven't reproduced their image contrastive loss.
However, I ran the code you provided and did not get NaN; maybe you could try adding gradient clipping?
Also, I ran the original source code as-is and it worked quite well on my task, but after adding this loss the results actually got worse.
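For reference, a minimal sketch of the gradient clipping suggested above; the model and optimizer here are placeholders, not the CCR-CLIP training code:

```python
import torch
import torch.nn as nn

# Placeholder model and optimizer; substitute the actual CCR-CLIP modules.
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 8)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their global L2 norm is at most max_norm,
# which guards against the exploding gradients that can produce NaN losses.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```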

gyao19 commented Jun 1, 2024

> Hi, have you managed to reproduce the code for the image contrastive loss? I reproduced it based on my own understanding, but at epoch 0, around step 6000+, the loss became NaN. […]

I tried modifying it; the test result on IC13 can reach 97.14%.

# Soft targets: distribute probability mass uniformly over samples sharing a label
equal_mask = torch.eq(ground_truth.unsqueeze(1), ground_truth.unsqueeze(0)).float()
targets = equal_mask / equal_mask.sum(1, keepdim=True)
# Symmetric image-text contrastive loss with soft targets
loss_i = - torch.sum(nn.functional.log_softmax(logits_per_image, dim=1) * targets, dim=1).mean()
loss_t = - torch.sum(nn.functional.log_softmax(logits_per_text, dim=1) * targets, dim=1).mean()
total_loss = (loss_i + loss_t) / 2
# Image-to-image contrastive term reusing the same soft targets
logits_im2im = logit_scale * image_features @ image_features.t()
im2im_mask = targets
loss_im2im = - torch.sum(nn.functional.log_softmax(logits_im2im, dim=1) * im2im_mask, dim=1).mean()
total_loss = total_loss + loss_im2im
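For what it's worth, this style of loss runs end-to-end on random features without producing NaN. The sketch below is a self-contained check; the shapes and the `logit_scale` value are assumptions following CLIP conventions, not taken from the repo:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, D = 8, 16  # batch size and embedding dim, arbitrary for this check
image_features = F.normalize(torch.randn(B, D), dim=1)
text_features = F.normalize(torch.randn(B, D), dim=1)
ground_truth = torch.randint(0, 4, (B,))  # character class ids
logit_scale = torch.tensor(100.0)         # CLIP-style inverse temperature

logits_per_image = logit_scale * image_features @ text_features.t()
logits_per_text = logits_per_image.t()

# Soft targets: uniform over the batch samples that share a character label
equal_mask = torch.eq(ground_truth.unsqueeze(1), ground_truth.unsqueeze(0)).float()
targets = equal_mask / equal_mask.sum(1, keepdim=True)

loss_i = -torch.sum(F.log_softmax(logits_per_image, dim=1) * targets, dim=1).mean()
loss_t = -torch.sum(F.log_softmax(logits_per_text, dim=1) * targets, dim=1).mean()
# Image-to-image term with the same soft targets
logits_im2im = logit_scale * image_features @ image_features.t()
loss_im2im = -torch.sum(F.log_softmax(logits_im2im, dim=1) * targets, dim=1).mean()
total_loss = (loss_i + loss_t) / 2 + loss_im2im
```

Unlike the BCE variant above, the log-softmax terms here stay numerically stable even with large logit scales.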
