Bad performance on ImageNet variants #15
Hi, thanks for sharing these numbers. Could you also compare them with standard fine-tuning, if you have those numbers as well?
It appears that weight ensembling is not applied in the current code; I cannot see the line where args.alpha is used. As mentioned in the comment, I will also experiment with ViT-B/16.
The results should be much better even without weight ensembling. I am not entirely sure what the baseline for, say, standard cross-entropy fine-tuning would be, but FLYP should still give better OOD accuracies than zero-shot (even without ensembling). I used the ensembling code from https://github.com/mlfoundations/wise-ft. Let me know the ViT-B/16 numbers once you get them and we can debug from there.
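For reference, the ensembling step is just a per-parameter linear interpolation between the zero-shot and fine-tuned checkpoints, controlled by alpha. A minimal sketch of that idea (the function and variable names here are illustrative, not the exact wise-ft or FLYP code; it assumes two state-dict-compatible torch modules):

```python
import copy

def interpolate_weights(zeroshot_model, finetuned_model, alpha):
    # WiSE-FT-style weight ensembling: per-parameter linear interpolation.
    # alpha = 0 returns the zero-shot model, alpha = 1 the fine-tuned one.
    theta_0 = zeroshot_model.state_dict()
    theta_1 = finetuned_model.state_dict()
    assert set(theta_0.keys()) == set(theta_1.keys())
    theta = {
        key: (1.0 - alpha) * theta_0[key] + alpha * theta_1[key]
        for key in theta_0
    }
    ensembled = copy.deepcopy(finetuned_model)
    ensembled.load_state_dict(theta)
    return ensembled

# e.g. sweep the mixing coefficient and evaluate each interpolated model:
# models = [interpolate_weights(zs, ft, a) for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
```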
Hi,
Sorry for the late reply. I ran FLYP with CLIP ViT-B/16 without ensembling (i.e., without the WiSE-FT interpolation) and got an ImageNet Top-1 accuracy of 82.4. Robustness is relatively well maintained compared to the ViT-B/32 experiment, but the results seem lower overall than the performance reported in the paper (Avg OOD ours: 58.9, reported: 60.2), especially on ImageNet-R, ImageNet-A, and ObjectNet. In particular, the OOD scores I obtained are not much different from the zero-shot OOD performance (robustness is maintained, but not improved much). What should I modify in my experiment to get the reported scores?
Did you use the CLI arguments in the readme? Can you please send me your logs?
Sorry for the late reply. Here are the arguments. Below are the logs. The ObjectNet dataset is larger than the other datasets, so I only evaluate on ObjectNet after 8 epochs.
2023-10-23,14:31:17 | INFO | Train Epoch: 0 [ 512/1281167 (0%)] Data (t): 0.000 Batch (t): 5.934, 29.8708/s LR: 0.000000 Loss: 1.7685 (1.7685)
...
2023-10-23,19:08:39 | INFO | Train Epoch: 9 [ 512/1281167 (0%)] Data (t): 0.000 Batch (t): 1.621, 121.220/s LR: 0.000001 Loss: 0.38631 (0.38631)
I ran the FLYP code to compare with "Masked Images Are Counterfactual Samples for Robust Fine-tuning" (CVPR 2023), using the ViT-B/32 model.
I expected FLYP to be competitive with other methods, but the performance of the FLYP-trained model on the OOD datasets is significantly degraded.
Zero-shot CLIP performance using ViT-B/32 is as follows:
ImageNet Top-1 accuracy: 63.4
ImageNetV2 Top-1 accuracy: 55.9
ImageNetR Top-1 accuracy: 69.3
ImageNetSketch Top-1 accuracy: 42.3
ImageNetA Top-1 accuracy: 31.4
I ran just one epoch of training with FLYP, and its performance is:
ImageNet Top-1 accuracy: 73.3
ImageNetV2 Top-1 accuracy: 62.6
ImageNetR Top-1 accuracy: 63.1
ImageNetSketch Top-1 accuracy: 40.9
ImageNetA Top-1 accuracy: 25.9
FLYP does not preserve robustness: compared to zero-shot CLIP, the performance on ImageNet-R, ImageNet-Sketch, and ImageNet-A drops even after training for just one epoch. I used the same parameters that are used for training in the ViT-B/16 experiments.
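To make the gap concrete, here are the per-dataset changes and the average OOD accuracy computed from the numbers above (a small illustrative script; the dictionary keys are just shorthand for the datasets listed above):

```python
# Numbers copied from the ViT-B/32 results reported above.
zeroshot = {"ImageNetV2": 55.9, "ImageNetR": 69.3, "ImageNetSketch": 42.3, "ImageNetA": 31.4}
flyp_1ep = {"ImageNetV2": 62.6, "ImageNetR": 63.1, "ImageNetSketch": 40.9, "ImageNetA": 25.9}

for name in zeroshot:
    print(f"{name}: {flyp_1ep[name] - zeroshot[name]:+.1f}")
# ImageNetV2: +6.7, ImageNetR: -6.2, ImageNetSketch: -1.4, ImageNetA: -5.5

print(f"Avg OOD zero-shot: {sum(zeroshot.values()) / len(zeroshot):.1f}")       # 49.7
print(f"Avg OOD FLYP (1 epoch): {sum(flyp_1ep.values()) / len(flyp_1ep):.1f}")  # 48.1
```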
Can you clarify this phenomenon? Is there anything wrong with this experiment?