Inconsistent Results for MobileViT v1 Models #68

Open
hzphzp opened this issue Feb 3, 2023 · 5 comments
hzphzp commented Feb 3, 2023

Dear Developers,

I have been running the latest code for the MobileViT v1 models, and I have noticed some inconsistencies between my results and both the numbers reported in previous papers and what I expect based on my understanding of the models.

First, when I run the MobileViT v1 x_small model with the latest code, I get a MACs value of 1028.243M, which differs significantly from the 0.7G reported in the original paper and the 0.9G cited in other papers. I have attached a screenshot of the result for your reference.

However, when I revert to the cvnets-v0.1 commit and run the MobileViT v1 x_small model again, I get a MACs value of 986.269M, which is more consistent with some references in the literature. I have also attached a screenshot of this result for your reference.

Second, I also observed that when I run the MobileViT v1 small model with the latest code, I get an accuracy of 77.47 on the ImageNet-1K dataset, which is lower than the 78.4 reported in the paper. I have not modified the model or the configuration, so I would like to know whether the code for the MobileViT v1 models has changed.

I would greatly appreciate an explanation for these inconsistencies and, if possible, a pointer to any code changes that have been made to the MobileViT v1 models.

Thank you for your time and assistance in this matter. I am looking forward to your response.

Best regards,
Zhipeng

sacmehta (Collaborator) commented Feb 6, 2023

In the paper, FLOPs were reported at 224x224 input resolution. Are you using the same size?
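As a rough illustration of why input resolution matters: convolutional MACs grow with the square of the spatial resolution, so evaluating at 256x256 instead of 224x224 alone accounts for a ~31% difference. A minimal sketch (the layer shapes below are hypothetical, not MobileViT's actual stem, and this is not the profiler CVNets uses):

```python
# Rough MACs for a single 2-D convolution: each output position performs
# (k * k * c_in) multiply-accumulates per output channel.
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    return h_out * w_out * c_out * (k * k * c_in)

# The same hypothetical stride-2 stem layer at two input resolutions:
macs_224 = conv2d_macs(112, 112, 3, 16, 3)  # 224x224 input
macs_256 = conv2d_macs(128, 128, 3, 16, 3)  # 256x256 input
print(macs_256 / macs_224)  # (256/224)^2 ~= 1.306
```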

Regarding accuracy: EMA is useful for MobileViT models (as noted in paper, all MobileViT models are with EMA). Are you evaluating models with EMA?
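For reference, EMA evaluation scores a shadow copy of the weights that is updated after every optimizer step, rather than the raw training weights. A minimal sketch of the idea (class name and decay value are illustrative, not the CVNets API):

```python
# Exponential moving average (EMA) of model weights:
# shadow <- d * shadow + (1 - d) * current, applied per parameter.
class WeightEMA:
    def __init__(self, weights, decay=0.9995):
        self.decay = decay
        self.shadow = dict(weights)  # shadow copy used at evaluation time

    def update(self, weights):
        d = self.decay
        for name, w in weights.items():
            self.shadow[name] = d * self.shadow[name] + (1.0 - d) * w

# Toy usage with a single scalar "parameter":
ema = WeightEMA({"w": 1.0}, decay=0.9)
ema.update({"w": 2.0})
print(ema.shadow["w"])  # ~1.1, i.e. 0.9 * 1.0 + 0.1 * 2.0
```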

hzphzp (Author) commented Feb 6, 2023

> In the paper, FLOPs were reported at 224x224 input resolution. Are you using the same size?
>
> Regarding accuracy: EMA is useful for MobileViT models (as noted in paper, all MobileViT models are with EMA). Are you evaluating models with EMA?

Hi, thanks for your reply.

Regarding the GFLOPs results, I understand that the discrepancy with the paper could be due to the different image size used for testing. However, I am still curious why the results differ between the commits tagged cvnets-v0.1 and cvnets-v0.2, since both results above were measured at a 256x256 test image size.

Regarding the accuracy issue, I can confirm that I used EMA when running the MobileViT v1 model, and the reported accuracy is the best EMA accuracy. I did not modify the original model or configuration, and I followed the default training command provided in the README.

sacmehta (Collaborator) commented Feb 8, 2023

Thanks for the clarification.

Regarding FLOPs: there was an error in computing FLOPs for MobileViT blocks in v0.1, which caused FLOPs to be over-estimated; this was fixed in v0.2. That is why you observe different FLOPs between v0.1 and v0.2. See here.

Regarding accuracy: Accuracy differences were noted when we migrated from OpenCV (v0.1) to PIL (v0.2). Similar to other works, longer warm-up helped here. For an example configuration, see this config.

The following schedule is recommended with the variable batch sampler. Here, the learning rate and total epochs are the same as in the paper, but with a longer warm-up schedule. Hope this helps.

```yaml
scheduler:
  name: "cosine"
  is_iteration_based: false
  max_epochs: 300
  warmup_iterations: 20000 # longer warm-up
  warmup_init_lr: 0.0002
  cosine:
    max_lr: 0.002
    min_lr: 0.0002
```
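The schedule in that config can be sketched as a linear warm-up from `warmup_init_lr` to `max_lr` over `warmup_iterations`, followed by a cosine decay down to `min_lr`. A minimal sketch, assuming an iterations-per-epoch count that is purely illustrative (CVNets derives it from the dataset and batch sampler):

```python
import math

def lr_at(it, total_iters, warmup_iters=20000,
          warmup_init_lr=0.0002, max_lr=0.002, min_lr=0.0002):
    if it < warmup_iters:
        # linear warm-up from warmup_init_lr to max_lr
        return warmup_init_lr + (max_lr - warmup_init_lr) * it / warmup_iters
    # cosine decay from max_lr to min_lr over the remaining iterations
    t = (it - warmup_iters) / max(1, total_iters - warmup_iters)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

total = 300 * 2500  # 300 epochs x 2500 iterations/epoch (illustrative)
print(lr_at(0, total))      # warm-up start: warmup_init_lr
print(lr_at(20000, total))  # warm-up end: max_lr
print(lr_at(total, total))  # end of training: min_lr
```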

hzphzp (Author) commented Feb 13, 2023

Thank you for your reply. I found the bug: I was still using "imagenet_opencv" as the data IO function.

CharlesPikachu commented

I was also working on re-implementing MobileViT in SSSegmentation and found that enabling EMA is important, while the number of warm-up iterations seems less important (for segmentation, at least).
