
Replicating the DINOv2 Experiment from Table 3 #99

Open
Saloni1Parekh609 opened this issue Nov 12, 2024 · 2 comments

@Saloni1Parekh609

I'm trying to replicate the DINOv2 experiment from this table (Table 3, "Testing DINOv2 Distillation").

However, I have a few questions:

  1. Is this student model trained at 224x224 resolution with CPE? Does DINOv2 see a 224x224 image as well? For evaluation on ADE20k, is the input data at 512x512 resolution? And is it 224x224 for the CLIP-distilled model?
  2. What learning rate hyperparameters were used for this experiment? Are they the same as in the RADIO experiments? How many steps did you run this for?
  3. Could you provide a little more clarity on the learning rate scheduler for the RADIO experiment? For example, the cycling aspect of the CosineAnnealing scheduler.
@mranzinger
Collaborator

Hi.

Is your goal to replicate the feature distillation experiment (the table you're pointing to), or the overall RADIO experiment?

> Is this student model trained at 224x224 resolution with CPE? Does DINOv2 see a 224x224 image as well? For evaluation on ADE20k, is the input data at 512x512 resolution? And is it 224x224 for the CLIP-distilled model?

Yes. Yes. Input data is 518px (owing to the patch-14 student: 518 = 14 × 37). Yes.
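
For concreteness, here is a minimal PyTorch sketch of the matched-resolution setup described above. The modules, feature dimensions, projection head, and MSE objective are all illustrative assumptions, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: the real student is a ViT trained with CPE and the real
# teacher is a frozen DINOv2 backbone. A single patch-embed conv is used here
# only to produce token grids of the right shape; the dims are made up.
student = nn.Conv2d(3, 768, kernel_size=14, stride=14)   # "patch-14 student"
teacher = nn.Conv2d(3, 1024, kernel_size=14, stride=14)  # "patch-14 DINOv2"
proj = nn.Linear(768, 1024)  # head projecting student features to teacher space

x = torch.randn(2, 3, 224, 224)  # both models see the same 224x224 image

with torch.no_grad():                                   # teacher stays frozen
    t_tokens = teacher(x).flatten(2).transpose(1, 2)    # (B, 256, 1024)

s_tokens = proj(student(x).flatten(2).transpose(1, 2))  # (B, 256, 1024)

# Matched input resolutions give matched 16x16 token grids (224 / 14 = 16),
# so the feature-matching loss applies token-for-token.
loss = F.mse_loss(s_tokens, t_tokens)
loss.backward()
```

The point mirrored from the answer above is simply that student and teacher see the same 224x224 input during distillation, so their token grids align.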

> What learning rate hyperparameters were used for this experiment? Are they the same as in the RADIO experiments? How many steps did you run this for?

Same as the RADIO hparams. The difference across this table is which teacher models, if any, we apply feature distillation against. 600k steps.

> Could you provide a little more clarity on the learning rate scheduler for the RADIO experiment? For example, the cycling aspect of the CosineAnnealing scheduler.

It's one cycle, with a warmup of 1,000 steps.
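
Here is a minimal sketch of what that schedule could look like in PyTorch (a single cosine cycle with a 1,000-step linear warmup over the 600k steps mentioned above); the optimizer and base learning rate are placeholders, since the actual RADIO values aren't stated here:

```python
import math
import torch

WARMUP_STEPS = 1_000    # linear warmup, per the answer above
TOTAL_STEPS = 600_000   # total training steps, per the answer above

model = torch.nn.Linear(8, 8)  # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)  # base LR is a placeholder

def lr_lambda(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return step / WARMUP_STEPS
    # One cosine half-cycle from the base LR down to 0 over the rest of training.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

for step in range(10):  # the real loop would run to TOTAL_STEPS
    opt.step()
    sched.step()
```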

@Saloni1Parekh609
Author

Hi, thank you for responding! Just that particular distillation experiment!

Thank you so much for this information!
