Why does the teacher network perform better than the student during training? #274

Open
Hongbo-Z opened this issue Apr 9, 2024 · 1 comment

Hongbo-Z commented Apr 9, 2024

Thank you for sharing the great project.

I noticed that the paper says: 'We observe that this teacher has better performance than the student throughout the training, and hence, guides the training of the student by providing target features of higher quality'.

However, I could not find a theoretical explanation for why this happens.

Also, how can we observe this during SSL training? And which metric is used to evaluate the performance?

wangh09 commented Jul 5, 2024

I think that's because the student model only has a 'short-term memory': its weights mostly reflect the most recent batches. The teacher's weights are an exponential moving average (EMA) of the student's, so the teacher effectively takes many more images into account (a 'long-term memory') and can map images to a more uniform distribution than the student can.
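
To make the 'long-term memory' intuition concrete, here is a toy sketch, not the project's training code. It assumes a single scalar weight, a made-up ideal value `target`, a noisy gradient step for the student, and an illustrative momentum of 0.99; the names `ema_update` and `simulate` are hypothetical. It only shows that averaging the student's weights over many steps smooths out per-step noise. It does not prove anything about feature quality in real SSL training.

```python
import random


def ema_update(teacher, student, momentum):
    """Return teacher parameters after one EMA step toward the student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def simulate(steps=2000, momentum=0.99, lr=0.1, noise=1.0, seed=0):
    """Toy run: a noisy student and its EMA teacher chase the same target.

    Returns the mean squared error of each against the target, measured
    over the second half of the run (after both have converged).
    """
    rng = random.Random(seed)
    target = 3.0      # assumed "ideal" weight, for illustration only
    student = [0.0]
    teacher = [0.0]
    student_sq_err = []
    teacher_sq_err = []
    for step in range(steps):
        # Noisy gradient of 0.5 * (w - target)^2, mimicking minibatch noise.
        grad = (student[0] - target) + rng.gauss(0.0, noise)
        student = [student[0] - lr * grad]
        # The teacher never sees gradients; it only averages student weights.
        teacher = ema_update(teacher, student, momentum)
        if step >= steps // 2:
            student_sq_err.append((student[0] - target) ** 2)
            teacher_sq_err.append((teacher[0] - target) ** 2)
    student_mse = sum(student_sq_err) / len(student_sq_err)
    teacher_mse = sum(teacher_sq_err) / len(teacher_sq_err)
    return student_mse, teacher_mse


student_mse, teacher_mse = simulate()
print(f"student MSE: {student_mse:.4f}  teacher MSE: {teacher_mse:.4f}")
```

In this toy setting the teacher ends up closer to the target on average, because each student step carries fresh noise while the EMA averages over roughly the last `1 / (1 - momentum)` steps.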
