I noticed that the paper says, 'We observe that this teacher has better performance than the student throughout the training, and hence, guides the training of the student by providing target features of higher quality'.
However, I couldn't find a theoretical explanation for why this happens.
Also, how can we observe this during SSL training, and which metric is used to evaluate the performance?
I think that's because the student model only has a 'short-term memory': its weights mostly reflect the most recent batches. The teacher is an exponential moving average (EMA) of the student's weights, so it effectively takes many more images into account (a 'long-term memory') and can map images to a more uniform distribution than the student.
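To make the 'long-term memory' intuition concrete, here is a toy sketch, not the repository's training code. It assumes a single scalar "weight" with a hypothetical ideal value, a student that takes noisy SGD-like steps toward it, and a teacher that is an EMA of the student. All names (`ema_update`, `simulate`, `target`, `noise`) are made up for illustration, and the result only shows that averaging smooths out per-batch noise; it is not a proof of the paper's observation.

```python
import random


def ema_update(teacher, student, momentum):
    """Move each teacher weight a small step toward the matching student weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def simulate(steps=2000, momentum=0.99, noise=0.5, seed=0):
    """Return (student_error, teacher_error), averaged over the second half of training.

    Toy assumption: the "ideal" weight is 1.0 and every student step is
    perturbed by Gaussian noise standing in for mini-batch variability.
    """
    rng = random.Random(seed)
    target = 1.0          # hypothetical ideal weight value
    student = [0.0]
    teacher = [0.0]
    student_err, teacher_err = [], []
    for step in range(steps):
        # Noisy step toward the target (short-term memory: dominated by recent noise).
        student = [s + 0.1 * (target - s) + rng.gauss(0.0, noise) for s in student]
        # The teacher averages over many past student states (long-term memory).
        teacher = ema_update(teacher, student, momentum)
        if step >= steps // 2:
            student_err.append(abs(student[0] - target))
            teacher_err.append(abs(teacher[0] - target))
    return sum(student_err) / len(student_err), sum(teacher_err) / len(teacher_err)


if __name__ == "__main__":
    s_err, t_err = simulate()
    print(f"mean |error| student: {s_err:.3f}  teacher: {t_err:.3f}")
```

In this toy setting the teacher ends up closer to the ideal value than the student because the EMA averages away much of the step-to-step noise. Whether the same mechanism fully explains the behaviour reported in the paper is a separate question.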
Thank you for sharing the great project.