Question about the hyper-parameters used in other KD methods on different cases #34
Comments
The hyperparameters should be the same between different teacher-student pairs. We simply reported the results in CRD's original paper.
@Zzzzz1 Thanks for the reply, this is very helpful. I have another question: given that the OFD performance on ShuffleNet is reported in the paper, why are the ShuffleNet models not implemented (e.g.:
Sorry about that. We didn't test the code with all pairs, so the ShuffleNet get_bn_before_relu function for OFD is missing. We will fix that.
Please let me know if you now have the values of these hyperparameters.
First of all, thank you for the excellent work. We are currently attempting to reproduce the performance of various KD methods, including FitNet, RKD, CRD, ReviewKD, and others, as detailed in the DKD paper. We have a question regarding the hyperparameters used on CIFAR-100 for different KD methods. Specifically, we are curious about the values used across different teacher-student pairs for these KD methods (except DKD). Would you mind posting these hyperparameters🥰?