Why the classification accuracy differs from the results in the paper #1
Comments
The reason might be the distillation loss that I did not implement.
I was only recently introduced to incremental learning, and when I read your code I found that you adjusted the parameters a little to bring your results closer to the paper. As for the distillation loss, I think what you wrote is consistent with the paper, so I am not sure about the cause yet.
Excuse me, do you have any way to achieve the same level of results as in the paper? I hope you can help me, [email protected], thank you.
I think the best way is to contact the author of that paper.
A major issue with your implementation is that the layers of the main model remain trainable while the bias-correction parameters are being adjusted, when they should ideally be frozen. Likewise, the bias layer's parameters should be frozen while training the FC and convolutional layers.
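A minimal sketch of the freezing schedule described above (illustrative names such as `model`, `bias_layers`, and `stage`, not the repo's actual code):

```python
def set_stage(model, bias_layers, stage):
    """Freeze whichever part is not being trained in the current stage."""
    if stage == 1:
        # Stage 1: train the convolutional backbone and FC layer,
        # keep the bias-correction parameters fixed.
        for p in model.parameters():
            p.requires_grad = True
        for layer in bias_layers:
            for p in layer.parameters():
                p.requires_grad = False
    else:
        # Stage 2: train only the bias-correction parameters (alpha, beta)
        # on the validation split; the main network stays frozen.
        for p in model.parameters():
            p.requires_grad = False
        for layer in bias_layers:
            for p in layer.parameters():
                p.requires_grad = True
```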
Hello, I seem to have found a problem with this code. If the exemplar set is removed and the bias layer is removed, the remaining part should be LwF, but when I run that LwF variant I still find the result is wrong. I am thinking about two things: one is that the FC layer directly outputs 10 classes in this code, and the other is the handling of the network parameters. I feel that as long as the accuracy of the LwF part is improved, the accuracy of this code will improve, but my ability is limited; I hope you can help me ~
If you want to improve the accuracy of the incremental setting, you can try changing the size of train_x from 9000 to 10000 in the cifar100 Python file. Also, in the BiC algorithm the paper only corrects the bias of the new classes; the bias of the old classes is left unchanged, as you can check. The last point is that you need to wrap the previous_model forward pass in torch.no_grad(), or call self.previous_model.eval().
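A hedged sketch of those last two points (`previous_model`, `BiasLayer`, and `n_old` are illustrative names, not the repo's exact API):

```python
import torch
import torch.nn as nn

class BiasLayer(nn.Module):
    """Linear correction applied only to the logits of the new classes."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, logits, n_old):
        # Old-class logits are left unchanged, as in the paper.
        old = logits[:, :n_old]
        new = self.alpha * logits[:, n_old:] + self.beta
        return torch.cat([old, new], dim=1)

def old_model_logits(previous_model, x):
    previous_model.eval()       # no dropout / batch-norm statistics updates
    with torch.no_grad():       # no gradients through the old model
        return previous_model(x)
```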
I hope your algorithm can reach the result of the paper as soon as possible.
I reproduced it successfully, and the results were 0.817, 0.7265, 0.6555, 0.5971, 0.5561. I re-read the experimental section of the BiC paper and found that the authors might have deliberately chosen their best data to report. The reason is that in Figure 8 of the paper, the first 20 categories do not increase. If the same model, such as ResNet, is used for training, then the purple curve is unlikely to be 2% higher than the other curves, such as iCaRL.
It's possible. Thank you for your help!
I have incorporated the same into a dynamic model along with a couple of other details, e.g., the authors say that the bias correction should be done only after the second incremental batch has arrived. You can find the implementation at https://github.com/srvCodes/continual-learning-benchmark. @sairin1202 - thanks for having made your code public, it would not have been possible without that. 👍
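A tiny sketch of that last detail (assumed naming, not from either repository): with incremental batches indexed from 0, the first batch contains no old classes, so there is nothing to correct.

```python
def should_run_bias_correction(task_id: int) -> bool:
    """Bias correction only makes sense once old classes exist,
    i.e. from the second incremental batch (task_id >= 1) onward."""
    return task_id >= 1

# e.g. task 0 -> False (train backbone only); tasks 1, 2, ... -> True
```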
Hello, I would say that the original formula of distillation is:
instead of
@EdenBelouadah I think they scale the distillation loss by T² because that's what they say to do in the original knowledge distillation paper when using both soft and hard targets in the loss:
I don't think they do this scaling in the original Large Scale Incremental Learning paper, though (see the calculation of the loss here). It looks like in the original implementation, they use:
as described in the paper.
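For what it's worth, a rough sketch of that alpha-weighted combination as I read it (variable names are illustrative, and T = 2 plus alpha = n_old / n_total are assumptions taken from the paper, not the repo's exact code):

```python
import torch.nn.functional as F

def bic_loss(logits, old_logits, labels, n_old, n_total, T=2.0):
    # Distillation ("soft") term: cross-entropy between the temperature-softened
    # old-model outputs and the new model's outputs on the old classes.
    log_p = F.log_softmax(logits[:, :n_old] / T, dim=1)
    q = F.softmax(old_logits[:, :n_old] / T, dim=1)
    loss_soft_target = -(q * log_p).sum(dim=1).mean()

    # Classification ("hard") term: ordinary cross-entropy on all classes.
    loss_hard_target = F.cross_entropy(logits, labels)

    # alpha grows with the fraction of old classes, so distillation dominates
    # in later increments while the hard loss is down-weighted.
    alpha = n_old / n_total
    return alpha * loss_soft_target + (1 - alpha) * loss_hard_target
```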
Thank you for the answer. I understand the use of T². However, the distillation used here is:
I still don't understand why loss_hard_target is multiplied by (1 - alpha).
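One possible reading, assuming the weighting follows the paper's own formulation (an assumption on my part): with $n$ old classes and $m$ new classes the paper sets

$$\mathcal{L} = \lambda \, \mathcal{L}_d + (1-\lambda)\,\mathcal{L}_c, \qquad \lambda = \frac{n}{n+m},$$

so the hard cross-entropy term gets weight $1-\lambda$, which shrinks as the share of old classes grows (e.g. $n=80$, $m=20$ gives $1-\lambda = 0.2$).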
Thank you for your contribution. I would like to ask why there is such a large gap between the experimental results and those in the paper. I do not understand it yet.