Is it necessary to train from 5-bit? #1
Hi, in our experiments there is a negligible accuracy difference between the two kinds of initialization, possibly because training runs for enough epochs. When training for fewer epochs (e.g. fewer than 40), initializing from the 5-bit model should give better results.
Thanks for the rapid reply. I'm training a 4-bit mobilenet_v2 now. The top-1 accuracy on the validation set fluctuates dramatically (about 2% up or down) even though the learning rate is 1e-4. Is that normal?
Does the fluctuation happen when the learning rate is decayed, e.g. at epochs 30 and 60 in the default setting? I use SGD with step-wise learning-rate decay. A cosine scheduler can make the training process smoother.
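To illustrate the difference between the two schedules mentioned above, here is a minimal sketch in plain Python. The milestones (30, 60) come from this thread; `BASE_LR = 0.01`, `TOTAL_EPOCHS = 90` and `gamma = 0.1` are hypothetical values chosen for illustration, not this repository's defaults.

```python
import math

# Hypothetical defaults for illustration only (not taken from this repository).
BASE_LR = 0.01
TOTAL_EPOCHS = 90
MILESTONES = (30, 60)  # decay points mentioned in the thread


def step_lr(epoch, base_lr=BASE_LR, milestones=MILESTONES, gamma=0.1):
    """Step-wise decay: multiply the rate by gamma at each milestone reached."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=TOTAL_EPOCHS, min_lr=0.0):
    """Cosine annealing: decay smoothly from base_lr to min_lr over total_epochs."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Compare the two schedules around the step-decay milestones.
for epoch in (0, 29, 30, 59, 60, 89):
    print(f"epoch {epoch:2d}: step={step_lr(epoch):.2e}  cosine={cosine_lr(epoch):.2e}")
```

With step decay the rate drops 10× between epochs 29 and 30, whereas the cosine rate changes only slightly from one epoch to the next, which is why it tends to give a smoother accuracy curve. In PyTorch the same two schedules are available as `torch.optim.lr_scheduler.MultiStepLR` and `torch.optim.lr_scheduler.CosineAnnealingLR`.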
Thanks for the reply! Is it possible to use per-channel weight quantization in your method to boost performance, and if so, how?
Yes, you can.
Have you tried it? Did it improve performance?
It boosts performance, at the expense of a lower compression ratio.
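The trade-off described above can be sketched with generic symmetric uniform quantization. This is not the repository's quantizer: the helper names (`fake_quantize`, `per_tensor`, `per_channel`, `bits_per_weight`), the 32-bit float scales, and the example weights are all hypothetical. Per-channel scales fit each output channel's range better, but every extra scale has to be stored.

```python
# A minimal sketch of symmetric uniform weight quantization, comparing one
# shared scale (per-tensor) with one scale per output channel (per-channel).
# Assumption: scales are stored as 32-bit floats; the repository's actual
# quantizer may differ.

def fake_quantize(row, bits, scale):
    """Round to signed integers in [-qmax, qmax], then map back to floats."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in row]


def scale_for(values, bits):
    """Symmetric scale that maps the largest |w| onto the top integer level."""
    qmax = 2 ** (bits - 1) - 1
    return max(abs(v) for v in values) / qmax or 1.0  # avoid a zero scale


def per_tensor(weights, bits):
    """All output channels (rows) share one scale. Returns (weights, num_scales)."""
    scale = scale_for([v for row in weights for v in row], bits)
    return [fake_quantize(row, bits, scale) for row in weights], 1


def per_channel(weights, bits):
    """Each output channel (row) gets its own scale. Returns (weights, num_scales)."""
    return [fake_quantize(row, bits, scale_for(row, bits)) for row in weights], len(weights)


def mse(a, b):
    """Mean squared error between two nested lists of the same shape."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def bits_per_weight(num_weights, bits, num_scales, scale_bits=32):
    """Storage cost per weight, counting the integer codes plus the scales."""
    return (num_weights * bits + num_scales * scale_bits) / num_weights


# Two output channels with very different ranges (made-up values).
w = [[1.0, -0.5, 0.25, 0.75],
     [0.01, -0.02, 0.015, 0.005]]
for name, fn in (("per-tensor", per_tensor), ("per-channel", per_channel)):
    q, n_scales = fn(w, bits=4)
    print(name, "mse:", mse(w, q), "bits/weight:", bits_per_weight(8, 4, n_scales))
```

With a shared scale, the small-range channel collapses to zeros, while per-channel scales preserve it. The storage cost depends on how many weights share each scale: for a regular 3×3 conv with 64 input and 64 output channels (36,864 weights), 64 fp32 scales raise the cost from about 4.00 to about 4.06 bits per weight. For a 3×3 depthwise conv, as used in MobileNetV2, each channel has only 9 weights, so the cost rises to about 7.56 bits per weight under the same assumptions.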
What is the accuracy drop when training a 4-bit mobilenet_v2 from the full-precision model, compared with initializing from the 5-bit model?