Is it necessary to train from 5-bit? #1

Open
talenz opened this issue Mar 18, 2021 · 7 comments

Comments

@talenz

talenz commented Mar 18, 2021

What's the accuracy drop if I train 4-bit mobilenet_v2 from full precision, compared to initializing from the 5-bit model?

@ChaofanTao
Owner

Hi, there is a negligible accuracy drop between the two kinds of initialization in our experiments, probably because the training runs for enough epochs. If you train with fewer epochs, e.g. fewer than 40, initializing from the 5-bit model should give better results.
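For reference, a minimal sketch of what warm-starting the 4-bit model from a 5-bit checkpoint could look like. The torchvision model and the checkpoint path below are placeholders, not this repo's actual training code:

```python
import torch
from torchvision.models import mobilenet_v2

# Placeholder network standing in for the repo's quantized MobileNetV2;
# the real model would be built with the 4-bit quantization modules.
model_4bit = mobilenet_v2()

# Hypothetical path to a trained 5-bit checkpoint (it may also be nested
# under a 'state_dict' key, depending on how it was saved).
ckpt = torch.load('checkpoints/mobilenet_v2_5bit.pth', map_location='cpu')

# strict=False tolerates keys that only exist at one bit-width
# (e.g. per-layer clipping parameters).
missing, unexpected = model_4bit.load_state_dict(ckpt, strict=False)
print(f'missing keys: {len(missing)}, unexpected keys: {len(unexpected)}')
```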

@talenz
Author

talenz commented Mar 19, 2021


Thanks for the rapid reply. I'm training 4-bit mobilenet_v2 now. The top-1 score on the validation set fluctuates dramatically (about 2% up or down) even though the learning rate is 1e-4. Is that normal?

@ChaofanTao
Owner

Does the fluctuation happen when the learning rate is decayed, e.g. at epochs 30 and 60 in the default setting? I train with SGD and step-wise decay of the learning rate. A cosine scheduler can make the training process smoother.
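For illustration, a rough sketch of the two schedule choices in PyTorch. The placeholder model, the 90-epoch horizon, and the hyperparameters are assumptions, not the repo's exact training script:

```python
import torch
from torch import nn, optim

# Placeholder model; substitute the quantized mobilenet_v2 from this repo.
net = nn.Linear(10, 10)
optimizer = optim.SGD(net.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)

epochs = 90       # assumed training length
use_cosine = True

if use_cosine:
    # Smooth decay over the whole run; tends to reduce validation jitter.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
else:
    # Step-wise decay: lr drops 10x at epochs 30 and 60, which is exactly
    # where the validation top-1 can jump around.
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(epochs):
    # ... run one training epoch and evaluate ...
    scheduler.step()
```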

@talenz
Author

talenz commented Apr 20, 2021


Thanks for the reply~ Is it possible (and how) to use per-channel weight quantization with your method to boost performance?

@ChaofanTao
Owner

Yes, you can.

  1. Write the channel-wise quantization strategy in def weight_quantization(b, grids, power=False): in models/fat_quantization.py.
  2. Then set a channel-wise alpha in class weight_quantize_fn(nn.Module):, e.g. self.register_parameter('wgt_alpha', Parameter(torch.Tensor(num_of_channels))). A rough sketch follows below.
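For illustration only, a minimal, self-contained sketch of where a per-channel alpha could sit in a weight quantizer. This is not the repo's weight_quantize_fn; it is a plain uniform quantizer with a straight-through estimator, just to show the per-channel clipping parameter:

```python
import torch
import torch.nn as nn
from torch.nn import Parameter

class ChannelwiseWeightQuantizer(nn.Module):
    """Illustrative per-channel weight quantizer with a learnable clipping
    value (alpha) per output channel. Not the repo's implementation."""

    def __init__(self, num_channels, w_bit=4):
        super().__init__()
        self.w_bit = w_bit
        # One clipping value per output channel, broadcastable over
        # (out_channels, in_channels, kH, kW). The scalar version would
        # register a single value instead.
        self.register_parameter(
            'wgt_alpha', Parameter(3.0 * torch.ones(num_channels, 1, 1, 1)))

    def forward(self, weight):
        n_levels = 2 ** (self.w_bit - 1) - 1  # e.g. 7 positive levels for 4-bit
        alpha = self.wgt_alpha
        # Clip each channel's weights to [-alpha_c, alpha_c].
        w = torch.max(torch.min(weight, alpha), -alpha)
        # Uniform quantization of the clipped weights.
        w_q = torch.round(w / alpha * n_levels) * alpha / n_levels
        # Straight-through estimator: forward uses w_q, backward sees w.
        return w + (w_q - w).detach()

# Usage on a conv weight of shape (out_channels, in_channels, kH, kW):
quantizer = ChannelwiseWeightQuantizer(num_channels=64, w_bit=4)
w_q = quantizer(torch.randn(64, 32, 3, 3))
```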

@talenz
Author

talenz commented Apr 20, 2021


Have you tried it? Did it improve the performance?

@ChaofanTao
Copy link
Owner

It boosts performance, at the expense of a lower compression ratio.
