All of the FP6 GEMM functions take the FP6 weights and their FP16 scales for each output channel:
* [Input]
* fp6_tensor: int tensor of shape [OC, IC // 16 * 3]; // 3 INT32 words contain 16 FP6 weights.
* fp16_scale: half tensor of shape [OC]; // for row-wise quantization.
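A minimal sketch (assuming PyTorch, with hypothetical layer dimensions) of the shapes described above; the packing factor comes from 16 FP6 values occupying 16 × 6 = 96 bits = 3 × 32-bit words:

```python
import torch

OC, IC = 4096, 4096  # hypothetical layer dimensions for illustration

fp6_tensor = torch.empty(OC, IC // 16 * 3, dtype=torch.int32)  # packed FP6 weights
fp16_scale = torch.empty(OC, dtype=torch.float16)              # one scale per output channel

# total bit count of the packed tensor matches OC * IC six-bit values
assert fp6_tensor.numel() * 32 == OC * IC * 6
```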
We have functions for converting FP16 weights to FP6 (weight_prepacking_fp16_to_fp6) and for packing the FP6 weights into the final inference format (weight_matrix_prepacking), but nothing to generate the scales to up-convert back to FP16.
In the test code for both Python and C++, the scales are always randomly initialized. Is there a function that generates the scales needed for accurate dequantization with real weights?
Sorry for the inconvenience. The generation of quantization scales is part of the model quantization process, and I believe that you can find the related code here.
I will add that API to this repo when I have more spare time.
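Until that API lands, here is a minimal sketch of per-output-channel (row-wise) absmax scale generation. It assumes the FP6 format is E3M2 with a maximum representable magnitude of 28.0; the constant and the helper name below are assumptions for illustration, not taken from this repo:

```python
import torch

FP6_E3M2_MAX = 28.0  # assumed max magnitude of FP6 (e3m2); verify against the kernel's format


def generate_row_scales(fp16_weight: torch.Tensor) -> torch.Tensor:
    """Per-output-channel absmax scales for an [OC, IC] FP16 weight matrix.

    The scaled weights (fp16_weight / scale) fall inside the assumed FP6 range and can
    then be passed through weight_prepacking_fp16_to_fp6 / weight_matrix_prepacking;
    the returned scales are what the GEMM uses to up-convert back to FP16.
    """
    absmax = fp16_weight.float().abs().amax(dim=1)        # [OC], computed in FP32
    scale = (absmax / FP6_E3M2_MAX).clamp(min=1e-12)      # avoid division by zero
    return scale.to(torch.float16)


# usage sketch
w = torch.randn(4096, 4096, dtype=torch.float16)
scales = generate_row_scales(w)
w_scaled = w / scales[:, None]  # this is what would be quantized/packed to FP6
```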