All of the FP6 GEMM functions take the FP6 weights and their FP16 scales for each output channel:
* [Input]
* fp6_tensor: int tensor of shape [OC, IC // 16 * 3]; // 3 INT32 words contain 16 FP6 weights.
* fp16_scale: half tensor of shape [OC]; // for row-wise quantization.
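A minimal sketch (assuming PyTorch, with hypothetical layer dimensions) of the shapes described above; the packing factor comes from 16 FP6 values occupying 16 × 6 = 96 bits = 3 × 32-bit words:

```python
import torch

OC, IC = 4096, 4096  # hypothetical layer dimensions for illustration

fp6_tensor = torch.empty(OC, IC // 16 * 3, dtype=torch.int32)  # packed FP6 weights
fp16_scale = torch.empty(OC, dtype=torch.float16)              # one scale per output channel

# total bit count of the packed tensor matches OC * IC six-bit values
assert fp6_tensor.numel() * 32 == OC * IC * 6
```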
We have functions for converting FP16 weights to FP6 (weight_prepacking_fp16_to_fp6) and for packing the FP6 weights into the final inference format (weight_matrix_prepacking), but nothing to generate the scales to up-convert back to FP16.
In the test code for both Python and C++, the scales are always randomly initialized. Is there a function that generates the scales needed for accurate dequantization with real weights?
Sorry for the inconvenience. The generation of quantization scales is part of the model quantization process, and I believe that you can find the related code here.
I will add that API to this repo when I have more spare time.
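Until that API lands, here is a minimal sketch of per-output-channel (row-wise) absmax scale generation. It assumes the FP6 format is E3M2 with a maximum representable magnitude of 28.0; the constant and the helper name below are assumptions for illustration, not taken from this repo:

```python
import torch

FP6_E3M2_MAX = 28.0  # assumed max magnitude of FP6 (e3m2); verify against the kernel's format


def generate_row_scales(fp16_weight: torch.Tensor) -> torch.Tensor:
    """Per-output-channel absmax scales for an [OC, IC] FP16 weight matrix.

    The scaled weights (fp16_weight / scale) fall inside the assumed FP6 range and can
    then be passed through weight_prepacking_fp16_to_fp6 / weight_matrix_prepacking;
    the returned scales are what the GEMM uses to up-convert back to FP16.
    """
    absmax = fp16_weight.float().abs().amax(dim=1)        # [OC], computed in FP32
    scale = (absmax / FP6_E3M2_MAX).clamp(min=1e-12)      # avoid division by zero
    return scale.to(torch.float16)


# usage sketch
w = torch.randn(4096, 4096, dtype=torch.float16)
scales = generate_row_scales(w)
w_scaled = w / scales[:, None]  # this is what would be quantized/packed to FP6
```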