
How to create FP16 quantization scales? #6

Open
mgoin opened this issue May 1, 2024 · 2 comments


mgoin commented May 1, 2024

All of the FP6 GEMM functions take the FP6 weights and their FP16 scales, one per output channel:

 * [Input]
 *  fp6_tensor:  int  tensor of shape [OC, IC // 16 * 3];   // 3 INT32 words contains 16 FP6  weights.
 *  fp16_scale:  half tensor of shape [OC];                 // for row-wise quantization.

We have functions for converting FP16 weights to FP6 (weight_prepacking_fp16_to_fp6) and for packing the FP6 weights into the final inference format (weight_matrix_prepacking), but nothing that generates the scales needed to up-convert back to FP16.

In the testing code for either Python or C++, the scales are always randomly initialized. Is there a function that generates the scales needed for accurate dequantization with real weights?
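
For reference while an official API is missing, here is a minimal sketch of how per-output-channel (row-wise) scales are commonly produced with absmax quantization. Everything below is illustrative and not this repo's API: the function names generate_fp16_scales and scale_rows_for_fp6 are hypothetical, and FP6_E3M2_MAX = 28.0 assumes an E3M2 FP6 format whose largest representable magnitude is 28.0.

# Illustrative sketch only (not this repo's API): absmax row-wise scale generation.
# Assumes an FP6 E3M2 format whose largest representable magnitude is 28.0.
import torch

FP6_E3M2_MAX = 28.0  # assumed maximum |value| representable in the FP6 format

def generate_fp16_scales(fp16_weight: torch.Tensor) -> torch.Tensor:
    # fp16_weight: [OC, IC]; returns one FP16 scale per output channel (row).
    absmax = fp16_weight.float().abs().amax(dim=1)       # [OC]
    scale = (absmax / FP6_E3M2_MAX).clamp(min=1e-12)     # avoid divide-by-zero
    return scale.to(torch.float16)                       # shape [OC], like fp16_scale

def scale_rows_for_fp6(fp16_weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Divide each row by its scale so values fit the FP6 range before the
    # FP16 -> FP6 conversion; dequantization is then fp6_value * scale per row.
    scaled = fp16_weight.float() / scale.float()[:, None]
    return scaled.clamp(-FP6_E3M2_MAX, FP6_E3M2_MAX).to(torch.float16)

If this matches what the kernels expect, the scaled weights would then go through weight_prepacking_fp16_to_fp6 and weight_matrix_prepacking, and the scale tensor would be passed as fp16_scale so each dequantized row is multiplied back by its scale at inference time.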


mgoin commented May 3, 2024

@Summer-Summer any help here would be appreciated, please.

Summer-Summer (Member) commented

Sorry for the inconvenience. The generation of quantization scales is part of the model quantization process, and I believe that you can find the related code here.

I will add that API to this repo when I have more spare time.
