
Does the repo provide a quantization kernel? #10

Closed
yatorho opened this issue Jul 22, 2024 · 4 comments
yatorho commented Jul 22, 2024

It seems that the fp6_llm repo only includes the kernel `weight_matrix_dequant_fp_eXmY_cpu`, which dequantizes fp6 data to fp16, but it lacks a kernel to quantize fp16 data to fp6. Could you provide a kernel for quantizing pre-trained models?
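For context, the core of such a quantization step is rounding each fp16 value to the nearest representable e3m2 number (1 sign bit, 3 exponent bits with bias 3, 2 mantissa bits). Below is a minimal, scalar pure-Python sketch of that logic; the function names are hypothetical, it is not the repo's kernel, and real pipelines (e.g. DeepSpeed's) also apply per-channel scaling before this step.

```python
import math

def quantize_fp6_e3m2(x: float) -> int:
    """Quantize a Python float to a 6-bit e3m2 code:
    1 sign bit, 3 exponent bits (bias 3), 2 mantissa bits.
    Rounding uses Python's round() (half-to-even), a simplification."""
    sign = 0
    if x < 0:
        sign, x = 1, -x
    if x == 0.0:
        return sign << 5
    x = min(x, 28.0)           # saturate at the max normal: 2^(7-3) * 1.75
    e = math.floor(math.log2(x))
    if e < -2:                 # below the smallest normal 2^(1-3): subnormal
        exp_field = 0
        mant = round(x / 2 ** -2 * 4)
    else:
        exp_field = e + 3
        mant = round((x / 2 ** e - 1.0) * 4)
        if mant == 4:          # mantissa rounded up into the next binade
            mant, exp_field = 0, exp_field + 1
    return (sign << 5) | (exp_field << 2) | mant

def dequantize_fp6_e3m2(code: int) -> float:
    """Decode a 6-bit e3m2 code back to a float."""
    sign = -1.0 if code & 0b100000 else 1.0
    exp_field = (code >> 2) & 0b111
    mant = code & 0b11
    if exp_field == 0:         # subnormal: no implicit leading 1
        return sign * (mant / 4) * 2 ** -2
    return sign * (1 + mant / 4) * 2 ** (exp_field - 3)
```

For example, `quantize_fp6_e3m2(1.5)` yields the code `0b001110`, which dequantizes back to `1.5` exactly; values above 28.0 saturate to the largest e3m2 normal.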


gau-nernst commented Jul 22, 2024

I have integrated the FP6 kernel from this repo into torchao with a user-friendly API to quantize and run a given model. You can check it out here: https://github.com/pytorch/ao/tree/main/torchao/prototype/quant_llm

The quantization logic is adapted from DeepSpeed as mentioned in #6


yatorho commented Jul 22, 2024

Thanks for the reply. It seems that torchao has not yet merged this API into the current release; I built it from source, and it worked for me.
Another small question: torchao only adopted fp6_llm's code for the linear_forward function. Are other operations, such as the packing and repacking kernels, implemented with tensor-level operations in Python rather than reusing fp6_llm's C++ code directly?

gau-nernst commented

Yes, packing is done in Python using PyTorch ops. With this approach, we can support CUDA tensors. We also skip the unnecessary intermediate 6-bit packing and pack directly to the 2+4-bit layout used by FP6-LLM.
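To illustrate the idea of a 2+4-bit split: each 6-bit code is divided into its top 2 bits and bottom 4 bits, stored in two separate packed buffers. The sketch below uses plain Python ints and made-up helper names; the actual FP6-LLM layout additionally interleaves bits to match tensor-core fragment ordering, which is omitted here.

```python
def pack_2bit_4bit(codes):
    """Split each 6-bit code into top-2 and bottom-4 bits, then pack
    four 2-bit fields per byte and two 4-bit fields per byte.
    Assumes len(codes) is a multiple of 4."""
    assert len(codes) % 4 == 0
    top2 = [c >> 4 for c in codes]       # high 2 bits of each code
    low4 = [c & 0b1111 for c in codes]   # low 4 bits of each code
    packed2 = bytes(
        (top2[i] << 6) | (top2[i + 1] << 4) | (top2[i + 2] << 2) | top2[i + 3]
        for i in range(0, len(top2), 4)
    )
    packed4 = bytes(
        (low4[i] << 4) | low4[i + 1]
        for i in range(0, len(low4), 2)
    )
    return packed2, packed4

def unpack_2bit_4bit(packed2, packed4, n):
    """Inverse of pack_2bit_4bit: recombine the two buffers into n 6-bit codes."""
    top2, low4 = [], []
    for b in packed2:
        top2 += [(b >> 6) & 3, (b >> 4) & 3, (b >> 2) & 3, b & 3]
    for b in packed4:
        low4 += [(b >> 4) & 15, b & 15]
    return [(t << 4) | l for t, l in zip(top2[:n], low4[:n])]
```

The same bit manipulation maps one-for-one onto PyTorch tensor ops (`>>`, `&`, `|` on `uint8` tensors), which is why the packing can run on CUDA tensors without any custom C++ code.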


yatorho commented Jul 22, 2024

Thank you again! It solved my problem.
