Integrate weight-only quantization of INC #417
@echarlaix Could you please help review this PR? INC supports production-quality weight-only quantization, including INT8 and INT4, for LLMs in the latest master (it will also be released in INC v2.3 in early September). Thanks.
Thanks a lot for the addition @mengniwang95
Looks great, thanks for the addition @mengniwang95!
Hi Ella, the unit tests currently fail because they don't run against the latest master code. Should we wait for the neural-compressor 2.3 release to run the unit tests, and merge this PR once all tests pass? @echarlaix
For when is the |
Hi Ella, the neural-compressor release is planned for 9/15. I added an INT4 unit test in this branch, but it is not triggered because the installed neural-compressor is < 2.3.
@echarlaix It seems some tests failed, though they may not be related to the changes. Could you please help check, or is it okay to get this PR merged?
Could you update your branch by rebasing on main? This will fix all the unrelated tests. The INC tests are failing because the release is scheduled for tomorrow; I think we should install
Signed-off-by: Mengni Wang <[email protected]>
This PR integrates the weight-only quantization of Neural Compressor into optimum-intel.
Note: testing requires the master branch of neural-compressor.
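
For readers unfamiliar with the term, here is a minimal sketch of what weight-only quantization generally means: only the weights are stored as low-bit integers with a per-group scale, and they are dequantized back to floats at compute time. This is a generic symmetric round-to-nearest illustration in plain Python, not the algorithm or API that Neural Compressor or this PR uses; all function names and the `group_size` parameter are hypothetical.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest quantization of one group of weights.

    Hypothetical helper for illustration only; not an INC function.
    """
    qmax = 2 ** (bits - 1) - 1  # largest positive level, e.g. 7 for INT4
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the signed integer range.
    quantized = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def quantize_weights(
    weights: List[float], group_size: int = 4, bits: int = 4
) -> List[Tuple[List[int], float]]:
    """Split a flat weight list into groups and quantize each group."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize_weights(groups: List[Tuple[List[int], float]]) -> List[float]:
    """Recover approximate float weights from (integers, scale) pairs."""
    restored: List[float] = []
    for quantized, scale in groups:
        restored.extend(value * scale for value in quantized)
    return restored


if __name__ == "__main__":
    original = [1.75, -0.5, 0.25, 0.0, 0.9, -0.3, 0.05, 0.6]
    groups = quantize_weights(original, group_size=4, bits=4)
    print(groups)
    print(dequantize_weights(groups))
```

Activations stay in floating point in this scheme; only the stored weights shrink, which is why the approach is attractive for memory-bound LLM inference.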