Add fp16 and int8 to OpenVINO models and export CLI #443
Conversation
The documentation is not available anymore as the PR was closed or merged.
LGTM, thanks Ella!
Thanks @echarlaix! I tested it and it worked well. I'm wondering if we should restrict INT8 weight compression to tasks where we are reasonably confident it works well (LLMs), and show a message suggesting manual quantization for other tasks.
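For illustration, a minimal sketch of the gating suggested here. The helper name, the task allow-list, and the warning text are assumptions rather than this PR's actual code; only `nncf.compress_weights()` is the real NNCF entry point:

```python
import logging

import nncf

logger = logging.getLogger(__name__)

# Illustrative allow-list (an assumption): tasks where INT8 weight
# compression is known to preserve accuracy well (LLM-style generation).
_INT8_SAFE_TASKS = {"text-generation", "text-generation-with-past"}


def maybe_compress_weights(ov_model, task: str):
    """Apply INT8 weight compression only for tasks on the allow-list."""
    if task in _INT8_SAFE_TASKS:
        # In-memory 8-bit weight compression via NNCF.
        return nncf.compress_weights(ov_model)
    logger.warning(
        "INT8 weight compression is only applied automatically for LLM tasks. "
        "For task '%s', consider quantizing manually with NNCF.",
        task,
    )
    return ov_model
```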
```python
    model_kwargs=model_kwargs,
)
del models_and_onnx_configs

if int8:
```
One comment from my side. Such an implementation means that we will:
- Convert from PyTorch to OpenVINO with memory reuse (no copy).
- However, after that, serialize to disk, load the model again, quantize the weights, remove the files from disk, and finally store the compressed version.
- This can be time-consuming for really large models.

Ideally, we should call nncf.compress_weights() right after openvino.convert_model(), similar to FP16.
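A minimal sketch of that flow, assuming a toy torch.nn.Linear stands in for the real exported model; `openvino.convert_model`, `nncf.compress_weights`, and `openvino.save_model` are real public APIs, but the surrounding glue is illustrative:

```python
import nncf
import openvino as ov
import torch

# A toy stand-in for the exported PyTorch model (placeholder).
pt_model = torch.nn.Linear(16, 16).eval()
example_input = torch.randn(1, 16)

# Convert to OpenVINO in memory (no intermediate files on disk).
ov_model = ov.convert_model(pt_model, example_input=example_input)

# Compress weights to INT8 right after conversion, so the model is
# serialized to disk exactly once instead of the
# save/load/quantize/delete/re-save round-trip described above.
ov_model = nncf.compress_weights(ov_model)

# FP16 conversion is left off since the weights are already compressed.
ov.save_model(ov_model, "openvino_model.xml", compress_to_fp16=False)
```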
I opened a follow-up PR #444 for the default 8-bit compression. I tried to follow the approach described above.
@eaidova, please take a look as well.
> Ideally, we should call nncf.compress_weights() right after openvino.convert_model(), similar to FP16.

Yes, that's a good point. I was going to modify this PR, but I see that you opened #444, thanks @AlexKoff88!
LGTM, thanks!
As discussed in #437.