Add fp16 and int8 to OpenVINO models and export CLI #443

Merged: 16 commits into main from add-weight-only-cli on Oct 4, 2023

Conversation

echarlaix (Collaborator) commented on Sep 29, 2023:

As discussed in #437.
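
A usage sketch of the new export options (hypothetical invocations: the flag names --fp16 and --int8 are inferred from this PR's title and the `if int8:` hunk below, and the model name and output directories are illustrative):

    optimum-cli export openvino --model gpt2 --fp16 ov_model_fp16/
    optimum-cli export openvino --model gpt2 --int8 ov_model_int8/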

HuggingFaceDocBuilderDev commented on Sep 29, 2023:

The documentation is not available anymore as the PR was closed or merged.

echarlaix marked this pull request as ready for review on September 29, 2023, at 15:29.
AlexKoff88 (Collaborator) left a comment:

LGTM, thanks Ella!

helena-intel (Collaborator) commented:
Thanks @echarlaix! I tested it and it worked well. I'm wondering if we should restrict INT8 weight compression to tasks where we are reasonably confident that it works well (LLMs), and show a message suggesting manual quantization for other tasks.
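
A minimal sketch of the gating helena-intel suggests (hypothetical: the task names, the INT8_VALIDATED_TASKS set, and the int8/task variables are illustrative and not from the PR):

    import logging

    logger = logging.getLogger(__name__)

    # Hypothetical allow-list of tasks where weight-only INT8 is known to work well (LLMs).
    INT8_VALIDATED_TASKS = {"text-generation", "text-generation-with-past"}

    # Stand-ins for the export arguments.
    int8, task = True, "image-classification"

    if int8 and task not in INT8_VALIDATED_TASKS:
        # Warn instead of silently compressing a task type that has not been validated.
        logger.warning(
            "Weight-only INT8 compression has mainly been validated on LLMs; "
            f"for task '{task}', consider quantizing manually with NNCF instead."
        )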

The next review comments are attached to this hunk of the diff:

        model_kwargs=model_kwargs,
    )
    del models_and_onnx_configs

    if int8:
AlexKoff88 (Collaborator) commented:
One comment from my side. Such an implementation means that we will:

  • Convert from PyTorch to OpenVINO with memory reuse (no copy).
  • After that, however, serialize to disk, load the model again, quantize the weights, remove the files from disk, and finally store the compressed version.
  • This round trip can be time-consuming for really large models.

Ideally, we should call nncf.compress_weights() right after openvino.convert_model(), similar to FP16.
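
A minimal sketch of the proposed flow, compressing in memory before a single serialization; the toy Linear model and output path are illustrative stand-ins, while openvino.convert_model, nncf.compress_weights, and openvino.save_model are public APIs of recent OpenVINO/NNCF releases:

    import nncf
    import openvino as ov
    import torch

    # Stand-in for the PyTorch model produced by the export pipeline.
    pt_model = torch.nn.Linear(16, 16)
    example_input = torch.randn(1, 16)

    # PyTorch -> OpenVINO conversion with memory reuse (weights are shared, not copied).
    ov_model = ov.convert_model(pt_model, example_input=example_input)

    # Compress weights in memory, skipping the save -> reload -> quantize -> re-save round trip.
    ov_model = nncf.compress_weights(ov_model)

    # A single write to disk; compress_to_fp16=False leaves any remaining FP32 constants untouched.
    ov.save_model(ov_model, "model.xml", compress_to_fp16=False)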

AlexKoff88 (Collaborator) commented:

I opened a follow-up PR #444 for the default 8-bit compression. I tried to follow the approach described above.

A collaborator commented:

@eaidova, please take a look as well.

echarlaix (Collaborator, Author) commented on Oct 2, 2023:

> Ideally, we should call nncf.compress_weights() right after openvino.convert_model(), similar to FP16.

Yes, that's a good point. I was going to modify this PR, but I see that you opened #444. Thanks @AlexKoff88!

echarlaix requested a review from AlexKoff88 on October 4, 2023, at 10:34.
echarlaix changed the title from "Add fp16 and int8 to OpenVINO export CLI" to "Add fp16 and int8 to OpenVINO models and export CLI" on Oct 4, 2023.
AlexKoff88 (Collaborator) left a comment:

LGTM, thanks!

echarlaix merged commit ce6c6bc into main on Oct 4, 2023; 10 of 12 checks passed.
echarlaix deleted the add-weight-only-cli branch on October 4, 2023, at 11:53.
5 participants