
fix OVModelForCausalLM for auto device #433

Merged
merged 1 commit into huggingface:main on Sep 21, 2023

Conversation

eaidova (Collaborator) commented on Sep 20, 2023

What does this PR do?

Fixes loading OVModelForCausalLM when AUTO is specified as the device (the AUTO device does not list INFERENCE_PRECISION_HINT among its supported properties).
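
For illustration, the guard can look roughly like the following sketch (this assumes OpenVINO's Core.get_property API; it is not necessarily the exact code in this PR):

from openvino.runtime import Core

core = Core()
device = "AUTO"

# AUTO does not list INFERENCE_PRECISION_HINT among its supported properties,
# so query the hint only when the target device actually advertises it.
supported_properties = core.get_property(device, "SUPPORTED_PROPERTIES")
if "INFERENCE_PRECISION_HINT" in supported_properties:
    precision = core.get_property(device, "INFERENCE_PRECISION_HINT")
else:
    precision = None  # fall back to the default PKV precision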

eaidova (Collaborator, Author) commented on Sep 20, 2023

@AlexKoff88 @helena-intel @echarlaix could you please take a look?

HuggingFaceDocBuilderDev commented on Sep 20, 2023

The documentation is not available anymore as the PR was closed or merged.

helena-intel (Collaborator) commented

Thanks @eaidova! It is a shame that this cannot be queried for the AUTO device. It is possible to do so once the model has been loaded on the "final" device:

# AUTO reports the device it actually selected via EXECUTION_DEVICES;
# the precision hint can then be read from that device's properties.
device = compiled_model.get_property("EXECUTION_DEVICES")[0]
compiled_model.get_property("DEVICE_PROPERTIES")[device]["INFERENCE_PRECISION_HINT"]

but we can't predict when the model will have been loaded onto that device. I would still really like to find a way to get the optimal PKV precision with AUTO too, even if it's a bit hacky for now. Should we, for now, add a warning about potentially slower performance on some devices when using AUTO with LLMs?

And should we have a separate method to allow users to set this PKV precision manually without having to set ov_config's INFERENCE_PRECISION_HINT?
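
Such a warning could be as simple as the following sketch (illustrative only; the logger setup and message are not from this PR):

import logging

logger = logging.getLogger(__name__)

if device == "AUTO":
    # The PKV precision cannot be tuned up front for AUTO, so the cache may
    # run at a suboptimal precision on some devices.
    logger.warning(
        "INFERENCE_PRECISION_HINT is not supported by the AUTO device; "
        "past key/value precision may be suboptimal and LLM inference may "
        "be slower on some devices."
    )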

helena-intel self-requested a review on September 20, 2023 at 10:39
echarlaix (Collaborator) left a comment

Thanks for the fix @eaidova!

eaidova (Collaborator, Author) commented on Sep 21, 2023

> Thanks @eaidova! It is a shame that this cannot be queried for the AUTO device. It is possible to do so once the model has been loaded on the "final" device:
>
> device = compiled_model.get_property("EXECUTION_DEVICES")[0]
> compiled_model.get_property("DEVICE_PROPERTIES")[device]["INFERENCE_PRECISION_HINT"]
>
> but we can't predict when the model will have been loaded onto that device. I would still really like to find a way to get the optimal PKV precision with AUTO too, even if it's a bit hacky for now. Should we, for now, add a warning about potentially slower performance on some devices when using AUTO with LLMs?
>
> And should we have a separate method to allow users to set this PKV precision manually without having to set ov_config's INFERENCE_PRECISION_HINT?

@helena-intel let's discuss internally what we can suggest for users who are interested in using AUTO (we probably also need some advice from @peterchen-intel here). In my opinion, we can postpone applying the PKV transformation until the target device is known and move it to the compilation step (only if AUTO is selected, because reloading the model several times may be time-consuming on some devices). I should also note that the default device for the model class is now CPU; AUTO is used only if the user explicitly specifies it, so the impact on users is probably not that big.

The current solution is also not universal: it does not take into account that the model precision and device can change at runtime, since it is applied only at the moment the model is initialized. Moving this logic inside compile() would probably be a better place, as recompilation is also triggered by the half() and to() methods; see the sketch below.
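
A rough sketch of that direction (hypothetical and simplified; the attribute and method names only loosely mirror the optimum-intel internals):

def compile(self):
    if self.request is None:
        compiled_model = core.compile_model(self.model, self._device, self.ov_config)
        if self._device == "AUTO":
            # AUTO reveals the actually selected device only after compilation,
            # so this is the first point where the optimal PKV precision is known.
            actual_device = compiled_model.get_property("EXECUTION_DEVICES")[0]
            pkv_precision = core.get_property(actual_device, "INFERENCE_PRECISION_HINT")
            # ...re-apply the PKV transformation and recompile if it changed...
        self.request = compiled_model.create_infer_request()

Since half() and to() reset the compiled model, this logic would then run again automatically on the next compile() call, which addresses the runtime precision/device changes mentioned above.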

echarlaix merged commit 99f6008 into huggingface:main on Sep 21, 2023. 10 checks passed.