fix OVModelForCausalLM for auto device #433

eaidova · 2023-09-20T06:28:30Z

What does this PR do?

Fix loading OVModelForCausalLM when AUTO is specified as the device (the AUTO device does not list INFERENCE_PRECISION_HINT among its supported properties).
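A minimal sketch of the kind of guard this fix implies, not the PR's actual diff. It assumes a `core` object that exposes `get_property(device, name)` the way `openvino.runtime.Core` does. The helper `pick_pkv_precision`, the `FakeCore` stub and the use of plain strings such as `"f32"` in place of OpenVINO `Type` values are all hypothetical. The idea is to read `INFERENCE_PRECISION_HINT` only when the device reports it in `SUPPORTED_PROPERTIES`, and otherwise fall back to f32.

```python
from typing import Any, Dict, Protocol


class PropertyReader(Protocol):
    """Anything with an OpenVINO-style get_property(device, name), e.g. openvino.runtime.Core."""

    def get_property(self, device: str, name: str) -> Any: ...


# Hypothetical: plain strings stand in for openvino Type values.
DEFAULT_PKV_PRECISION = "f32"


def pick_pkv_precision(core: PropertyReader, device: str) -> Any:
    """Return the device's inference precision hint, or f32 if it cannot report one.

    Hypothetical helper: AUTO does not list INFERENCE_PRECISION_HINT among its
    supported properties, so asking for it unconditionally fails at load time.
    """
    supported = core.get_property(device, "SUPPORTED_PROPERTIES")
    if "INFERENCE_PRECISION_HINT" in supported:
        return core.get_property(device, "INFERENCE_PRECISION_HINT")
    return DEFAULT_PKV_PRECISION


class FakeCore:
    """Hypothetical stand-in for openvino.runtime.Core, for illustration only."""

    def __init__(self, props: Dict[str, Dict[str, Any]]):
        self._props = props

    def get_property(self, device: str, name: str) -> Any:
        device_props = self._props[device]
        if name == "SUPPORTED_PROPERTIES":
            return set(device_props)
        if name not in device_props:
            # Mimics the failure seen when querying an unsupported property.
            raise RuntimeError(f"{device} does not support {name}")
        return device_props[name]


core = FakeCore({"CPU": {"INFERENCE_PRECISION_HINT": "bf16"}, "AUTO": {}})
print(pick_pkv_precision(core, "CPU"), pick_pkv_precision(core, "AUTO"))
```

The membership test works whether the real property comes back as a list of names or a name-to-mutability mapping, which keeps the sketch independent of that detail.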

eaidova · 2023-09-20T06:34:20Z

@AlexKoff88 @helena-intel @echarlaix could you please take a look?

HuggingFaceDocBuilderDev · 2023-09-20T06:45:03Z

The documentation is not available anymore as the PR was closed or merged.

helena-intel · 2023-09-20T10:39:13Z

Thanks @eaidova! It is a shame that this cannot be queried for the AUTO device. It is possible once the model has been loaded on the "final" device:

device = compiled_model.get_property("EXECUTION_DEVICES")[0]
compiled_model.get_property("DEVICE_PROPERTIES")[device]["INFERENCE_PRECISION_HINT"]

but we can't predict when the model will have been loaded on that device. I really wish we could find a way to get the optimal PKV (past key values) precision with AUTO too, even if it's a bit hacky for now. In the meantime, should we add a warning about potentially slower performance on some devices when AUTO is used with LLMs?
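A minimal sketch combining the post-compilation lookup from the comment with the proposed warning. The property names come from the snippet above; the helper `precision_after_compile`, the `FakeCompiledModel` stub, the string precisions and the warning text are hypothetical. Because AUTO may not yet be running on its final device when this is called, the answer depends on timing, which is exactly the limitation described in the comment.

```python
import warnings
from typing import Any, Dict

# Hypothetical stand-in for an openvino Type value.
FALLBACK_PKV_PRECISION = "f32"


def precision_after_compile(compiled_model: Any, device: str) -> Any:
    """Hypothetical helper: ask the compiled model which device actually executes it."""
    try:
        actual = compiled_model.get_property("EXECUTION_DEVICES")[0]
        device_props = compiled_model.get_property("DEVICE_PROPERTIES")[actual]
        return device_props["INFERENCE_PRECISION_HINT"]
    except (KeyError, IndexError, RuntimeError):
        if device.upper().startswith("AUTO"):
            # The warning proposed in the comment: fall back to f32 and say so.
            warnings.warn(
                "Could not determine the inference precision for AUTO; past key values "
                "stay in f32, which may be slower on some devices.",
                stacklevel=2,
            )
        return FALLBACK_PKV_PRECISION


class FakeCompiledModel:
    """Hypothetical stand-in for openvino.runtime.CompiledModel."""

    def __init__(self, props: Dict[str, Any]):
        self._props = props

    def get_property(self, name: str) -> Any:
        return self._props[name]


gpu_model = FakeCompiledModel({
    "EXECUTION_DEVICES": ["GPU.0"],
    "DEVICE_PROPERTIES": {"GPU.0": {"INFERENCE_PRECISION_HINT": "f16"}},
})
print(precision_after_compile(gpu_model, "AUTO"))
```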

Should we also have a separate method that lets users set this PKV precision manually, without having to set INFERENCE_PRECISION_HINT in ov_config?
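A minimal sketch of what such a method could look like, assuming the model keeps the automatically detected precision and lets an explicit user choice take priority. This is not the optimum-intel API: the class `CausalLMPrecisionSketch`, the method `set_pkv_precision` and the string precisions are hypothetical names for illustration.

```python
from typing import Any, Optional


class CausalLMPrecisionSketch:
    """Hypothetical model wrapper with an explicit PKV-precision override."""

    def __init__(self, device: str, auto_precision: Any = "f32"):
        self.device = device
        self._auto_precision = auto_precision  # result of device detection (or the f32 fallback)
        self._pkv_override: Optional[Any] = None

    def set_pkv_precision(self, precision: Any) -> "CausalLMPrecisionSketch":
        # Lets users choose the past-key-value precision without editing ov_config.
        self._pkv_override = precision
        return self

    @property
    def pkv_precision(self) -> Any:
        # An explicit user choice wins over whatever detection produced.
        if self._pkv_override is not None:
            return self._pkv_override
        return self._auto_precision


model = CausalLMPrecisionSketch("AUTO")
print(model.pkv_precision, model.set_pkv_precision("f16").pkv_precision)
```

Keeping the override separate from the detected value means the override survives a later re-detection, for example after a device change.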

echarlaix

Thanks for the fix @eaidova!

eaidova · 2023-09-21T04:30:08Z

@helena-intel, let's discuss internally what we can suggest to users who want to use AUTO (we probably also need advice from @peterchen-intel here). In my opinion, while the target device is still unknown we can postpone applying the PKV transformation and move it to the compilation step (only when AUTO is selected, because reloading the model several times may be time-consuming on some devices). I should also note that the default device for the model class is now CPU; AUTO is used only if the user specifies it explicitly, so the impact on users is probably small.

The current solution is also not universal: it does not account for the fact that the model precision and device can change at runtime, and it applies only at the moment the model is initialized. Moving this logic into compile() is probably a better place, since recompilation is also triggered by the half() and to() methods.
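A minimal sketch of moving the precision decision into compile(), so that it runs when the model is compiled and is re-run after to() or half() invalidate it. The class `LazyPrecisionModel`, the `resolve` callable and the string precisions are hypothetical; `resolve` could be any device-to-precision lookup such as the ones discussed in this thread. Compilation is deferred until first use, so the model is compiled once rather than on every change.

```python
from typing import Any, Callable, Optional


class LazyPrecisionModel:
    """Hypothetical sketch: resolve the PKV precision inside compile(), not __init__."""

    def __init__(self, device: str, resolve: Callable[[str], Any]):
        self.device = device
        self._resolve = resolve
        self.pkv_precision: Optional[Any] = None  # unknown until the first compile()
        self.compile_count = 0

    def compile(self) -> None:
        # Re-resolve on every (re)compilation so device or precision changes are picked up.
        self.pkv_precision = self._resolve(self.device)
        self.compile_count += 1

    def to(self, device: str) -> "LazyPrecisionModel":
        self.device = device
        self.pkv_precision = None  # stale until the next compile()
        return self

    def half(self) -> "LazyPrecisionModel":
        # A real half() would also convert the model; here it only invalidates the cached precision.
        self.pkv_precision = None
        return self

    def generate(self) -> Any:
        # Compile lazily on first use, which avoids reloading the model several times.
        if self.pkv_precision is None:
            self.compile()
        return self.pkv_precision


model = LazyPrecisionModel("AUTO", lambda dev: {"CPU": "bf16", "GPU": "f16"}.get(dev, "f32"))
print(model.generate(), model.to("GPU").generate())
```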

eaidova force-pushed the ea/decoder_on_auto branch from 30a2685 to 4985d1c on September 20, 2023 06:33

fix OVModelForCausalLM for auto device

4985d1c

AlexKoff88 approved these changes Sep 20, 2023

eaidova mentioned this pull request Sep 20, 2023

240-dolly-2-instruction-following fails to use AUTO as device openvinotoolkit/openvino_notebooks#1302

Closed

helena-intel self-requested a review September 20, 2023 10:39

helena-intel approved these changes Sep 20, 2023

peterchen-intel approved these changes Sep 20, 2023

echarlaix approved these changes Sep 20, 2023

echarlaix merged commit 99f6008 into huggingface:main Sep 21, 2023
10 checks passed