Stateful models by default #486
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Besides the first two items described in TODO, we should also revise Seq2seq models and make the decoding part stateful: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L412
@AlexKoff88, would that just extend stateful support to more model architectures, without providing extra value to the architectures already enabled in this PR? Can we do that in a separate PR, so that we introduce stateful capabilities gradually and provide the necessary API changes in this first one?
I think we can do it in a separate PR when reworking Seq2seq models and getting rid of the two decoder sub-models.
@@ -364,6 +382,13 @@ def ts_patched_forward(*args, **kwargs):
     inp_tensor.get_node().set_partial_shape(static_shape)
     inp_tensor.get_node().set_element_type(get_element_type(inp_data.cpu().numpy().dtype))
     ov_model.validate_nodes_and_infer_types()
+
+    if stateful:
Should we also check the task before applying the patch? To make this transformation, the model should have with_past in its task type, and if I understand correctly, the current changes target only text-generation-with-past. There are also seq2seq models with a decoder with past, as @AlexKoff88 already mentioned in a comment, where this transformation should be applied to only one of the three sub-models.
As implemented now, it is the user's responsibility to set stateful=True only for the right models. To set it by default, more adjustments are needed in the code, which I cannot provide in this PR; part of the models will not be supported. I am just not familiar enough with the code base to make the changes needed to enable it by default. We need someone who can do that.
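For illustration, a guard along the lines of the task check discussed above might look like this (a hypothetical sketch; the helper name and the exact set of supported tasks are assumptions, not code from this PR):

def check_stateful_supported(task: str, stateful: bool) -> None:
    # Hypothetical helper: the stateful transformation is only known to be safe
    # for decoder-only models exported with a KV cache ("-with-past" task)
    if stateful and task != "text-generation-with-past":
        raise ValueError(
            f"stateful=True is only supported for text-generation-with-past, got task '{task}'"
        )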
from optimum.intel.utils.import_utils import is_openvino_version
def model_has_name(ov_model: ov.Model, name: str): |
- def model_has_name(ov_model: ov.Model, name: str):
+ def model_has_input_output_name(ov_model: ov.Model, name: str):
renamed in #493
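For reference, a minimal sketch of what the renamed helper could look like (assuming OpenVINO's Python API, where every input/output port exposes get_names(); not necessarily the exact #493 implementation):

import openvino as ov

def model_has_input_output_name(ov_model: ov.Model, name: str) -> bool:
    # Collect every tensor name attached to the model's input and output ports
    all_names = set()
    for port in list(ov_model.inputs) + list(ov_model.outputs):
        all_names.update(port.get_names())
    return name in all_names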
def model_has_cache_reorder(ov_model):
    return model_has_input(ov_model, 'beam_idx')
The name should be beam_id, or beam_ids if you count the batch, to stay aligned with input_ids.
Never mind. It's reasonable to state that idx is the singular form of ids.
I wasn't super creative when choosing this name and just borrowed it from the _reorder_cache method's argument name, so it is aligned with the part of the code this thing comes from. And this idx is not the same as those id or ids; they are indexes for a different thing.
Thanks for the addition @slyalin !!
def pathed_generate_dummy_inputs(self, *args, **kwargs):
    dummy_inputs = self._original_generate_dummy_inputs(*args, **kwargs)
    if 'input_ids' in dummy_inputs and dummy_inputs['input_ids'].shape[1] == 1:
        dummy_inputs['input_ids'] = torch.cat([dummy_inputs['input_ids'], dummy_inputs['input_ids']], dim=-1)
        attention_mask = dummy_inputs['attention_mask']
        dummy_inputs['attention_mask'] = torch.cat([attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1)
    return dummy_inputs

model_config._original_generate_dummy_inputs = model_config.generate_dummy_inputs
model_config.generate_dummy_inputs = types.MethodType(pathed_generate_dummy_inputs, model_config)
couldn't we use input_shapes instead?
Probably it can be used now. When this code was first written, it was applied externally as a patch for the model. Now it can most likely be implemented directly in the export function.
That will not work, because there is additional logic that modifies sequence_len in OnnxConfigWithPast after the original shape specification:
https://github.com/huggingface/optimum/blob/main/optimum/exporters/onnx/base.py#L663
Starting with optimum 1.14.0, this behaviour is triggered only by the legacy path, which we do not use, so it can be removed altogether.
model.key_value_input_names = [name for name in input_names if name.startswith('past_key_values.')]
model.key_value_output_names = [name for name in output_names if name.startswith('present.')]
Could this be done in patch_stateful directly? (Without modifying the original model attributes, but by checking the model graph inputs, or by also providing the input_names / output_names.)
rewritten in #493
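One way to avoid mutating the wrapper object, sketched under the assumption that the exported graph keeps the usual past_key_values.* / present.* naming convention (illustrative, not the actual #493 code):

import openvino as ov

def get_key_value_names(ov_model: ov.Model):
    # Derive the KV tensor names from the graph itself instead of storing them
    # as attributes on the model wrapper
    input_names = [n for port in ov_model.inputs for n in port.get_names()]
    output_names = [n for port in ov_model.outputs for n in port.get_names()]
    kv_input_names = [n for n in input_names if n.startswith('past_key_values.')]
    kv_output_names = [n for n in output_names if n.startswith('present.')]
    return kv_input_names, kv_output_names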
    return model_has_input(ov_model, 'beam_idx')


def model_has_state(ov_model):
- def model_has_state(ov_model):
+ def _model_has_state(ov_model):
Would add an underscore to most of the added functions to hint that they're not intended for external use (to give us more freedom in the future).
We want to use this method externally, which is why a name without an underscore is preferable for us. The TODO inside means that the implementation may change in the future, but as part of the API this function is very useful.
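A sketch of one possible realization behind that API (our assumption: the ReadValue/Assign pairs created by the stateful transformation are exposed as sinks on the ov.Model, so a non-empty sink list implies state):

import openvino as ov

def model_has_state(ov_model: ov.Model) -> bool:
    # Assign sinks only exist in models that carry internal state
    return len(ov_model.get_sinks()) > 0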
def make_stateful(
    ov_model: ov.Model,
    not_kv_inputs,
    key_value_input_names,
    key_value_output_names,
    batch_dim,
    num_attention_heads,
    num_beams_and_batch=None):
could you add type hints?
added in #493
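For illustration, a type-hinted variant of the signature (a sketch; the annotations actually added in #493 may differ):

from typing import List, Optional

import openvino as ov

def make_stateful(
    ov_model: ov.Model,
    not_kv_inputs: List[str],
    key_value_input_names: List[str],
    key_value_output_names: List[str],
    batch_dim: int,
    num_attention_heads: int,
    num_beams_and_batch: Optional[int] = None,
):
    ...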
state.reset()
# Set initial value for the next beam_idx input that will be used at the current iteration
# and will be optionally updated by _reorder_cache at the next iterations if beam_search is used
self.next_beam_idx = np.array(range(batch_size), dtype=int)
Would prefer to have it defined in __init__ directly (can be set to None).
could you please explain? I do not understand your suggestion
I would expect to have all attributes defined in the __init__, which is not the case here.
added in #493
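The suggested pattern, sketched on a simplified stand-in class (the class and method names are illustrative, not the actual modeling code):

import numpy as np

class StatefulDecoderExample:
    def __init__(self):
        # Declare the attribute up front so every attribute lives in __init__;
        # it is filled with real indices when the KV-cache state is reset
        self.next_beam_idx = None

    def reset_state(self, batch_size: int):
        # Identity mapping over the batch; _reorder_cache may overwrite this
        # on later iterations when beam search is used
        self.next_beam_idx = np.arange(batch_size, dtype=int)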
def raise_if_openvino_is_too_old():
    if is_openvino_version("<=", "2023.2"):
That prohibits usage of https://storage.openvinotoolkit.org/repositories/openvino/packages/nightly/2023.3.0-13432-a6ea22ad0e6/l_openvino_toolkit_ubuntu20_2023.3.0.dev20231129_x86_64.tgz. I get: ValueError: Could not create or use stateful model when using old version of openvino==2023.3.0-13432-a6ea22ad0e6. Install openvino>=2023.3.0.
As I remember, the previous version of this condition was different and didn't work with the nightly. I changed it and tested with the nightly -- everything worked. Not sure why it doesn't work now -- probably I mixed up the versions when I checked on my side.
I suppose we need to fix it to work with different OpenVINO installations.
I have no issues detecting the version using the Python libs from the archive. The problem seems to happen if you have installed openvino from multiple sources at the same time (e.g. PyPI openvino + PyPI openvino-nightly, or PyPI openvino + PYTHONPATH pointing to openvino libs) and their versions are different.
In my PR with fixes for this branch, #493, I changed the logic for version verification. The previously used approach checked only packages that exist in the site-packages of the Python environment (i.e. only those installed as wheels), but did not take into account that a user (in a development setup, for example) may install openvino from an archive or build it from source without installing wheels. Now the detected openvino version is aligned with the import order.
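A sketch of a check driven by the runtime-reported version rather than wheel metadata (assumption: openvino.runtime.get_version() reflects the libraries that were actually imported, including archive and source builds):

from openvino.runtime import get_version
from packaging.version import parse

def raise_if_openvino_is_too_old():
    # get_version() returns e.g. '2023.3.0-13432-a6ea22ad0e6' for the imported
    # libraries, regardless of how they were installed
    ov_version = parse(get_version().split('-')[0])
    if ov_version < parse('2023.3.0'):
        raise ValueError(
            f'Stateful models require openvino>=2023.3.0, but {ov_version} is installed.'
        )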
@echarlaix, I cannot push changes to #493, so I've continued development in my original PR with all changes from that PR. Could you approve the workflows to allow the checks to run here?
-    shared_memory: bool = False,
+    share_inputs: bool = False,
+    *,
+    shared_memory: Any = None,
 ):
     data_cache.append(inputs)
-    self.request.infer(inputs, shared_memory)
+    self.request.infer(inputs, share_inputs, share_outputs=True, shared_memory=shared_memory)
@AlexKoff88, this is how I have fixed it. Please provide feedback if something looks weird from your perspective.
This PR is completely superseded by #493.
What does this PR do?
Adds support for converting and running stateful models.
Similar to #493, but makes stateful models the default in optimum-cli and from_pretrained.
TODO
Before submitting