
Stateful models by default #486

Closed
wants to merge 43 commits
Conversation

slyalin
Contributor

@slyalin slyalin commented Dec 7, 2023

What does this PR do?

Adds support for converting and running stateful models.
Similar to #493, but makes models stateful by default in optimum-cli and from_pretrained.

TODO

  • Command line interface update
  • Testing on a wider range of topologies

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@AlexKoff88
Collaborator

Besides the first two items described in the TODO, we should also revise Seq2seq models and make the decoder part stateful: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L412

@slyalin
Contributor Author

slyalin commented Dec 11, 2023

Besides the first two items described in the TODO, we should also revise Seq2seq models and make the decoder part stateful: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L412

@AlexKoff88, would that just extend stateful support to more model architectures, without adding value for the architectures already enabled in this PR? Can we do that in a separate PR, so that we introduce stateful capabilities gradually and land the necessary API changes in this first one?

@AlexKoff88
Collaborator

Besides the first two items described in the TODO, we should also revise Seq2seq models and make the decoder part stateful: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L412

@AlexKoff88, would that just extend stateful support to more model architectures, without adding value for the architectures already enabled in this PR? Can we do that in a separate PR, so that we introduce stateful capabilities gradually and land the necessary API changes in this first one?

I think we can do it in a separate PR, when reworking the Seq2seq models and getting rid of the two decoder sub-models.

@slyalin slyalin marked this pull request as ready for review December 14, 2023 11:39
@@ -364,6 +382,13 @@ def ts_patched_forward(*args, **kwargs):
inp_tensor.get_node().set_partial_shape(static_shape)
inp_tensor.get_node().set_element_type(get_element_type(inp_data.cpu().numpy().dtype))
ov_model.validate_nodes_and_infer_types()

if stateful:
Collaborator

Should we also check the task before applying the patch? For this transformation, the model should have with_past in its task type. If I understand correctly, the current changes target only text-generation-with-past; there are also seq2seq models with a decoder-with-past, as @AlexKoff88 already mentioned in a comment, where this transformation should be applied to only one of the three sub-models.

Contributor Author

As implemented now, it is the user's responsibility to set stateful=True only for the right models. To set it by default, more adjustments are needed in the code, which I cannot provide in this PR; part of the models would not be supported. I am just not familiar enough with the code base to provide the changes necessary to enable it by default. We need someone who can do that.

from optimum.intel.utils.import_utils import is_openvino_version


def model_has_name(ov_model: ov.Model, name: str):
Collaborator

Suggested change
def model_has_name(ov_model: ov.Model, name: str):
def model_has_input_output_name(ov_model: ov.Model, name: str):

Collaborator

renamed in #493
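For illustration, a minimal sketch of what the renamed helper does. The real function operates on an ov.Model, whose input and output ports expose tensor names via get_names(); the StubPort/StubModel classes below are hypothetical stand-ins so the sketch runs without OpenVINO installed:

```python
class StubPort:
    """Stand-in for an OpenVINO model input/output port."""

    def __init__(self, *names):
        self._names = set(names)

    def get_names(self):
        return self._names


class StubModel:
    """Stand-in for ov.Model: just lists of input and output ports."""

    def __init__(self, input_names, output_names):
        self.inputs = [StubPort(n) for n in input_names]
        self.outputs = [StubPort(n) for n in output_names]


def model_has_input_output_name(ov_model, name: str) -> bool:
    # True if any input or output tensor of the model carries `name`
    return any(
        name in port.get_names() for port in ov_model.inputs + ov_model.outputs
    )


model = StubModel(["input_ids", "beam_idx"], ["logits"])
print(model_has_input_output_name(model, "beam_idx"))  # True
print(model_has_input_output_name(model, "past_key_values.0.key"))  # False
```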



def model_has_cache_reorder(ov_model):
return model_has_input(ov_model, 'beam_idx')


The name should be beam_id, or beam_ids if you count the batch, to stay aligned with input_ids.


Never mind. It's reasonable to treat idx as the singular form of ids.

Contributor Author

I wasn't super creative when choosing this name; I just borrowed it from the _reorder_cache method's argument name, so it is aligned with the part of the code this comes from. And this idx is not the same kind of index as those id or ids: it selects entries along the batch/beam dimension, while input_ids are token indices.
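To make the naming discussion concrete, here is a simplified, hypothetical stand-in showing how _reorder_cache feeds beam_idx in a stateful setup: instead of gathering past_key_values in Python, the model keeps its KV cache internally, and the permutation chosen by beam search is passed as an extra beam_idx input on the next inference call. Class and method names besides _reorder_cache and beam_idx are illustrative:

```python
class StatefulDecoderStub:
    """Simplified stand-in for a stateful decoder wrapper."""

    def __init__(self, batch_size):
        # identity permutation until beam search reorders it
        self.next_beam_idx = list(range(batch_size))

    def _reorder_cache(self, past_key_values, beam_idx):
        # Remember the permutation; it is fed to the model as the
        # `beam_idx` input tensor on the next inference call, where the
        # model gathers its internal KV-cache state along the batch dim.
        self.next_beam_idx = list(beam_idx)
        return past_key_values

    def forward_inputs(self, input_ids):
        # the inputs dict that would be fed to the compiled model
        return {"input_ids": input_ids, "beam_idx": self.next_beam_idx}


m = StatefulDecoderStub(batch_size=4)
m._reorder_cache(None, [2, 0, 3, 1])
print(m.forward_inputs([[5], [6], [7], [8]])["beam_idx"])  # [2, 0, 3, 1]
```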

Collaborator

@echarlaix echarlaix left a comment


Thanks for the addition @slyalin !!

Comment on lines 30 to 39
def pathed_generate_dummy_inputs(self, *args, **kwargs):
dummy_inputs = self._original_generate_dummy_inputs(*args, **kwargs)
if 'input_ids' in dummy_inputs and dummy_inputs['input_ids'].shape[1] == 1:
dummy_inputs['input_ids'] = torch.cat([dummy_inputs['input_ids'], dummy_inputs['input_ids']], dim=-1)
attention_mask = dummy_inputs['attention_mask']
dummy_inputs['attention_mask'] = torch.cat([attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1)
return dummy_inputs

model_config._original_generate_dummy_inputs = model_config.generate_dummy_inputs
model_config.generate_dummy_inputs = types.MethodType(pathed_generate_dummy_inputs, model_config)
Collaborator

Couldn't we use input_shapes instead?

Contributor Author

Probably it can be used now. When this code was first written, it was applied externally as a patch for the model. Now it can most likely be implemented directly in the export function.

Collaborator

That will not work, because there is additional logic in OnnxConfigWithPast that modifies the sequence length after the original shapes are specified:
https://github.com/huggingface/optimum/blob/main/optimum/exporters/onnx/base.py#L663

Collaborator

Starting with optimum 1.14.0 this behaviour is triggered only by the legacy path, which we do not use, so it can be removed entirely.

Comment on lines 388 to 389
model.key_value_input_names = [name for name in input_names if name.startswith('past_key_values.')]
model.key_value_output_names = [name for name in output_names if name.startswith('present.')]
Collaborator

Could this be done in patch_stateful directly? (Without modifying the original model attributes, but by checking the model graph inputs, or by also providing the input_names / output_names.)

Collaborator

rewritten in #493

return model_has_input(ov_model, 'beam_idx')


def model_has_state(ov_model):
Collaborator

Suggested change
def model_has_state(ov_model):
def _model_has_state(ov_model):

Would add an underscore to most of the added functions to hint that it's not intended for external use (to give us more freedom in the future)

Collaborator

We want to use this method externally, so naming it without an underscore is preferable for us.

The TO DO inside means that the implementation may change in the future, but as part of the API this function is very useful.

Comment on lines 80 to 87
def make_stateful(
ov_model: ov.Model,
not_kv_inputs,
key_value_input_names,
key_value_output_names,
batch_dim,
num_attention_heads,
num_beams_and_batch=None):
Collaborator

Could you add type hints?

Collaborator

added in #493
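A sketch of what the requested type hints might look like for the make_stateful signature quoted above. The concrete types are an assumption inferred from how the function is used in this PR, not the version that landed in #493; "ov.Model" is left as a forward-reference string so the sketch runs without OpenVINO installed:

```python
from typing import List, Optional


def make_stateful(
    ov_model: "ov.Model",  # the exported OpenVINO model to transform
    not_kv_inputs: List[str],  # inputs that are not past-KV tensors
    key_value_input_names: List[str],  # e.g. "past_key_values.0.key"
    key_value_output_names: List[str],  # e.g. "present.0.key"
    batch_dim: int,  # which KV-cache dimension is the batch
    num_attention_heads: int,
    num_beams_and_batch: Optional[int] = None,  # None = dynamic
) -> None:
    """Fold past/present KV inputs and outputs into internal model state."""
```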

state.reset()
# Set initial value for the next beam_idx input that will be used at the current iteration
# and will be optionally updated by _reorder_cache at the next iterations if beam_search is used
self.next_beam_idx = np.array(range(batch_size), dtype=int)
Collaborator

Would prefer to have it defined in __init__ directly (can be set to None).

Collaborator

could you please explain? I do not understand your suggestion

Collaborator

I would expect to have all attributes defined in the __init__, which is not the case here

Collaborator

added in #493
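The review suggestion above, sketched on a hypothetical minimal class: the attribute is declared as None in __init__ so all attributes are visible there, and populated once the batch size is known at state reset:

```python
class StatefulModelStub:
    """Stand-in illustrating declare-in-__init__, fill-on-reset."""

    def __init__(self):
        # declared up front, as requested in review; filled in once the
        # batch size is known at the first inference / state reset
        self.next_beam_idx = None

    def reset_state(self, batch_size):
        # initial identity permutation for the upcoming iteration
        self.next_beam_idx = list(range(batch_size))


m = StatefulModelStub()
assert m.next_beam_idx is None
m.reset_state(3)
print(m.next_beam_idx)  # [0, 1, 2]
```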



def raise_if_openvino_is_too_old():
if is_openvino_version("<=", "2023.2"):


That prohibits https://storage.openvinotoolkit.org/repositories/openvino/packages/nightly/2023.3.0-13432-a6ea22ad0e6/l_openvino_toolkit_ubuntu20_2023.3.0.dev20231129_x86_64.tgz usage. I get ValueError: Could not create or use stateful model when using old version of openvino==2023.3.0-13432-a6ea22ad0e6. Install openvino>=2023.3.0.

Contributor Author

As I remember, the previous version of this condition was different and didn't work with nightly. I changed it and tested with nightly, and everything worked. Not sure why it doesn't work now; probably I mixed up the versions when checking on my side.

Contributor Author

@eaidova reproduced this issue when both openvino and openvino-nightly installed. @Wovchena please uninstall the packages and install openvino-nightly only.


I suppose we need to fix it to work with different OpenVINO installations.

Collaborator

I have no issues with version detection using the Python libs from the archive. The problem seems to happen when OpenVINO is installed from multiple sources at the same time (e.g. pypi openvino + pypi openvino-nightly, or pypi openvino + PYTHONPATH pointing to openvino libs) and their versions differ.

Collaborator

In my PR with fixes for this branch,
#493,
I changed the logic for version verification. The previously used approach checked only packages that exist in the site-packages of the Python environment (i.e. installed as wheels), and did not take into account that a user (in a development setup, for example) may install OpenVINO from an archive or build it from source without installing wheels.

Now the detected openvino version is aligned with the import order.
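Part of the pitfall discussed above is version-string comparison: a nightly build tag such as "2023.3.0-13432-a6ea22ad0e6" reads as a pre-release of 2023.3.0 under naive comparison rules, so a strict ">=2023.3.0" check can reject it. A minimal, illustrative sketch of a numeric-prefix comparison that accepts such builds (this is not the PR's actual is_openvino_version implementation):

```python
import re


def numeric_version(version: str) -> tuple:
    # Keep only the leading dotted-integer prefix, dropping build tags
    # such as "-13432-a6ea22ad0e6" or ".dev20231129".
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    return tuple(int(part) for part in match.group(1).split("."))


def openvino_new_enough(version: str, minimum: str = "2023.3.0") -> bool:
    # Tuple comparison: (2023, 3, 0) >= (2023, 3, 0) accepts nightlies
    # of the minimum release instead of treating them as pre-releases.
    return numeric_version(version) >= numeric_version(minimum)


print(openvino_new_enough("2023.3.0-13432-a6ea22ad0e6"))  # True
print(openvino_new_enough("2023.2.0"))  # False
```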

@slyalin
Contributor Author

slyalin commented Jan 4, 2024

@echarlaix, I cannot push changes to #493, so I've continued development in my original PR with all changes from that PR. Could you approve workflows to allow checks running here?

Comment on lines 315 to 320
shared_memory: bool = False,
share_inputs: bool = False,
*,
shared_memory: Any = None,
):
data_cache.append(inputs)
self.request.infer(inputs, shared_memory)
self.request.infer(inputs, share_inputs, share_outputs=True, shared_memory=shared_memory)
Contributor Author

@AlexKoff88, this is how I have fixed it. Please provide feedback if something looks weird from your perspective.

@slyalin slyalin changed the title Stateful model support Stateful model by default Jan 5, 2024
@slyalin slyalin changed the title Stateful model by default Stateful models by default Jan 5, 2024
@slyalin
Contributor Author

slyalin commented Jan 12, 2024

This PR is completely superseded by #493.

@slyalin slyalin closed this Jan 12, 2024
8 participants