Stateful models by default #486
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Besides the first two items described in TODO, we should also revise Seq2seq models and make the decoding part stateful: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L412
@AlexKoff88, would that just extend stateful support to more model architectures, without providing extra value to the architectures already enabled in this PR? Can we do that in a separate PR, so that we introduce stateful capabilities gradually and provide the necessary API changes in this first one?
I think we can do it in a separate PR when reworking Seq2seq models and getting rid of the two decoder sub-models.
@@ -364,6 +382,13 @@ def ts_patched_forward(*args, **kwargs):
     inp_tensor.get_node().set_partial_shape(static_shape)
     inp_tensor.get_node().set_element_type(get_element_type(inp_data.cpu().numpy().dtype))
     ov_model.validate_nodes_and_infer_types()
+
+    if stateful:
Should we also check the task before applying the patch? To make this transformation, the model should have with_past in its task type, and if I understand correctly, the current changes target only text-generation-with-past. There are also seq2seq models with a decoder with past, as @AlexKoff88 already mentioned in a comment, where this transformation should be applied to only one of the three sub-models.
As implemented now, it is the user's responsibility to set stateful=True only for the right models. To set it by default, more adjustments are needed in the code, which I cannot provide in this PR; part of the models will not be supported. I am just not familiar enough with the code base to make the changes needed to enable it by default. We need someone who can do that.
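For illustration, a guard along the lines of the task check discussed above might look like this (a hypothetical sketch; the helper name and the exact set of supported tasks are assumptions, not code from this PR):

def check_stateful_supported(task: str, stateful: bool) -> None:
    # Hypothetical helper: the stateful transformation is only known to be safe
    # for decoder-only models exported with a KV cache ("-with-past" task)
    if stateful and task != "text-generation-with-past":
        raise ValueError(
            f"stateful=True is only supported for text-generation-with-past, got task '{task}'"
        )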
from optimum.intel.utils.import_utils import is_openvino_version
def model_has_name(ov_model: ov.Model, name: str): |
- def model_has_name(ov_model: ov.Model, name: str):
+ def model_has_input_output_name(ov_model: ov.Model, name: str):
renamed in #493
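For reference, a minimal sketch of what the renamed helper could look like (assuming OpenVINO's Python API, where every input/output port exposes get_names(); not necessarily the exact #493 implementation):

import openvino as ov

def model_has_input_output_name(ov_model: ov.Model, name: str) -> bool:
    # Collect every tensor name attached to the model's input and output ports
    all_names = set()
    for port in list(ov_model.inputs) + list(ov_model.outputs):
        all_names.update(port.get_names())
    return name in all_names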
def model_has_cache_reorder(ov_model):
    return model_has_input(ov_model, 'beam_idx')
The name should be beam_id, or beam_ids if you count the batch, to stay aligned with input_ids.
Never mind. It's reasonable to state that idx is the singular form of ids.
I wasn't super creative when choosing this name and just borrowed it from the _reorder_cache method's argument name, so it is aligned with the part of the code this thing comes from. And this idx is not the same as those id or ids; they are indexes for a different thing.
Thanks for the addition @slyalin !!
def pathed_generate_dummy_inputs(self, *args, **kwargs):
    dummy_inputs = self._original_generate_dummy_inputs(*args, **kwargs)
    if 'input_ids' in dummy_inputs and dummy_inputs['input_ids'].shape[1] == 1:
        dummy_inputs['input_ids'] = torch.cat([dummy_inputs['input_ids'], dummy_inputs['input_ids']], dim=-1)
        attention_mask = dummy_inputs['attention_mask']
        dummy_inputs['attention_mask'] = torch.cat([attention_mask, attention_mask.new_ones((attention_mask.shape[0], 1))], dim=-1)
    return dummy_inputs

model_config._original_generate_dummy_inputs = model_config.generate_dummy_inputs
model_config.generate_dummy_inputs = types.MethodType(pathed_generate_dummy_inputs, model_config)
couldn't we use input_shapes instead?
Probably it can be used now. When this code was first written, it was applied externally as a patch for the model. Now it can most likely be implemented directly in the export function.
That will not work, because there is additional logic that modifies sequence_len in OnnxConfigWithPast after the original shape specification:
https://github.com/huggingface/optimum/blob/main/optimum/exporters/onnx/base.py#L663
Starting with optimum 1.14.0, this behaviour is triggered only by the legacy path, which we do not use, so it can be removed altogether.
model.key_value_input_names = [name for name in input_names if name.startswith('past_key_values.')]
model.key_value_output_names = [name for name in output_names if name.startswith('present.')]
Could this be done in patch_stateful directly? (Without modifying the original model attributes, but by checking the model graph inputs, or by also providing the input_names / output_names.)
rewritten in #493
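One way to avoid mutating the wrapper object, sketched under the assumption that the exported graph keeps the usual past_key_values.* / present.* naming convention (illustrative, not the actual #493 code):

import openvino as ov

def get_key_value_names(ov_model: ov.Model):
    # Derive the KV tensor names from the graph itself instead of storing them
    # as attributes on the model wrapper
    input_names = [n for port in ov_model.inputs for n in port.get_names()]
    output_names = [n for port in ov_model.outputs for n in port.get_names()]
    kv_input_names = [n for n in input_names if n.startswith('past_key_values.')]
    kv_output_names = [n for n in output_names if n.startswith('present.')]
    return kv_input_names, kv_output_names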
    return model_has_input(ov_model, 'beam_idx')


def model_has_state(ov_model):
- def model_has_state(ov_model):
+ def _model_has_state(ov_model):
Would add an underscore to most of the added functions to hint that they're not intended for external use (to give us more freedom in the future).
We want to use this method externally, which is why a name without an underscore is preferable for us. The TODO inside means that the implementation may change in the future, but as part of the API this function is very useful.
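A sketch of one possible realization behind that API (our assumption: the ReadValue/Assign pairs created by the stateful transformation are exposed as sinks on the ov.Model, so a non-empty sink list implies state):

import openvino as ov

def model_has_state(ov_model: ov.Model) -> bool:
    # Assign sinks only exist in models that carry internal state
    return len(ov_model.get_sinks()) > 0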
def make_stateful(
    ov_model: ov.Model,
    not_kv_inputs,
    key_value_input_names,
    key_value_output_names,
    batch_dim,
    num_attention_heads,
    num_beams_and_batch=None):
could you add type hints?
added in #493
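For illustration, a type-hinted variant of the signature (a sketch; the annotations actually added in #493 may differ):

from typing import List, Optional

import openvino as ov

def make_stateful(
    ov_model: ov.Model,
    not_kv_inputs: List[str],
    key_value_input_names: List[str],
    key_value_output_names: List[str],
    batch_dim: int,
    num_attention_heads: int,
    num_beams_and_batch: Optional[int] = None,
):
    ...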
state.reset()
# Set initial value for the next beam_idx input that will be used at the current iteration
# and will be optionally updated by _reorder_cache at the next iterations if beam_search is used
self.next_beam_idx = np.array(range(batch_size), dtype=int)
Would prefer to have it defined in __init__ directly (can be set to None).
could you please explain? I do not understand your suggestion
I would expect to have all attributes defined in the __init__, which is not the case here.
added in #493
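The suggested pattern, sketched on a simplified stand-in class (the class and method names are illustrative, not the actual modeling code):

import numpy as np

class StatefulDecoderExample:
    def __init__(self):
        # Declare the attribute up front so every attribute lives in __init__;
        # it is filled with real indices when the KV-cache state is reset
        self.next_beam_idx = None

    def reset_state(self, batch_size: int):
        # Identity mapping over the batch; _reorder_cache may overwrite this
        # on later iterations when beam search is used
        self.next_beam_idx = np.arange(batch_size, dtype=int)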
def raise_if_openvino_is_too_old():
    if is_openvino_version("<=", "2023.2"):
That prohibits usage of https://storage.openvinotoolkit.org/repositories/openvino/packages/nightly/2023.3.0-13432-a6ea22ad0e6/l_openvino_toolkit_ubuntu20_2023.3.0.dev20231129_x86_64.tgz. I get: ValueError: Could not create or use stateful model when using old version of openvino==2023.3.0-13432-a6ea22ad0e6. Install openvino>=2023.3.0.
As I remember, the previous version of this condition was different and didn't work with the nightly. I changed it and tested with the nightly -- everything worked. Not sure why it doesn't work now -- probably I mixed up the versions when I checked on my side.
I suppose we need to fix it to work with different OpenVINO installations.
I have no issues detecting the version using the Python libs from the archive. The problem seems to happen if you have installed openvino from multiple sources at the same time (e.g. PyPI openvino + PyPI openvino-nightly, or PyPI openvino + PYTHONPATH pointing to openvino libs) and their versions are different.
In my PR with fixes for this branch, #493, I changed the logic for version verification. The previously used approach checked only packages that exist in the site-packages of the Python environment (i.e. only those installed as wheels), but did not take into account that a user (in a development setup, for example) may install openvino from an archive or build it from source without installing wheels. Now the detected openvino version is aligned with the import order.
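A sketch of a check driven by the runtime-reported version rather than wheel metadata (assumption: openvino.runtime.get_version() reflects the libraries that were actually imported, including archive and source builds):

from openvino.runtime import get_version
from packaging.version import parse

def raise_if_openvino_is_too_old():
    # get_version() returns e.g. '2023.3.0-13432-a6ea22ad0e6' for the imported
    # libraries, regardless of how they were installed
    ov_version = parse(get_version().split('-')[0])
    if ov_version < parse('2023.3.0'):
        raise ValueError(
            f'Stateful models require openvino>=2023.3.0, but {ov_version} is installed.'
        )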
@echarlaix, I cannot push changes to #493, so I've continued development in my original PR with all changes from that PR. Could you approve the workflows to allow the checks to run here?
-    shared_memory: bool = False,
+    share_inputs: bool = False,
+    *,
+    shared_memory: Any = None,
 ):
     data_cache.append(inputs)
-    self.request.infer(inputs, shared_memory)
+    self.request.infer(inputs, share_inputs, share_outputs=True, shared_memory=shared_memory)
@AlexKoff88, this is how I have fixed it. Please provide feedback if something looks weird from your perspective.
This PR is completely superseded by #493.
What does this PR do?
Adds support for converting and running stateful models.
Similar to #493, but makes stateful models the default in optimum-cli and from_pretrained.
TODO
Before submitting