An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
It's a bug that only affects the new "Export to ExecuTorch" workflow we're trying to enable. The method get_usable_length should be overridden for StaticCache, where recompiling, resizing, or evicting existing cache entries is not applicable. This is because, unlike an eager or torch.compile'd model, an exported model runs in a non-Python environment where recompiling from the eager source isn't available.
Expected behavior
The generation length using the exported artifact should not exceed the maximum cache length, because the model and the size of its cache are fixed at export time. When get_usable_length returns 0, generation should terminate.
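The expected behavior can be sketched roughly as below. This is a minimal, hypothetical stand-in for StaticCache; the class name, attributes, and the (new_seq_length, layer_idx) signature are assumptions for illustration, not the actual transformers API. The key point is that the override only reports the remaining fixed budget and never implies eviction:

```python
# Hypothetical sketch of an overridden get_usable_length for a static cache.
# "StaticCacheSketch" and its members are illustrative assumptions only.
class StaticCacheSketch:
    def __init__(self, max_cache_len: int):
        self.max_cache_len = max_cache_len  # size is frozen at export time
        self._seen_tokens = 0               # tokens already written to the cache

    def get_seq_length(self, layer_idx: int = 0) -> int:
        return self._seen_tokens

    def get_usable_length(self, new_seq_length: int, layer_idx: int = 0) -> int:
        # No recompiling, resizing, or eviction is possible after export, so
        # report only the remaining fixed budget. A return value of 0 means
        # "cache full, terminate generation".
        return max(self.max_cache_len - self._seen_tokens, 0)
```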
@ArthurZucker Because @helunwencser on our team is working on Phi-3-mini, and in Phi-3's modeling code get_usable_length is used in several places, like here or here. The modeling code itself is fine because it doesn't make assumptions about the type of cache being used. However, when it comes to tracing the model via torch.export, the default get_usable_length will trigger eviction of old cache entries, so new attention computations from that point on will be incorrect. I think it would make sense to override it for StaticCache to avoid this misbehavior in the short term, before it can be fully deprecated.
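As a rough illustration of the short-term mitigation proposed here, a generation loop over an exported artifact could treat a get_usable_length return value of 0 as the stopping condition instead of relying on eviction. Everything below is a self-contained sketch with made-up names (FixedCache, step_fn), not the real transformers or ExecuTorch interfaces:

```python
# Hypothetical sketch: terminating generation when a fixed-size cache is full.
class FixedCache:
    def __init__(self, max_cache_len: int):
        self.max_cache_len = max_cache_len  # fixed at export time
        self.seen_tokens = 0                # tokens written so far

    def get_usable_length(self, new_seq_length: int = 1) -> int:
        # No eviction or resizing after export: only the remaining budget counts.
        return max(self.max_cache_len - self.seen_tokens, 0)


def generate(step_fn, cache: FixedCache, prompt: list, max_new_tokens: int) -> list:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        if cache.get_usable_length(1) == 0:
            break  # static cache exhausted; the exported model cannot grow it
        tokens.append(step_fn(tokens[-1]))  # one decode step (stand-in)
        cache.seen_tokens += 1
    return tokens
```

Here step_fn stands in for one decode step of the exported model; the point is only that the loop exits cleanly when the budget reported by get_usable_length reaches 0, rather than silently corrupting attention by overwriting old entries.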
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers: 4.45.0.dev0
torch: 2.5.0.dev20240716+cpu
Who can help?
@ArthurZucker
@gante
@zucchini-nlp