
Fix get_usable_length for StaticCache #32503

Closed
2 of 4 tasks
Tracked by #32253
guangy10 opened this issue Aug 7, 2024 · 3 comments

Comments

@guangy10
Contributor

guangy10 commented Aug 7, 2024

System Info

transformers: 4.45.0.dev0
torch: 2.5.0.dev20240716+cpu

Who can help?

@ArthurZucker
@gante
@zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This bug only affects the new "Export to ExecuTorch" workflow that we're trying to enable. The method get_usable_length should be overridden for StaticCache, where recompiling, resizing, or evicting existing cache entries isn't applicable. This is because, unlike an eager or torch.compile'd model, an exported model runs in a non-Python environment where recompiling from the eager source isn't available.

Expected behavior

The generation length using the exported artifact should not exceed the maximal cache length, because the model and the size of its cache are exported statically. When get_usable_length returns 0, generation should terminate.
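To make the expected termination concrete, here is a minimal sketch of the stopping condition. `FixedSizeCache` and `generate` are hypothetical stand-ins that model only the length bookkeeping relevant to this issue; they are not the transformers classes or the actual generate() loop.

```python
# Hedged sketch: a fixed-size cache whose get_usable_length returns 0
# once the cache is full, and a generation loop that stops on that signal.

class FixedSizeCache:
    """Hypothetical stand-in for a statically sized KV cache."""

    def __init__(self, max_cache_len):
        self.max_cache_len = max_cache_len
        self.seen_tokens = 0

    def get_usable_length(self, new_seq_length):
        # A fixed-size cache with no room left reports 0 usable length.
        if self.seen_tokens + new_seq_length > self.max_cache_len:
            return 0
        return self.seen_tokens


def generate(next_token, cache, max_new_tokens):
    """Stop generating as soon as the static cache is exhausted."""
    out = []
    for _ in range(max_new_tokens):
        # seen_tokens > 0 guard: at the very start the method also
        # returns 0 (no previous tokens), which must not stop generation.
        if cache.seen_tokens > 0 and cache.get_usable_length(1) == 0:
            break  # exported model cannot resize or evict the cache
        out.append(next_token())
        cache.seen_tokens += 1
    return out
```

With max_cache_len=3 and max_new_tokens=10, the loop emits exactly 3 tokens and then stops, rather than evicting old entries to keep going.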

@ArthurZucker
Collaborator

get_usable_length is something that we are deprecating anyway, but good to note. StaticCache should not even have it!

@guangy10
Copy link
Contributor Author

guangy10 commented Aug 8, 2024

@ArthurZucker @helunwencser on our team is working on Phi-3-mini, and in Phi-3's modeling code get_usable_length is used in several places (like here or here). The modeling code itself is fine because it doesn't make assumptions about the type of cache being used. However, when it comes to tracing the model via torch.export, the default get_usable_length will trigger eviction of old cache entries, so attention computed from that point on will be incorrect. I think it would make sense to override it for StaticCache to avoid this misbehavior in the short term, before it can be fully deprecated.
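A sketch of the override being suggested. The default in `DynamicLikeCache.get_usable_length` mirrors the cap-and-evict shape of the base Cache method; the StaticCache-style override refuses to evict and signals "full" with 0 instead. Both classes are simplified stand-ins, not the actual transformers implementations.

```python
# Hedged sketch of the proposed StaticCache override, under the
# assumption that the default get_usable_length caps the usable length
# (implying eviction of the oldest entries) when the cache would overflow.

class DynamicLikeCache:
    """Stand-in for a cache using the default get_usable_length."""

    def __init__(self, max_cache_len=None):
        self.max_cache_len = max_cache_len
        self.seen_tokens = 0

    def get_usable_length(self, new_seq_length):
        # Default: if the new tokens would overflow, return a shorter
        # usable length, implicitly evicting the oldest entries.
        if (self.max_cache_len is not None
                and self.seen_tokens + new_seq_length > self.max_cache_len):
            return self.max_cache_len - new_seq_length
        return self.seen_tokens


class StaticLikeCache(DynamicLikeCache):
    """Stand-in for StaticCache with the proposed override."""

    def get_usable_length(self, new_seq_length):
        # Override: a statically exported cache can never evict or
        # resize, so once it is full the usable length is simply 0,
        # which the caller can use to terminate generation.
        if self.seen_tokens + new_seq_length > self.max_cache_len:
            return 0
        return self.seen_tokens
```

The point of the override is the divergence at the boundary: with a full cache of length 4, the default reports a usable length of 3 (evict one entry and continue), while the static variant reports 0 (stop).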


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
