[Executorch][llama] Enable quantized sdpa #9945
Enable the quantized SDPA op when the quantized KV cache is used. Rather than adding yet another argument, this change currently reuses the existing quantize_kv_cache option to turn it on. Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)
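The design choice above can be illustrated with a minimal Python sketch: a single existing flag, `quantize_kv_cache`, selects both the KV-cache representation and the SDPA variant, so no new argument is needed. `AttentionConfig`, `select_attention_config`, and the string labels are hypothetical illustrations, not ExecuTorch's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical summary of the attention setup chosen at export time."""
    kv_cache: str  # "quantized" or "float" (illustrative labels only)
    sdpa_op: str   # "quantized_sdpa" or "sdpa" (illustrative labels only)


def select_attention_config(quantize_kv_cache: bool) -> AttentionConfig:
    """Derive both choices from one flag instead of adding a separate argument."""
    if quantize_kv_cache:
        # A quantized cache is paired with the quantized SDPA op, which can
        # consume the quantized K/V directly.
        return AttentionConfig(kv_cache="quantized", sdpa_op="quantized_sdpa")
    # Otherwise keep the float cache and the regular SDPA op.
    return AttentionConfig(kv_cache="float", sdpa_op="sdpa")


print(select_attention_config(True))
print(select_attention_config(False))
```

Coupling the two choices keeps the option surface small; the trade-off is that a quantized cache can no longer be paired with the regular SDPA op without introducing a separate flag later.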
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9945
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit ad54e4e with merge base ad6f5ee.
NEW FAILURE: The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D71833064
46fe905 into gh/kimishpatel/172/base
Pull Request resolved: #9945 Enable the quantized SDPA op when the quantized KV cache is used. Rather than adding yet another argument, this change currently reuses the existing quantize_kv_cache option to turn it on. ghstack-source-id: 277233485 Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)