[Executorch][llama] Enable quantized sdpa #9945

Merged

facebook-github-bot merged 6 commits into gh/kimishpatel/172/base from gh/kimishpatel/172/head

Apr 10, 2025

Contributor

kimishpatel commented Apr 7, 2025 •

Stack from ghstack (oldest at bottom):

Use the quantized SDPA op when the quantized KV cache is enabled. Rather than adding yet another argument, this change reuses the existing quantize_kv_cache option for now.

Differential Revision: D71833064
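
The design choice described above, one existing option controlling both the cache and the attention op, can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `ExportOptions`, `select_transforms`, and the transform names are hypothetical stand-ins; only the `quantize_kv_cache` option name comes from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExportOptions:
    # Hypothetical stand-in for the export script's parsed arguments.
    # Only the option name `quantize_kv_cache` comes from the PR description.
    quantize_kv_cache: bool = False


def select_transforms(options: ExportOptions) -> List[str]:
    """Return hypothetical transform names to apply, in order.

    Assumption for illustration: without quantization the model keeps a
    plain (non-quantized) SDPA op; with `quantize_kv_cache` set, both the
    cache and the attention op switch to quantized variants.
    """
    if options.quantize_kv_cache:
        # One flag drives both choices, so no separate
        # "use quantized sdpa" argument is needed.
        return ["quantized_kv_cache", "quantized_sdpa"]
    return ["sdpa"]


if __name__ == "__main__":
    print(select_transforms(ExportOptions(quantize_kv_cache=True)))
    print(select_transforms(ExportOptions()))
```

The trade-off of this approach is that the quantized SDPA op cannot be toggled independently of the quantized KV cache, which matches the "at the moment" wording of the description.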


          [Executorch][llama] Enable quantized sdpa

c2b6878

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

[ghstack-poisoned]

kimishpatel requested review from jackzhxng, iseeyuan, larryliu0820, swolchok and lucylq as code owners

April 7, 2025 22:02

kimishpatel mentioned this pull request

[Executorch][SDPA] Refactor + Make quantized sdpa handle sequence at dim 1 or 2 #9943

Merged

pytorch-bot bot commented Apr 7, 2025 •

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9945

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ad54e4e with merge base ad6f5ee:

NEW FAILURE - The following job has failed:

pull / unittest-arm / linux-job (gh)
RuntimeError: Command docker exec -t 164e827c9b176c5c0e74d7e01ce224603eb02a49aa8d9861619bbabe05f152f8 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kimishpatel mentioned this pull request

[Executorch][llama] Renamed quantized_kv_cache to custom_kv_cache #9944

Merged

kimishpatel added a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

929f8f8

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

ghstack-source-id: 276640303
Pull Request resolved: #9945

facebook-github-bot added the CLA Signed label

Contributor

facebook-github-bot commented Apr 7, 2025

This pull request was exported from Phabricator. Differential Revision: D71833064

facebook-github-bot added the fb-exported label

kimishpatel added the release notes: examples label


          Update on "[Executorch][llama] Enable quantized sdpa"

f744daf

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

[ghstack-poisoned]

kimishpatel requested a review from GregoryComer as a code owner

April 8, 2025 21:58

kimishpatel added a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

0b1f2e0

Pull Request resolved: #9945

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)
ghstack-source-id: 276903837

Contributor

facebook-github-bot commented Apr 8, 2025

This pull request was exported from Phabricator. Differential Revision: D71833064


          Update on "[Executorch][llama] Enable quantized sdpa"

ebfb137

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

[ghstack-poisoned]

kimishpatel added a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

431faa5

Pull Request resolved: #9945

ghstack-source-id: 276951801

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

Contributor

facebook-github-bot commented Apr 9, 2025

This pull request was exported from Phabricator. Differential Revision: D71833064


          Update on "[Executorch][llama] Enable quantized sdpa"

05bcc8a

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

[ghstack-poisoned]

kimishpatel added a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

ee41748

Pull Request resolved: #9945

ghstack-source-id: 276961554

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

Contributor

facebook-github-bot commented Apr 9, 2025

This pull request was exported from Phabricator. Differential Revision: D71833064

kirklandsign approved these changes


          Update on "[Executorch][llama] Enable quantized sdpa"

6c4cbe4

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

[ghstack-poisoned]

kimishpatel added a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

ad60adb

Pull Request resolved: #9945

ghstack-source-id: 277160634

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

Contributor

facebook-github-bot commented Apr 9, 2025

This pull request was exported from Phabricator. Differential Revision: D71833064


          Update on "[Executorch][llama] Enable quantized sdpa"

ad54e4e

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

[ghstack-poisoned]

kimishpatel added a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

c458541

Pull Request resolved: #9945

ghstack-source-id: 277233485

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

Contributor

facebook-github-bot commented Apr 10, 2025

This pull request was exported from Phabricator. Differential Revision: D71833064

facebook-github-bot merged commit 46fe905 into gh/kimishpatel/172/base

87 of 90 checks passed

facebook-github-bot deleted the gh/kimishpatel/172/head branch

April 10, 2025 14:25

facebook-github-bot temporarily deployed to cherry-pick-bot with GitHub Actions (inactive)

April 10, 2025 14:25

pytorchbot mentioned this pull request

[Executorch][llama] Enable quantized sdpa #10062

Merged

kirklandsign pushed a commit that referenced this pull request


          [Executorch][llama] Enable quantized sdpa

637c5a8

Pull Request resolved: #9945

ghstack-source-id: 277233485

Differential Revision: [D71833064](https://our.internmc.facebook.com/intern/diff/D71833064/)

github-actions bot mentioned this pull request

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#28

Open

Reviewers

kirklandsign: approved these changes

jackzhxng: awaiting requested review (code owner)

iseeyuan: awaiting requested review (code owner)

larryliu0820: awaiting requested review (code owner)

swolchok: awaiting requested review (code owner)

lucylq: awaiting requested review (code owner)

GregoryComer: awaiting requested review (code owner)

Labels

CLA Signed, fb-exported, release notes: examples