[WIP][AMD][Kernel][Quantization] Add fp8 and int8 support for Triton FAv2 kernel #12534

rasmith · 2025-01-29T00:05:15Z

This is a work in progress to add fp8 and int8 support to the Triton FAv2 kernel.

Signed-off-by: Randall Smith <[email protected]>

github-actions · 2025-01-29T00:05:28Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

mergify · 2025-01-29T00:05:54Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rasmith.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

shahedy2276541 · 2025-01-29T00:08:54Z

vllm/attention/backends/flash_attn.py

@@ -227,6 +227,7 @@ def prefill_metadata(self) -> Optional["FlashAttentionMetadata"]:
            slot_mapping=slot_mapping,


rasmith added 7 commits January 21, 2025 13:01

TritonScaledMMLinearKernel implementation

9e8bad6

Signed-off-by: Randall Smith <[email protected]>

Add regression test for rocm w8a8

daf9a71

Signed-off-by: Randall Smith <[email protected]>

remote unused import

9c11d5c

Signed-off-by: Randall Smith <[email protected]>

ruff

4e4d633

Signed-off-by: Randall Smith <[email protected]>

adding rocm fp8 support

16cfc92

Signed-off-by: Randall Smith <[email protected]>

first successful execution

5e0ada7

Signed-off-by: Randall Smith <[email protected]>

remove prints

481ea60

Signed-off-by: Randall Smith <[email protected]>

rasmith requested review from tlrmchlsmth, WoosukKwon, DarkLight1337, ywang96, mgoin, robertgshaw2-redhat, zhuohan123, youkaichao, alexm-redhat, comaniac and njhill as code owners January 29, 2025 00:05

mergify bot added the documentation and needs-rebase labels Jan 29, 2025

shahedy2276541 approved these changes Jan 29, 2025

rasmith marked this pull request as draft January 29, 2025 00:08

shahedy2276541 approved these changes Jan 29, 2025

Merge upstream

2de123f

Signed-off-by: Randall Smith <[email protected]>

mergify bot removed the needs-rebase label Jan 29, 2025

rasmith added 3 commits January 29, 2025 00:51

typo fix

0369a6d

Signed-off-by: Randall Smith <[email protected]>

typo fix

4ff10e4

Signed-off-by: Randall Smith <[email protected]>

typo fix

b5f045d

Signed-off-by: Randall Smith <[email protected]>

rasmith added 4 commits January 29, 2025 00:54

typo fix

58ea057

Signed-off-by: Randall Smith <[email protected]>

typo fix

3ac3f08

Signed-off-by: Randall Smith <[email protected]>

fix merge

00cae8d

Signed-off-by: Randall Smith <[email protected]>

fix compiler error

a322dc6

Signed-off-by: Randall Smith <[email protected]>

hongxiayang added the rocm label Jan 30, 2025
