hotfix - Revert vllm/attention/layer.py changes from 0f8cafe - fix torch.compile recompilations #709
Commit 0f8cafe from upstream modified vLLM's Attention layer (vllm/attention/layer.py) to call unified_attention, which uses self.layer_name to dispatch to the correct layer. Because every layer has a distinct name, torch.compile specializes on the string and recompiles for each layer.
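A minimal sketch of the failure mode (not vLLM's actual code; the function and names below are hypothetical): torch.compile guards on the value of a string argument, so a compiled region that branches on a per-layer name is re-specialized for every distinct layer.

```python
import torch

@torch.compile
def unified_attention_sketch(q: torch.Tensor, layer_name: str) -> torch.Tensor:
    # torch.compile treats the string as a compile-time constant and
    # installs a guard on its value, so each new layer_name misses the
    # guard and triggers a fresh compilation.
    if layer_name.endswith(".attn"):
        return q * 2
    return q

x = torch.randn(4, 8)
for name in ("model.layers.0.attn", "model.layers.1.attn"):
    unified_attention_sketch(x, name)  # each distinct name recompiles
```

Reverting the layer.py change keeps the per-layer name out of the compiled region, so a single compiled graph is shared across all layers.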