Revert "do not use softmax fast mode in FusedSDPA (#26)"
This reverts commit 4e911b4.
madamczykhabana authored Nov 14, 2024
1 parent 4e911b4 commit 3b928de
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion vllm_hpu_extension/ops.py
@@ -223,7 +223,7 @@ def prompt_attention(
     if query_heads != kv_heads:
         key = repeat_kv(key, int(query_heads // kv_heads))
         value = repeat_kv(value, int(query_heads // kv_heads))
-    softmax_mode = 'None'
+    softmax_mode = 'fast'
     recompute_mode = True
     attn_weights = FusedSDPA.apply(query, key, value, None, 0.0, True,
                                    scale, softmax_mode, recompute_mode,
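
Note on the context lines in this hunk: when there are more query heads than key/value heads, each KV head is replicated `query_heads // kv_heads` times before attention is computed. The sketch below illustrates only that replication factor, using plain Python lists. `repeat_heads` is a hypothetical stand-in, not the `repeat_kv` from `vllm_hpu_extension/ops.py`, and keeping the copies adjacent is an assumption, since the real function's tensor layout is not shown in this diff.

```python
def repeat_heads(heads, n_rep):
    """Repeat each KV head n_rep times, keeping the copies adjacent.

    Hypothetical helper for illustration only: the real repeat_kv works
    on tensors, and its exact output layout is not visible in this diff.
    """
    if n_rep < 1:
        raise ValueError("n_rep must be >= 1")
    return [head for head in heads for _ in range(n_rep)]


# Example head counts (assumed values, not taken from the commit).
query_heads, kv_heads = 8, 2

# Same expression the diff uses to compute the repetition factor.
n_rep = int(query_heads // kv_heads)

# After expansion there is one KV entry per query head.
expanded = repeat_heads(["kv0", "kv1"], n_rep)
assert len(expanded) == query_heads
```

The one-line change in this commit is independent of that replication step: it only switches the `softmax_mode` argument passed to `FusedSDPA.apply` from `'None'` back to `'fast'`.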
