[ascend]feat: support kv int8 #103

yao-fengchen · 2024-11-11T07:03:55Z

No description provided.

CyCle1024 · 2024-11-13T06:18:26Z

dlinfer/framework/lmdeploy_ext/quants/ascend_kv.py

Suggest making this runnable as `python -m dlinfer.framework.lmdeploy_ext.quants.ascend_kv --model_path xxx` etc., so that it can be executed directly once the arguments are supplied.
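
A minimal sketch of what such an entry point could look like, assuming `argparse` is used. Only `--model_path` comes from the comment above; `--work_dir`, `build_parser`, and the body of `main` are hypothetical placeholders, not the script's actual interface.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser for a hypothetical calibration entry point."""
    parser = argparse.ArgumentParser(
        prog="python -m dlinfer.framework.lmdeploy_ext.quants.ascend_kv",
        description="Sketch of a CLI wrapper around the KV int8 script.",
    )
    # --model_path is taken from the review comment.
    parser.add_argument("--model_path", required=True, help="Path to the model.")
    # --work_dir is a hypothetical extra flag, shown only to illustrate defaults.
    parser.add_argument("--work_dir", default="./work_dir", help="Output directory.")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments and hand them to the (placeholder) script logic."""
    args = build_parser().parse_args(argv)
    # Placeholder: the real script would run its existing logic here.
    print(f"model_path={args.model_path} work_dir={args.work_dir}")
    return 0


# In ascend_kv.py, the module would end with the usual guard so that
# `python -m ...ascend_kv --model_path xxx` calls main():
#
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Keeping the parsing in `main(argv)` means the same function can be called from tests with an explicit argument list, while `python -m` falls back to `sys.argv`.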

CyCle1024 · 2024-11-13T06:22:47Z

dlinfer/ops/llm.py

+    "dlinfer::fill_kv_cache",
+    ["key_cache", "value_cache"],
+    default_value={
+        "quant_bits": 0,


Could default values be added for these as well:
"k_scales_zeros": tuple()
"v_scales_zeros": tuple()
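
To illustrate the effect of such defaults, here is a plain-Python stand-in. It is not dlinfer's op registration mechanism; `uses_kv_quant` and its float-valued parameters are hypothetical. With empty-tuple defaults, callers that do not quantize can omit the new arguments entirely, and because a tuple is immutable it is safe to share as a default value.

```python
from typing import Optional, Sequence


def uses_kv_quant(
    quant_bits: int = 0,
    k_scales_zeros: Sequence[Optional[float]] = tuple(),
    v_scales_zeros: Sequence[Optional[float]] = tuple(),
) -> bool:
    """Return True only when quantization is requested and parameters are supplied."""
    return quant_bits != 0 and len(k_scales_zeros) > 0 and len(v_scales_zeros) > 0


# Non-quantized call sites stay unchanged: every new argument falls back to its default.
plain_call = uses_kv_quant()

# Quantized call sites pass the extra arguments explicitly.
quant_call = uses_kv_quant(
    quant_bits=8,
    k_scales_zeros=(0.5, 0.0),
    v_scales_zeros=(0.25, 0.0),
)
```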

CyCle1024 · 2024-11-13T06:25:08Z

dlinfer/ops/llm.py

@@ -205,6 +220,9 @@ def paged_decode_attention(
    softmax_scale: Optional[float],
    alibi_slopes: Optional[Sequence[float]],
    attn_output: Optional[Tensor],
+    kv_scales: Optional[Tensor],
+    kv_zeros: Optional[Tensor],


Why is the type of kv_zeros here inconsistent with the types of k_scales_zeros and v_scales_zeros in fill_kv_cache?

In fill_kv_cache, it is done that way to avoid slicing.
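
For context, a small sketch of how a scale and a zero point are typically used in int8 quantization, and of how passing them as separate entries of a sequence lets the callee unpack them without slicing. The convention `q = clamp(round(x / scale) + zero, -128, 127)` is one common choice and an assumption here; the PR's kernels may use a different formula and layout, and the function names are hypothetical.

```python
from typing import List, Sequence


def quantize_int8(values: Sequence[float], scale: float, zero: int) -> List[int]:
    """Map floats to int8 with q = clamp(round(x / scale) + zero, -128, 127)."""
    return [max(-128, min(127, round(x / scale) + zero)) for x in values]


def dequantize_int8(quantized: Sequence[int], scale: float, zero: int) -> List[float]:
    """Approximately invert the mapping: x ~= (q - zero) * scale."""
    return [(q - zero) * scale for q in quantized]


# Scale and zero are passed as separate entries of a sequence (assumed layout),
# so the callee unpacks them directly instead of slicing a packed buffer.
k_scales_zeros = (0.5, 0)
scale, zero = k_scales_zeros

key = [1.0, -2.0, 63.5, 100.0]
q = quantize_int8(key, scale, zero)          # [2, -4, 127, 127]
restored = dequantize_int8(q, scale, zero)   # [1.0, -2.0, 63.5, 63.5]
```

The last element shows the clamping: 100.0 maps to 200 before the clamp, so it saturates at 127 and is restored as 63.5.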

CyCle1024 · 2024-11-13T06:26:47Z

dlinfer/ops/llm.py

+    k_scales_zeros: Sequence[Optional[Tensor]],
+    v_scales_zeros: Sequence[Optional[Tensor]],
+    quant_bits: int,
+) -> Tuple[Tensor, Tensor, Tensor, Tensor]:


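Please update the parameter descriptions in the docstring to cover the new arguments; the same applies to the other operators.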

yao-fengchen requested a review from CyCle1024 November 11, 2024 07:04

yao-fengchen force-pushed the ascend_kv_int8 branch 8 times, most recently from 0908362 to 1c73e98 Compare November 12, 2024 11:45

yao-fengchen added 4 commits November 13, 2024 03:15

[ascend]feat: support kv int8 quant

fd5db58

update doc

d00aed3

format code

5eb0894

update code

8bbec89

yao-fengchen force-pushed the ascend_kv_int8 branch from e266508 to 3608fd7 Compare November 13, 2024 03:18

yao-fengchen requested a review from tangzhiyi11 November 13, 2024 03:23

yao-fengchen force-pushed the ascend_kv_int8 branch from 3608fd7 to 83b567c Compare November 13, 2024 05:46

yao-fengchen requested a review from Reinerzhou November 13, 2024 05:46

yao-fengchen marked this pull request as ready for review November 13, 2024 06:05

yao-fengchen requested a review from jinminxi104 as a code owner November 13, 2024 06:05

yao-fengchen added 2 commits November 13, 2024 06:13

update params

82a8a01

test ascend_kv_int8

b8fc1b3

yao-fengchen force-pushed the ascend_kv_int8 branch from 83b567c to b8fc1b3 Compare November 13, 2024 06:16

CyCle1024 reviewed Nov 13, 2024

update docs

c200c6e

CyCle1024 added the ascend platform ascend label Nov 14, 2024
