[ascend]feat: support kv int8 #103

Open: wants to merge 7 commits into main


yao-fengchen
Contributor

No description provided.

Collaborator

I suggest making this runnable as python -m dlinfer.framework.lmdeploy_ext.quants.ascend_kv --model_path xxx etc., so it can be executed directly with arguments.

Contributor Author

Updated.
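The suggestion amounts to giving the module a __main__ entry point, since python -m pkg.module executes that module with __name__ set to "__main__". A minimal sketch using argparse, assuming only the --model_path flag from the comment; --work_dir and the body of main are hypothetical placeholders, not the code that was actually pushed:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical CLI: only --model_path comes from the review comment.
    parser = argparse.ArgumentParser(
        description="Compute KV int8 quantization parameters (sketch)")
    parser.add_argument("--model_path", required=True,
                        help="path to the model weights")
    # Illustrative extra flag; not taken from the PR.
    parser.add_argument("--work_dir", default="./work_dir",
                        help="hypothetical output directory for scales/zeros")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # A real implementation would run calibration here.
    print(f"model_path={args.model_path} work_dir={args.work_dir}")
    return args


if __name__ == "__main__":
    # Reached when invoked as: python -m <package>.<module> --model_path ...
    main()
```

With such an entry point, the command from the comment (python -m dlinfer.framework.lmdeploy_ext.quants.ascend_kv --model_path xxx) would parse its arguments and call main directly.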

"dlinfer::fill_kv_cache",
["key_cache", "value_cache"],
default_value={
"quant_bits": 0,
Collaborator

Could you add default values:
"k_scales_zeros": tuple()
"v_scales_zeros": tuple()

Contributor Author

Added.
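The effect of the requested defaults can be sketched as a plain argument check. The function name fill_kv_cache_args is hypothetical, and the op registration shown in the diff context is not reproduced. With quant_bits=0 and empty tuples, callers written before int8 support need not pass the new arguments:

```python
from typing import Optional, Sequence


def fill_kv_cache_args(
    quant_bits: int = 0,
    k_scales_zeros: Sequence[Optional[float]] = tuple(),
    v_scales_zeros: Sequence[Optional[float]] = tuple(),
) -> bool:
    """Hypothetical check mirroring the proposed defaults.

    Returns True when the KV cache should be quantized.
    """
    if quant_bits == 0:
        # Default path: no quantization, the empty tuples are ignored.
        return False
    if quant_bits != 8:
        raise ValueError(f"unsupported quant_bits: {quant_bits}")
    # Assumed layout: each sequence holds (scale, zero).
    if len(k_scales_zeros) != 2 or len(v_scales_zeros) != 2:
        raise ValueError("int8 KV cache needs (scale, zero) for both K and V")
    return True
```

An empty tuple is immutable, so unlike a [] literal it is safe to share as a default value across calls.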

@@ -205,6 +220,9 @@ def paged_decode_attention(
softmax_scale: Optional[float],
alibi_slopes: Optional[Sequence[float]],
attn_output: Optional[Tensor],
kv_scales: Optional[Tensor],
kv_zeros: Optional[Tensor],
Collaborator

Why does the type of kv_zeros here differ from that of k_scales_zeros and v_scales_zeros in fill_kv_cache?

Contributor Author

In fill_kv_cache, it is done this way to avoid slicing.
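A rough illustration of this trade-off, using plain Python lists instead of tensors. It assumes asymmetric int8 quantization q = clamp(round(x / scale + zero), -128, 127). Both that formula and the "index 0 for K, index 1 for V" layout of the stacked kv_scales/kv_zeros are assumptions, not taken from the PR. Separate per-K and per-V sequences can be unpacked directly on the fill side, whereas a single stacked container has to be split before use:

```python
from typing import List, Optional, Sequence, Tuple


def quantize_int8(xs: List[float], scale: float, zero: float) -> List[int]:
    # Assumed asymmetric scheme: q = clamp(round(x / scale + zero), -128, 127)
    return [max(-128, min(127, round(x / scale + zero))) for x in xs]


def dequantize_int8(qs: List[int], scale: float, zero: float) -> List[float]:
    # Inverse mapping: x ~= (q - zero) * scale
    return [(q - zero) * scale for q in qs]


def fill_side(key: List[float], value: List[float],
              k_scales_zeros: Sequence[Optional[float]],
              v_scales_zeros: Sequence[Optional[float]]
              ) -> Tuple[List[int], List[int]]:
    # fill_kv_cache style: K and V parameters arrive separately,
    # so they are unpacked directly with no slicing of a shared container.
    k_scale, k_zero = k_scales_zeros
    v_scale, v_zero = v_scales_zeros
    return (quantize_int8(key, k_scale, k_zero),
            quantize_int8(value, v_scale, v_zero))


def decode_side(qk: List[int], qv: List[int],
                kv_scales: List[float], kv_zeros: List[float]
                ) -> Tuple[List[float], List[float]]:
    # paged_decode_attention style: one stacked container per quantity
    # (hypothetical layout: index 0 = K, index 1 = V), which must be split.
    return (dequantize_int8(qk, kv_scales[0], kv_zeros[0]),
            dequantize_int8(qv, kv_scales[1], kv_zeros[1]))
```

On a real device the stacked form means a slice (or view) per call. Passing a sequence moves that split to the caller, which is presumably what the author refers to.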

k_scales_zeros: Sequence[Optional[Tensor]],
v_scales_zeros: Sequence[Optional[Tensor]],
quant_bits: int,
) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
Collaborator

Please update the parameter descriptions in the docstring; the same applies to the other operators.

Contributor Author

Updated.
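For the requested parameter notes, a docstring along these lines could work. This is a hypothetical sketch in Google docstring style. The descriptions of k_scales_zeros, v_scales_zeros and quant_bits are inferred from their names and from the tuple() defaults discussed in this review. The leading parameters are placeholders, not the op's actual signature:

```python
from typing import Optional, Sequence


def fill_kv_cache_doc_sketch(
    key,
    value,
    key_cache,
    value_cache,
    kv_indices,
    k_scales_zeros: Sequence[Optional[object]] = tuple(),
    v_scales_zeros: Sequence[Optional[object]] = tuple(),
    quant_bits: int = 0,
):
    """Write new key/value states into the paged KV cache (hypothetical docs).

    Args:
        key: new key states to write.
        value: new value states to write.
        key_cache: paged key cache, updated in place.
        value_cache: paged value cache, updated in place.
        kv_indices: cache slot for each new token.
        k_scales_zeros: (scale, zero) used to quantize keys; empty when
            quant_bits is 0.
        v_scales_zeros: (scale, zero) used to quantize values; empty when
            quant_bits is 0.
        quant_bits: bit width of the cached K/V; 0 disables quantization,
            8 selects int8.
    """
    # Documentation sketch only; no kernel is implemented here.
    raise NotImplementedError("documentation sketch only")
```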

CyCle1024 added the ascend platform ascend label on Nov 14, 2024