[Feature] W4A8-FP8 support in AWQ quantization #2766

yongchaoding · 2024-11-18T02:31:46Z

Motivation

As we all know, lmdeploy runs fastest with AWQ W4A16. However, FP8 is now used in many places, so I wonder whether the developers have any plan to develop a fast W4A8-FP8 kernel in lmdeploy?

Related resources

No response

Additional context

No response

dingjingzhen · 2024-11-19T09:12:04Z

+1

lzhangzz · 2024-11-20T08:54:41Z

I will start the work on W8A8 after my current work is done. W4A8 should come after W8A8.

lvhan028 assigned lzhangzz Nov 21, 2024
