
[Badcase]: Qwen2-72B on H800 with FP8 quantization occasionally emits an extra escape character in JSON output #1132

alanshao2 opened this issue Dec 17, 2024 · 1 comment
alanshao2 commented Dec 17, 2024

Model Series

Qwen2

What are the models used?

Qwen2-72B-Instruct

What is the scenario where the problem happened?

tensorrt-llm

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

GPU: Nvidia H800

Description

Steps to reproduce

This happens to Qwen2-72B-Instruct with FP8 quantization.
The badcase can be reproduced with the following steps:
The model is asked to produce its output in JSON format.

  1. ...
  2. ...

The following example input & output can be used:

system: ...
user: ...
...

Expected results

When the output is in JSON format, an extra `\` escape character sometimes appears; for example, `\n` becomes `\\n`.
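
For reference, a minimal Python illustration of the symptom described above; the JSON payload is a made-up placeholder, since the full case was not provided:

```python
import json

# Expected output: "\n" inside the JSON string is the normal newline escape.
expected = '{"text": "line1\\nline2"}'

# Observed badcase: an extra backslash is emitted, so the parsed value
# contains a literal backslash followed by "n" instead of a newline.
observed = '{"text": "line1\\\\nline2"}'

print(json.loads(expected)["text"])  # line1 / line2 on two lines
print(json.loads(observed)["text"])  # line1\nline2 with a literal backslash
```

Note that both strings still parse as valid JSON; the extra escape silently changes the content of the string value.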

Attempts to fix

I have tried several ways to fix this, including:

  1. adjusting the sampling parameters, but ...
  2. prompt engineering, but ...
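
If the extra backslash reliably precedes an otherwise valid escape sequence, one stopgap is to normalize the raw output before parsing. A minimal sketch, assuming the badcase only doubles the backslash of a legitimate escape (this helper is hypothetical, not a verified fix):

```python
import json
import re

def normalize_overescaped(raw: str) -> str:
    # Assumption: the badcase turns a valid escape like \n into \\n.
    # Collapse the doubled backslash back to a single one; note this would
    # also rewrite a genuinely intended double backslash before these chars.
    return re.sub(r'\\\\([nrtbf"/\\])', r'\\\1', raw)

raw_output = '{"text": "line1\\\\nline2"}'  # hypothetical over-escaped output
print(json.loads(normalize_overescaped(raw_output))["text"])  # line1 / line2
```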

Anything else helpful for investigation

I find that this problem also happens to ...

jklj077 (Collaborator) commented Dec 17, 2024

Please follow the template so that we can understand your issue more clearly.

  1. Please provide the full case so that we can try to reproduce it.
  2. Please try guided decoding if you need strictly valid JSON output (see the sketch after this list).
  3. Does this also happen with Qwen2.5?
  4. Does it happen at bfloat16 precision? FP8 may severely degrade quality; we advise using GPTQ, AWQ, or another quantization algorithm.
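
For point 2, a minimal sketch, assuming the model is served behind an OpenAI-compatible endpoint that implements JSON-constrained decoding via `response_format` (the base URL and model name below are placeholders, not the reporter's actual deployment):

```python
from openai import OpenAI

# Hypothetical local endpoint; adjust base_url and model to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen2-72B-Instruct",
    messages=[
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "..."},
    ],
    # With guided/structured decoding, the server constrains sampling so the
    # generated text is always syntactically valid JSON.
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```

Guided decoding guarantees syntactic validity, but it cannot by itself tell an intended `\n` from a spurious `\\n`, so the full reproduction case from point 1 is still needed.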
