
[Badcase]: Qwen2-72B on H800 with FP8 quantization occasionally emits an extra escape character in JSON output #1132

alanshao2 opened this issue Dec 17, 2024 · 1 comment
alanshao2 commented Dec 17, 2024

Model Series

Qwen2

What are the models used?

Qwen2-72B-Instruct

What is the scenario where the problem happened?

tensorrt-llm

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

GPU: Nvidia H800

Description

Steps to reproduce

This happens to Qwen2-72B-Instruct with FP8 quantization.
The badcase can be reproduced with the following steps:
The model is asked to produce its output in JSON format.

  1. ...
  2. ...

The following example input & output can be used:

system: ...
user: ...
...

Expected results

When the output is in JSON format, an extra `\` escape character sometimes appears; for example, `\n` becomes `\\n`.
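
For reference, a minimal Python illustration of the symptom described above; the JSON payload is a made-up placeholder, since the full case was not provided:

```python
import json

# Expected output: "\n" inside the JSON string is the normal newline escape.
expected = '{"text": "line1\\nline2"}'

# Observed badcase: an extra backslash is emitted, so the parsed value
# contains a literal backslash followed by "n" instead of a newline.
observed = '{"text": "line1\\\\nline2"}'

print(json.loads(expected)["text"])  # line1 / line2 on two lines
print(json.loads(observed)["text"])  # line1\nline2 with a literal backslash
```

Note that both strings still parse as valid JSON; the extra escape silently changes the content of the string value.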

Attempts to fix

I have tried several ways to fix this, including:

  1. adjusting the sampling parameters, but ...
  2. prompt engineering, but ...
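
If the extra backslash reliably precedes an otherwise valid escape sequence, one stopgap is to normalize the raw output before parsing. A minimal sketch, assuming the badcase only doubles the backslash of a legitimate escape (this helper is hypothetical, not a verified fix):

```python
import json
import re

def normalize_overescaped(raw: str) -> str:
    # Assumption: the badcase turns a valid escape like \n into \\n.
    # Collapse the doubled backslash back to a single one; note this would
    # also rewrite a genuinely intended double backslash before these chars.
    return re.sub(r'\\\\([nrtbf"/\\])', r'\\\1', raw)

raw_output = '{"text": "line1\\\\nline2"}'  # hypothetical over-escaped output
print(json.loads(normalize_overescaped(raw_output))["text"])  # line1 / line2
```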

Anything else helpful for investigation

I find that this problem also happens to ...

jklj077 (Collaborator) commented Dec 17, 2024

Please follow the template so that we can understand your issue more clearly.

  1. Please provide the full case so that we can try to reproduce it.
  2. Please try guided decoding if you need strictly valid JSON output (see the sketch after this list).
  3. Does this also happen with Qwen2.5?
  4. Does it happen at bfloat16 precision? FP8 may severely degrade quality; we advise using GPTQ, AWQ, or another quantization algorithm.
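
For point 2, a minimal sketch, assuming the model is served behind an OpenAI-compatible endpoint that implements JSON-constrained decoding via `response_format` (the base URL and model name below are placeholders, not the reporter's actual deployment):

```python
from openai import OpenAI

# Hypothetical local endpoint; adjust base_url and model to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen2-72B-Instruct",
    messages=[
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "..."},
    ],
    # With guided/structured decoding, the server constrains sampling so the
    # generated text is always syntactically valid JSON.
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```

Guided decoding guarantees syntactic validity, but it cannot by itself tell an intended `\n` from a spurious `\\n`, so the full reproduction case from point 1 is still needed.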
