[Bugfix] Backend option to disable xgrammar any_whitespace #12744
Conversation
Signed-off-by: Wallas Santos <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can use one of the usual trigger options. 🚀
Is there a good reason we shouldn't just make any_whitespace=False the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.
Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?
Thank you @russellb and @mgoin for the quick feedback. Let's discuss.
Good point, from the vLLM perspective, IMO set…
Thought about that too, but I am not sure if this is an issue only for Mistral. My intention with this variable is to have an option that fixes the issue we found in our environment without the risk of introducing more regressions with other models or scenarios. This might be a solution for similar cases, and I guess we could wait and see whether the community reports related bugs before we settle on the default behavior. We should also consider this for V1, which AFAIK will have xgrammar as the default guided decoding backend, and it is not yet implemented there.
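(For context, a rough sketch of the kind of plumbing being discussed: threading a disable-whitespace knob down to xgrammar's JSON-schema grammar compilation. The any_whitespace parameter name is taken from the PR title; the exact xgrammar API and the vLLM-side wiring are assumptions, not the merged code.)

```python
# Sketch only: how a "disable any_whitespace" knob could reach xgrammar's
# JSON-schema grammar compilation. The any_whitespace parameter is assumed
# from the PR title; the real vLLM wiring may differ.
import json

import xgrammar as xgr


def compile_json_schema_grammar(
    compiler: xgr.GrammarCompiler,
    schema: dict,
    disable_any_whitespace: bool = False,
) -> xgr.CompiledGrammar:
    # With any_whitespace=False, xgrammar builds a grammar that does not allow
    # arbitrary whitespace between JSON tokens, avoiding the endless-whitespace
    # generations reported here.
    return compiler.compile_json_schema(
        json.dumps(schema),
        any_whitespace=not disable_any_whitespace,
    )
```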
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
I read over this again and I like the proposed change. We're stuck with possible bad model behavior either way. With this change, we allow xgrammar to dictate the default behavior, but allow users to override it if necessary. I think that's the best we can do.
Thank you @russellb! I'm glad that you agree! 🙏
Signed-off-by: Joe Runde <[email protected]>
@wallashss take a look at the discussion on #13505. I'd be interested in your feedback there.
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
It'd be extra nice to have a little blurb about this in the docs somewhere, so people running Mistral models know to turn it on. Maybe an example under … Otherwise LGTM!
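For reference, a sketch of what such a docs example might look like. The option string xgrammar:disable-any-whitespace and the offline-API parameters are assumptions based on this PR's direction; the merged docs should be checked for the exact spelling.

```python
# Hypothetical docs example: disable xgrammar's whitespace flexibility when
# serving a Mistral model. The backend option string is an assumption based on
# this PR; check the merged documentation for the exact spelling.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    guided_decoding_backend="xgrammar:disable-any-whitespace",
)
outputs = llm.generate(
    "Give me a JSON object describing a person.",
    SamplingParams(
        max_tokens=100,
        guided_decoding=GuidedDecodingParams(json=schema),
    ),
)
print(outputs[0].outputs[0].text)
```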
FYI, I had similar issues with Qwen2.5 models (mlc-ai/xgrammar#212), so this is not Mistral-specific.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
@joerunde I added an info log (emitted once) for Mistral and Qwen models. What do you think?
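(For illustration, a minimal sketch of the kind of once-only log described above; the helper name and message text are hypothetical, not the exact code added in this PR.)

```python
# Hypothetical sketch of a once-only info log for model families known to hit
# the whitespace issue (e.g. Mistral and Qwen). Not the exact code in the PR.
import logging

logger = logging.getLogger(__name__)
_hint_logged = False


def maybe_log_whitespace_hint(model_name: str) -> None:
    global _hint_logged
    if _hint_logged:
        return
    if any(key in model_name.lower() for key in ("mistral", "qwen")):
        logger.info(
            "Model %s may emit long whitespace runs with xgrammar guided "
            "decoding; consider disabling any_whitespace via the backend "
            "option.", model_name)
        _hint_logged = True
```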
@wallashss Nice, LGTM! I bet there are other models out there that will run into this issue; we can always update the log or documentation as needed once we get more feedback.
Oh, maybe a unit test here would be nice to make sure it actually works 😉
Signed-off-by: Wallas Santos <[email protected]>
Done! Thanks to @Samoed I got a nice example to add to the tests.
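For reference, a sketch of the kind of unit test being discussed: with whitespace disabled, the guided output should still satisfy the schema and contain no whitespace runs. The generate_guided_json helper is a placeholder, not the fixture used in the actual test.

```python
# Hypothetical test sketch: guided JSON output with whitespace disabled should
# parse against the schema and contain no long runs of whitespace. The
# generate_guided_json helper is a stand-in for whatever the test suite uses.
import json
import re


def test_disable_any_whitespace_produces_compact_json(generate_guided_json):
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    }
    text = generate_guided_json(schema, disable_any_whitespace=True)
    parsed = json.loads(text)              # output is valid JSON
    assert "name" in parsed                # and satisfies the schema's required key
    assert not re.search(r"\s{2,}", text)  # no runs of whitespace in the output
```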
lgtm, thanks! I left a minor suggestion to clarify a comment in the test, but it's not a big deal.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
This is super helpful @wallashss. @mgoin @aarnphm is there a timeline for getting this merged?
Thanks for the feedback @sethkimmel3, I'm trying to keep the CI green (due to flaky tests) and waiting for someone with write permission to merge 🙏
This is also creating major issues with Llama models; I think it ought to be a high-priority fix. cc: @simon-mo
…ect#12744) Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Joe Runde <[email protected]> Signed-off-by: Johnny <[email protected]>
…ect#12744) Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Joe Runde <[email protected]>
…ect#12744) Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Joe Runde <[email protected]> Signed-off-by: Linkun Chen <[email protected]>
Mistral models with guided decoding using a JSON schema via xgrammar generate endless whitespace. This bug was introduced by this change in xgrammar. My proposal is to add an environment variable that can disable whitespace in guided decoding with a JSON schema, so that serving models like Mistral behaves as it did before the xgrammar change.
Minimal script to repro
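The original repro script isn't reproduced here; below is a hedged sketch of what such a repro might look like, assuming a vLLM OpenAI-compatible server running a Mistral model with the xgrammar guided decoding backend (server URL, model name, and the guided_json extra_body field are assumptions).

```python
# Hypothetical repro sketch: request JSON-schema guided decoding from a vLLM
# OpenAI-compatible server running a Mistral model with the xgrammar backend
# and inspect the output, which fills with whitespace until max_tokens is hit.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    max_tokens=200,
    extra_body={"guided_json": schema},  # vLLM-specific extension field
)
print(repr(response.choices[0].message.content))
```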
Output (truncated by the max tokens parameter)