[Bugfix] Backend option to disable xgrammar any_whitespace #12744
Conversation
Signed-off-by: Wallas Santos <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can use one of the usual trigger options. 🚀
Is there a good reason we shouldn't just make any_whitespace=False the default? It seems like that should be fine for all cases and will ensure this problem doesn't happen without having to find the knob.
Is this something you could control with just the knowledge of whether we are using an HF or Mistral tokenizer?
Thank you @russellb and @mgoin for the quick feedback. Let's discuss.
Good point, from the vLLM perspective, IMO set…
Thought about that too, but I am not sure if this is an issue only for Mistral. My intention with this variable is to have an option that fixes the issue we found in our environment without the risk of introducing more regressions with other models or scenarios. This might be a solution for similar cases, and I guess we could wait and see whether the community reports related bugs before we settle on the default behavior. We should also consider this for V1, which AFAIK will have xgrammar as the default guided decoding backend, and it is not yet implemented there.
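(For context, a rough sketch of the kind of plumbing being discussed: threading a disable-whitespace knob down to xgrammar's JSON-schema grammar compilation. The any_whitespace parameter name is taken from the PR title; the exact xgrammar API and the vLLM-side wiring are assumptions, not the merged code.)

```python
# Sketch only: how a "disable any_whitespace" knob could reach xgrammar's
# JSON-schema grammar compilation. The any_whitespace parameter is assumed
# from the PR title; the real vLLM wiring may differ.
import json

import xgrammar as xgr


def compile_json_schema_grammar(
    compiler: xgr.GrammarCompiler,
    schema: dict,
    disable_any_whitespace: bool = False,
) -> xgr.CompiledGrammar:
    # With any_whitespace=False, xgrammar builds a grammar that does not allow
    # arbitrary whitespace between JSON tokens, avoiding the endless-whitespace
    # generations reported here.
    return compiler.compile_json_schema(
        json.dumps(schema),
        any_whitespace=not disable_any_whitespace,
    )
```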
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
I read over this again and I like the proposed change. We're stuck with possible bad model behavior either way. With this change, we allow xgrammar to dictate the default behavior, but allow users to override it if necessary. I think that's the best we can do.
Thank you @russellb! I'm glad that you agree! 🙏
Signed-off-by: Joe Runde <[email protected]>
@wallashss take a look at the discussion on #13505. I'd be interested in your feedback there.
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
It'd be extra nice to have a little blurb about this in the docs somewhere, so people running Mistral models know to turn it on. Maybe an example under … Otherwise LGTM!
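For reference, a sketch of what such a docs example might look like. The option string xgrammar:disable-any-whitespace and the offline-API parameters are assumptions based on this PR's direction; the merged docs should be checked for the exact spelling.

```python
# Hypothetical docs example: disable xgrammar's whitespace flexibility when
# serving a Mistral model. The backend option string is an assumption based on
# this PR; check the merged documentation for the exact spelling.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    guided_decoding_backend="xgrammar:disable-any-whitespace",
)
outputs = llm.generate(
    "Give me a JSON object describing a person.",
    SamplingParams(
        max_tokens=100,
        guided_decoding=GuidedDecodingParams(json=schema),
    ),
)
print(outputs[0].outputs[0].text)
```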
FYI, I had similar issues with Qwen2.5 models (mlc-ai/xgrammar#212), so this is not Mistral-specific.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
@joerunde I added an info log (emitted once) for Mistral and Qwen models. What do you think?
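(For illustration, a minimal sketch of the kind of once-only log described above; the helper name and message text are hypothetical, not the exact code added in this PR.)

```python
# Hypothetical sketch of a once-only info log for model families known to hit
# the whitespace issue (e.g. Mistral and Qwen). Not the exact code in the PR.
import logging

logger = logging.getLogger(__name__)
_hint_logged = False


def maybe_log_whitespace_hint(model_name: str) -> None:
    global _hint_logged
    if _hint_logged:
        return
    if any(key in model_name.lower() for key in ("mistral", "qwen")):
        logger.info(
            "Model %s may emit long whitespace runs with xgrammar guided "
            "decoding; consider disabling any_whitespace via the backend "
            "option.", model_name)
        _hint_logged = True
```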
@wallashss Nice, LGTM! I bet there are other models out there that will run into this issue; we can always update the log or documentation as needed once we get more feedback.
Oh, maybe a unit test here would be nice to make sure it actually works 😉
Signed-off-by: Wallas Santos <[email protected]>
Done! Thanks to @Samoed I got a nice example to add to the tests.
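For reference, a sketch of the kind of unit test being discussed: with whitespace disabled, the guided output should still satisfy the schema and contain no whitespace runs. The generate_guided_json helper is a placeholder, not the fixture used in the actual test.

```python
# Hypothetical test sketch: guided JSON output with whitespace disabled should
# parse against the schema and contain no long runs of whitespace. The
# generate_guided_json helper is a stand-in for whatever the test suite uses.
import json
import re


def test_disable_any_whitespace_produces_compact_json(generate_guided_json):
    schema = {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    }
    text = generate_guided_json(schema, disable_any_whitespace=True)
    parsed = json.loads(text)              # output is valid JSON
    assert "name" in parsed                # and satisfies the schema's required key
    assert not re.search(r"\s{2,}", text)  # no runs of whitespace in the output
```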
lgtm, thanks! I left a minor suggestion to clarify a comment in the test, but it's not a big deal.
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
This is super helpful @wallashss. @mgoin @aarnphm is there a timeline for getting this merged?
Thanks for the feedback @sethkimmel3, I'm trying to keep the CI green (due to flaky tests) and waiting for someone with write permission to merge 🙏
This is also creating major issues with Llama models; I think it ought to be a high-priority fix. cc: @simon-mo
…ect#12744) Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Joe Runde <[email protected]> Signed-off-by: Johnny <[email protected]>
…ect#12744) Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Joe Runde <[email protected]>
…ect#12744) Signed-off-by: Wallas Santos <[email protected]> Signed-off-by: Joe Runde <[email protected]> Co-authored-by: Joe Runde <[email protected]> Signed-off-by: Linkun Chen <[email protected]>
Mistral models with guided decoding using a JSON schema via xgrammar generate endless whitespace. This bug was introduced by this change in xgrammar. My proposal is to add an environment variable that can disable whitespace in guided decoding with a JSON schema, so that serving models like Mistral behaves as it did before the xgrammar change.
Minimal script to repro
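The original repro script isn't reproduced here; below is a hedged sketch of what such a repro might look like, assuming a vLLM OpenAI-compatible server running a Mistral model with the xgrammar guided decoding backend (server URL, model name, and the guided_json extra_body field are assumptions).

```python
# Hypothetical repro sketch: request JSON-schema guided decoding from a vLLM
# OpenAI-compatible server running a Mistral model with the xgrammar backend
# and inspect the output, which fills with whitespace until max_tokens is hit.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

response = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[{"role": "user", "content": "Describe a person as JSON."}],
    max_tokens=200,
    extra_body={"guided_json": schema},  # vLLM-specific extension field
)
print(repr(response.choices[0].message.content))
```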
Output (truncated by the max tokens parameter)