
Extra new lines for json generation #212

Closed
Samoed opened this issue Feb 20, 2025 · 2 comments
Samoed commented Feb 20, 2025

Hi! I'm running vLLM 0.7.2. When I try to generate JSON, the output contains a lot of extra newlines. With Qwen2.5-72B, more than 1,000 tokens of the response were newlines, and with the 7B model the response was:

{
    "clarifying_question": "Do you mean that you need to quickly deploy or start something (like an app, service, etc.) for $10?",
    "cost_per_serving": "$10",
    "calories": "",
    "type_dish_ids": ""
    ,
    "type_meal_ids": ""
    ,
    "product_ids": [
        "quick_launch_service"
    ],
    "exclude_product_ids": [
        
    
    ""],
    "allergen_ids": [
        
    ""]
    
    , "total_cooking_time": "",

    "kitchen_ids": "",

    "holiday_ids": ""

}

The JSON schema was generated using a Pydantic model:

from datetime import datetime
from langchain_openai import ChatOpenAI
from pydantic import BaseModel


class ResponseSchema(BaseModel):
    clarifying_question: str
    cost_per_serving: str
    calories: str
    type_dish_ids: str
    type_meal_ids: str
    product_ids: list[str]
    exclude_product_ids: list[str]
    allergen_ids: list[str]
    total_cooking_time: str
    kitchen_ids: str
    holiday_ids: str


def main():
    model = ChatOpenAI(
        model="qwen2.5-7b",
        base_url="http://localhost:8000/v1/",
        openai_api_key="test",
        temperature=0,
        # vLLM-specific options (guided_json, repetition_penalty, ...) are
        # forwarded to the OpenAI-compatible server via extra_body.
        extra_body={
            "repetition_penalty": 1.3,
            "presence_penalty": -1.1,
            "frequency_penalty": 0,
            "max_tokens": 2_000,
            "guided_json": ResponseSchema.model_json_schema(),
        },
    )

    query = "I want a quick launch fast with $10."
    RECIPE_PROMPT = [{"role": "user", "content": query}]
    
    print(datetime.now())
    structured_output = model.invoke(RECIPE_PROMPT)
    print(structured_output.content)
    print(ResponseSchema.model_validate_json(structured_output.content).model_dump())


if __name__ == "__main__":
    main()
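
A minimal sketch of one way to check how many of the generated tokens are just whitespace/newlines (this assumes the public Qwen/Qwen2.5-7B-Instruct tokenizer from Hugging Face and is not part of the reproduction script):

# Sketch (assumption): count tokens in the raw response that decode to pure
# whitespace, using the public Qwen2.5 tokenizer from Hugging Face.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

def count_whitespace_tokens(text: str) -> int:
    token_ids = tokenizer.encode(text, add_special_tokens=False)
    # A token counts as "whitespace" if it decodes to nothing but spaces/newlines.
    return sum(1 for tid in token_ids if tokenizer.decode([tid]).strip() == "")

# e.g. count_whitespace_tokens(structured_output.content) for the script above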
@russellb

This PR to vLLM should resolve this issue: vllm-project/vllm#12744
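
Until a release with that fix is available, one possible mitigation is to restrict the whitespace the grammar is allowed to emit. This is only a sketch: it assumes the server accepts vLLM's guided_decoding_backend and guided_whitespace_pattern request options, and the whitespace pattern may only take effect with the outlines backend.

# Sketch of a possible mitigation (assumes vLLM's guided_decoding_backend /
# guided_whitespace_pattern extra-body options are available on this server version).
# ResponseSchema is the Pydantic model from the reproduction script above.
extra_body = {
    "guided_json": ResponseSchema.model_json_schema(),
    # Select the outlines backend and limit inter-token whitespace to at most one space.
    "guided_decoding_backend": "outlines",
    "guided_whitespace_pattern": r"[ ]?",
    "max_tokens": 2_000,
}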

Samoed commented Feb 20, 2025

I see that was fixed in #123. I'll close this for now. Thanks for the quick response!
