Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions #17162

Merged
merged 8 commits into from
Feb 13, 2024

Conversation

L-cloud
Copy link
Contributor

@L-cloud L-cloud commented Feb 7, 2024

  • Description :

Fix: Use shallow copy for schema manipulation in get_format_instructions

Prevents side effects on the original schema object by using a dictionary comprehension for a safer and more controlled manipulation of schema key-value pairs, enhancing code reliability.

@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 7, 2024
Copy link

vercel bot commented Feb 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Feb 13, 2024 6:40am

@dosubot dosubot bot added Ɑ: parsing Related to output parser module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Feb 7, 2024
Copy link
Contributor

@hwchase17 hwchase17 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment about why this is neccessary, and a quick unit test

@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Feb 9, 2024
@L-cloud L-cloud force-pushed the feature/json-parser-fix branch from 9696c9c to 4583715 Compare February 9, 2024 13:51
@L-cloud
Copy link
Contributor Author

L-cloud commented Feb 9, 2024

@hwchase17
Thank you for your comment! I will now provide a detailed example of the specific changes required, drawn from an actual case I have encountered. This involves the use of the same Base Class in two scenarios: one utilizing an OpenAI function call and the other employing a JsonOutputParser.

class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

joke_query = "Tell me a joke."

parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}, #cause
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

openai_functions = [convert_to_openai_function(Joke)]

parser = JsonOutputFunctionsParser()

chain2 = prompt | model.bind(functions=openai_functions) | parser

chain2.invoke({"query": "tell me a joke"})

In this code, chain2 triggers an error stating: openai.BadRequestError: Error code: 400 - {'error': {'message': "'' does not match '^[a-zA-Z0-9_-]{1,64}$' - 'functions.0.name'", 'type': 'invalid_request_error', 'param': None, 'code': None}}. The root cause of this error is the direct deletion of variables from the BaseModel schema via the schema() function in the get_format_instructions function, as shown below:

schema = self.pydantic_object.schema()
reduced_schema = schema
            if "title" in reduced_schema:
                del reduced_schema["title"]
            if "type" in reduced_schema:
                del reduced_schema["type"]

The use of BaseModel's .schema() function in convert_to_openai_function results in default values of "" being assigned to the name and type, leading to the aforementioned error. Although this is the only case I have identified, I believe other functions utilizing BaseModel could similarly be at risk. While there may be a slight degradation in performance, utilizing schema_str appears to present no significant issues.

            schema_str = json.dumps(reduced_schema)
            return JSON_FORMAT_INSTRUCTIONS.format(schema=schema_str)

Please let me know if there is anything that needs clarification, if there's anything I've missed, or if the test code is inadequate or lacking :)

@L-cloud L-cloud requested a review from hwchase17 February 9, 2024 14:02
@L-cloud L-cloud force-pushed the feature/json-parser-fix branch from c11fd8f to 400f10a Compare February 10, 2024 05:28
@dosubot dosubot bot removed the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 10, 2024
@efriis efriis self-assigned this Feb 10, 2024
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 10, 2024
@L-cloud L-cloud force-pushed the feature/json-parser-fix branch from 400f10a to 9d3a9ec Compare February 10, 2024 05:33
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Feb 10, 2024
@L-cloud L-cloud force-pushed the feature/json-parser-fix branch from 9d3a9ec to 8a2c1db Compare February 10, 2024 05:37
@L-cloud L-cloud force-pushed the feature/json-parser-fix branch from ec1f05e to f18c31b Compare February 12, 2024 01:09
@@ -221,7 +221,7 @@ def get_format_instructions(self) -> str:
if self.pydantic_object is None:
return "Return a JSON object."
else:
schema = self.pydantic_object.schema()
schema = {k: v for k, v in self.pydantic_object.schema().items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a comment here would be great! otherwise looks good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hwchase17
I've applied the changes, thank you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have also made the same changes in /langchain/langchain/output_parsers/yaml.py and /langchain/langchain/output_parsers/pydantic.py.

@L-cloud L-cloud requested a review from hwchase17 February 13, 2024 01:56
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Feb 13, 2024
@hwchase17 hwchase17 merged commit 8d6cc90 into langchain-ai:master Feb 13, 2024
76 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature lgtm PR looks good. Use to confirm that a PR is ready for merging. Ɑ: parsing Related to output parser module partner size:S This PR changes 10-29 lines, ignoring generated files. template
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants