langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions #17162

L-cloud · 2024-02-07T09:07:36Z

Description :

Fix: Use shallow copy for schema manipulation in get_format_instructions

Prevents side effects on the original schema object by using a dictionary comprehension for a safer and more controlled manipulation of schema key-value pairs, enhancing code reliability.

Issue: Issue: JsonOutputParser's get_format_instructions() Modifying Pydantic Class Schema #17161
Dependencies: None
Twitter handle: None

vercel · 2024-02-07T09:07:40Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name	Status	Preview	Comments	Updated (UTC)
langchain	⬜️ Ignored (Inspect)	Visit Preview		Feb 13, 2024 6:40am

hwchase17

can we add a comment about why this is neccessary, and a quick unit test

L-cloud · 2024-02-09T14:02:13Z

@hwchase17
Thank you for your comment! I will now provide a detailed example of the specific changes required, drawn from an actual case I have encountered. This involves the use of the same Base Class in two scenarios: one utilizing an OpenAI function call and the other employing a JsonOutputParser.

class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

joke_query = "Tell me a joke."

parser = JsonOutputParser(pydantic_object=Joke)

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()}, #cause
)

chain = prompt | model | parser

chain.invoke({"query": joke_query})

openai_functions = [convert_to_openai_function(Joke)]

parser = JsonOutputFunctionsParser()

chain2 = prompt | model.bind(functions=openai_functions) | parser

chain2.invoke({"query": "tell me a joke"})

In this code, chain2 triggers an error stating: openai.BadRequestError: Error code: 400 - {'error': {'message': "'' does not match '^[a-zA-Z0-9_-]{1,64}$' - 'functions.0.name'", 'type': 'invalid_request_error', 'param': None, 'code': None}}. The root cause of this error is the direct deletion of variables from the BaseModel schema via the schema() function in the get_format_instructions function, as shown below:

schema = self.pydantic_object.schema()
reduced_schema = schema
            if "title" in reduced_schema:
                del reduced_schema["title"]
            if "type" in reduced_schema:
                del reduced_schema["type"]

The use of BaseModel's .schema() function in convert_to_openai_function results in default values of "" being assigned to the name and type, leading to the aforementioned error. Although this is the only case I have identified, I believe other functions utilizing BaseModel could similarly be at risk. While there may be a slight degradation in performance, utilizing schema_str appears to present no significant issues.

            schema_str = json.dumps(reduced_schema)
            return JSON_FORMAT_INSTRUCTIONS.format(schema=schema_str)

Please let me know if there is anything that needs clarification, if there's anything I've missed, or if the test code is inadequate or lacking :)

hwchase17 · 2024-02-12T05:21:46Z

libs/core/langchain_core/output_parsers/json.py

@@ -221,7 +221,7 @@ def get_format_instructions(self) -> str:
        if self.pydantic_object is None:
            return "Return a JSON object."
        else:
-            schema = self.pydantic_object.schema()
+            schema = {k: v for k, v in self.pydantic_object.schema().items()}


a comment here would be great! otherwise looks good

@hwchase17
I've applied the changes, thank you :)

I have also made the same changes in /langchain/langchain/output_parsers/yaml.py and /langchain/langchain/output_parsers/pydantic.py.

Fix: Use shallow copy for schema manipulation in get_format_instructions

f22d718

dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Feb 7, 2024

dosubot bot added Ɑ: parsing Related to output parser module 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Feb 7, 2024

hwchase17 reviewed Feb 8, 2024

View reviewed changes

Add: unit tests for BaseModel schema consistency

849a279

dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Feb 9, 2024

L-cloud force-pushed the feature/json-parser-fix branch from 9696c9c to 4583715 Compare February 9, 2024 13:51

L-cloud requested a review from hwchase17 February 9, 2024 14:02

L-cloud force-pushed the feature/json-parser-fix branch from c11fd8f to 400f10a Compare February 10, 2024 05:28

efriis added partner template labels Feb 10, 2024

dosubot bot removed the size:S This PR changes 10-29 lines, ignoring generated files. label Feb 10, 2024

efriis self-assigned this Feb 10, 2024

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 10, 2024

L-cloud force-pushed the feature/json-parser-fix branch from 400f10a to 9d3a9ec Compare February 10, 2024 05:33

dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Feb 10, 2024

Merge master and resolve conflicts

8a2c1db

L-cloud force-pushed the feature/json-parser-fix branch from 9d3a9ec to 8a2c1db Compare February 10, 2024 05:37

L-cloud added 2 commits February 10, 2024 14:39

Apply code formatting with 'make format'

ef3f6d3

Refactor test code for dictionary comparison method

f18c31b

L-cloud force-pushed the feature/json-parser-fix branch from ec1f05e to f18c31b Compare February 12, 2024 01:09

hwchase17 reviewed Feb 12, 2024

View reviewed changes

L-cloud added 2 commits February 13, 2024 10:24

Add comment to prevent original Pydantic schema modifications

4d2f179

Apply code formatting with 'make lint'

78f5874

L-cloud requested a review from hwchase17 February 13, 2024 01:56

Fix: Extended shallow copy fix to yaml.py and pydantic.py

3331b1a

hwchase17 approved these changes Feb 13, 2024

View reviewed changes

dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Feb 13, 2024

hwchase17 merged commit 8d6cc90 into langchain-ai:master Feb 13, 2024
76 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions #17162

langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions #17162

L-cloud commented Feb 7, 2024 •

edited

Loading

vercel bot commented Feb 7, 2024 •

edited

Loading

hwchase17 left a comment

L-cloud commented Feb 9, 2024 •

edited

Loading

hwchase17 Feb 12, 2024

L-cloud Feb 13, 2024

L-cloud Feb 13, 2024

langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions #17162

langchain.core : Use shallow copy for schema manipulation in JsonOutputParser.get_format_instructions #17162

Conversation

L-cloud commented Feb 7, 2024 • edited Loading

vercel bot commented Feb 7, 2024 • edited Loading

hwchase17 left a comment

Choose a reason for hiding this comment

L-cloud commented Feb 9, 2024 • edited Loading

hwchase17 Feb 12, 2024

Choose a reason for hiding this comment

L-cloud Feb 13, 2024

Choose a reason for hiding this comment

L-cloud Feb 13, 2024

Choose a reason for hiding this comment

L-cloud commented Feb 7, 2024 •

edited

Loading

vercel bot commented Feb 7, 2024 •

edited

Loading

L-cloud commented Feb 9, 2024 •

edited

Loading