Enhancement Proposal: PydanticStreamOutputParser for Stream and Improved LLM JSON Output #19225

YanSte · 2024-03-18T10:41:32Z

Checked

I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it

Feature request

I propose extending the Pydantic output parser to support streaming, which is not currently implemented. I have developed a solution that handles streamed output with Pydantic models (see the code below).

Additionally, I suggest revising the JSON schema format used in the format instructions.
For example, with 7B models (Mistral), my adjustments resulted in significantly improved compatibility.

Prompt Example:

The output should be formatted as a JSON instance that conforms to the JSON schema below:
"""
{'n properties': '(n type) n Description'}
"""

Prompt example with values:

class MyModel(BaseModel):
    my_var: str = Field(description="Name of the something...")

The output should be formatted as a JSON instance that conforms to the JSON schema below:
"""
{'my_var': '(string) Name of the something...'}
"""

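A minimal sketch of how the compact format above could be derived from a JSON schema. It uses only the standard library; `compact_schema` and `my_model_schema` are hypothetical names, the dictionary is a hand-written stand-in for what `MyModel.schema()` would return, and the `any` fallback for fields without a plain type is an assumption rather than part of the proposal.

```python
from typing import Any, Dict


def compact_schema(json_schema: Dict[str, Any]) -> Dict[str, str]:
    """Collapse a JSON schema's properties into {'name': '(type) description'}."""
    compact: Dict[str, str] = {}
    for name, spec in json_schema.get("properties", {}).items():
        # Assumption: fall back to neutral placeholders when a field has no
        # plain "type" (e.g. nested models) or no "description".
        field_type = spec.get("type", "any")
        description = spec.get("description", "")
        compact[name] = f"({field_type}) {description}".strip()
    return compact


# Hand-written stand-in for the schema of the MyModel class shown above.
my_model_schema = {
    "title": "MyModel",
    "type": "object",
    "properties": {
        "my_var": {
            "title": "My Var",
            "description": "Name of the something...",
            "type": "string",
        }
    },
    "required": ["my_var"],
}

print(compact_schema(my_model_schema))
# {'my_var': '(string) Name of the something...'}
```

The result is a flat dictionary with one short line per field, which is the part of the schema a small model most needs to see.
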
My question: should I open a pull request? What do you think?

Motivation

My proposal aims to address two key issues.

First, enabling streaming support for Pydantic output parsing.

Second, refining the JSON format that the language model is asked to produce. This format gives better results with 7B models.

Proposal (If applicable)


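# Import paths below assume langchain-core as of early 2024; adjust for your version.
from json import JSONDecodeError
from typing import Any, List, Optional, Type, TypeVar

import jsonpatch
from langchain_core.output_parsers import BaseCumulativeTransformOutputParser
from langchain_core.output_parsers.json import parse_json_markdown
from langchain_core.outputs import Generation
from langchain_core.pydantic_v1 import BaseModel, ValidationError

TBaseModel = TypeVar("TBaseModel", bound=BaseModel)


class PydanticStreamOutputParser(BaseCumulativeTransformOutputParser[TBaseModel]):
    """Pydantic output parser that can handle streaming input."""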

    pydantic_object: Type[TBaseModel]

    def _diff(self, prev: Optional[Any], next: Any) -> Any:
        return jsonpatch.make_patch(prev, next).patch

    def parse_result(
        self, result: List[Generation], *, partial: bool = False
    ) -> Optional[TBaseModel]:
        # Only the first generation is parsed.
        text = result[0].text.strip()
        try:
            json_object = parse_json_markdown(text)
            return self.pydantic_object.parse_obj(json_object)
        except (JSONDecodeError, ValidationError):
            # The text so far is incomplete or does not validate against the
            # model: emit nothing for this chunk instead of raising.
            return None

    def parse(self, text: str) -> Optional[TBaseModel]:
        return self.parse_result([Generation(text=text)])

    @property
    def _type(self) -> str:
        return "pydantic_stream_output_parser"

    @property
    def OutputType(self) -> Type[TBaseModel]:
        """Return the pydantic model."""
        return self.pydantic_object

    def get_format_instructions(self) -> str:
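        schema_dict = self.pydantic_object.schema()

        # Collapse each property to "(type) description". Nested models and
        # undocumented fields may lack these keys, so fall back instead of
        # raising KeyError (the "any" placeholder is an assumption).
        final_output = {}
        for key, value in schema_dict["properties"].items():
            field_type = value.get("type", "any")
            description = value.get("description", "")
            final_output[key] = f"({field_type}) {description}".strip()

        return _PYDANTIC_STREAM_FORMAT_INSTRUCTIONS.format(schema=final_output)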


_PYDANTIC_STREAM_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below:

{schema}

"""  # noqa: E501

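To illustrate the cumulative behaviour of the parser above, here is a simplified, standard-library sketch. It is not the LangChain implementation: `try_parse` and `stream_parse` are hypothetical helpers, a required-keys check stands in for Pydantic validation, and strict `json.loads` stands in for `parse_json_markdown`, which may be more lenient with incomplete JSON.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence


def try_parse(text: str, required: Sequence[str]) -> Optional[Dict[str, Any]]:
    """Return the parsed object, or None if the text is not valid yet."""
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        return None
    # Stand-in for model validation: every required key must be present.
    if not isinstance(obj, dict) or any(key not in obj for key in required):
        return None
    return obj


def stream_parse(
    chunks: Iterable[str], required: Sequence[str]
) -> Iterator[Dict[str, Any]]:
    """Accumulate chunks and yield each newly valid parse of the whole buffer."""
    buffer = ""
    previous: Optional[Dict[str, Any]] = None
    for chunk in chunks:
        buffer += chunk
        parsed = try_parse(buffer, required)
        # Skip chunks that do not yet form a valid object or change nothing.
        if parsed is not None and parsed != previous:
            previous = parsed
            yield parsed


chunks = ['{"my_', 'var": "Al', 'ice"}']
results = list(stream_parse(chunks, required=["my_var"]))
print(results)
# [{'my_var': 'Alice'}]
```

As in `parse_result`, a chunk that leaves the accumulated text invalid produces no output, so downstream consumers only ever see objects that passed validation.
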
Replies: 0 comments