I searched existing ideas and did not find a similar one
I added a very descriptive title
I've clearly described the feature request and motivation for it
Feature request
I propose enhancing the Pydantic output parser to support streaming, which is currently not implemented. I have developed a solution that handles streaming with Pydantic models (see the code below).
Additionally, I suggest revising the JSON schema format used in the format instructions.
For example, with 7B models (Mistral), my adjustments resulted in significantly improved compatibility.
Prompt Example:
The output should be formatted as a JSON instance that conforms to the JSON schema below:
"""
{'n properties': '(n type) n Description'}
"""
Prompt example with a value:
```python
from pydantic import BaseModel, Field

class MyModel(BaseModel):
    my_var: str = Field(description="Name of the something...")
```
The output should be formatted as a JSON instance that conforms to the JSON schema below:
"""
{'my_var': '(string) Name of the something...'}
"""
My question: should I open a pull request? What do you think?
Motivation
My proposal aims to address two key issues.
First, adding streaming support to the Pydantic output parser.
Second, refining the JSON format instructions given to the language model, since this format gives better results with 7B models.
Proposal (If applicable)
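```python
from json import JSONDecodeError
from typing import Any, List, Optional, Type, TypeVar

import jsonpatch
from langchain_core.exceptions import OutputParserException
from langchain_core.output_parsers.json import parse_json_markdown
from langchain_core.output_parsers.transform import BaseCumulativeTransformOutputParser
from langchain_core.outputs import Generation
from langchain_core.pydantic_v1 import BaseModel, ValidationError

TBaseModel = TypeVar("TBaseModel", bound=BaseModel)


class PydanticStreamOutputParser(BaseCumulativeTransformOutputParser[TBaseModel]):
    """Pydantic output parser that can handle streaming input."""

    pydantic_object: Type[TBaseModel]

    def _diff(self, prev: Optional[Any], next: Any) -> Any:
        # jsonpatch works on plain dicts, so convert the models first.
        prev_dict = prev.dict() if prev is not None else None
        return jsonpatch.make_patch(prev_dict, next.dict()).patch

    def parse_result(
        self, result: List[Generation], *, partial: bool = False
    ) -> Optional[TBaseModel]:
        text = result[0].text.strip()
        try:
            json_object = parse_json_markdown(text)
            return self.pydantic_object.parse_obj(json_object)
        except (JSONDecodeError, ValidationError) as e:
            # While streaming, incomplete JSON is expected: skip this chunk.
            if partial:
                return None
            raise OutputParserException(f"Failed to parse {text!r}: {e}") from e

    def parse(self, text: str) -> TBaseModel:
        return self.parse_result([Generation(text=text)])

    @property
    def _type(self) -> str:
        return "pydantic_stream_output_parser"

    @property
    def OutputType(self) -> Type[TBaseModel]:
        """Return the pydantic model."""
        return self.pydantic_object

    def get_format_instructions(self) -> str:
        schema_dict = self.pydantic_object.schema()
        final_output = {}
        for key, value in schema_dict["properties"].items():
            # Nested models and Optional fields may lack "type" or "description".
            field_type = value.get("type", "any")
            description = value.get("description", "")
            final_output[key] = f"({field_type}) {description}".strip()
        return _PYDANTIC_STREAM_FORMAT_INSTRUCTIONS.format(schema=final_output)


_PYDANTIC_STREAM_FORMAT_INSTRUCTIONS = """The output should be formatted as a JSON instance that conforms to the JSON schema below:
{schema}
"""
```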
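The class relies on `BaseCumulativeTransformOutputParser` to re-run `parse_result` on the accumulated text as chunks arrive and to emit a result only when it changes. The standalone sketch below imitates that idea with the standard library only, so it runs without LangChain or Pydantic. `MyModel` is a dataclass stand-in for the Pydantic model, and `close_partial_json`, `try_parse` and `stream_models` are hypothetical helpers, not LangChain APIs. The bracket-closing repair is a simplified substitute for the partial-JSON handling behind `parse_json_markdown`; a prefix that still cannot be parsed is skipped, like the `return None` branch in the parser.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class MyModel:
    # Hypothetical stdlib stand-in for the Pydantic MyModel.
    my_var: str


def close_partial_json(text: str) -> str:
    """Append the quote and brackets needed to close a truncated JSON prefix."""
    closers: List[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]" and closers:
            closers.pop()
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(closers))


def try_parse(text: str) -> Optional[MyModel]:
    """Return a model for the repaired prefix, or None if it is not valid yet."""
    try:
        data = json.loads(close_partial_json(text.strip()))
        # Missing, extra, or non-dict data raises TypeError here.
        return MyModel(**data)
    except (json.JSONDecodeError, TypeError):
        return None


def stream_models(chunks: Iterable[str]) -> Iterator[MyModel]:
    """Re-parse the accumulated text after each chunk; yield only on change."""
    buffer = ""
    previous: Optional[MyModel] = None
    for chunk in chunks:
        buffer += chunk
        parsed = try_parse(buffer)
        if parsed is not None and parsed != previous:
            previous = parsed
            yield parsed


for model in stream_models(['{"my_', 'var": "Ali', 'ce Smi', 'th"}']):
    print(model)
```

The first chunk cannot be parsed even after repair, so it is skipped; the remaining chunks print three successively longer `MyModel` values, the last one holding the complete string.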