Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json tool is inventing additional fields #1125

Open
adityaprakash-work opened this issue Feb 11, 2025 · 3 comments
Open

json tool is inventing additional fields #1125

adityaprakash-work opened this issue Feb 11, 2025 · 3 comments

Comments

@adityaprakash-work
Copy link

adityaprakash-work commented Feb 11, 2025

The bug
What is the scope of guidance.json tool? I have tried it with very simple pydantic models and it works for them, but I get anomalous behavior when I nest models.

Image

  • How do I avoid generating extra fields?
  • Is pydantic Field(..., description=...) accounted for during generation? If it doesn't it would be so cool if it did among other things like knowing the correct name of the schema (following example)

There was one instance where I used the following code -

class ExampleToolArgs(BaseModel):
    model_config = ConfigDict(extra="ignore")
    a: int = Field(..., description="The 1st argument")
    b: int = Field(..., description="The 2nd argument")
    c: str = Field(..., description="The 3rd argument")


class DictStrIntArg(BaseModel):
    a: int = Field(..., description="The 1st argument")
    b: int = Field(..., description="The 2nd argument")


class PascalToolArgs(BaseModel):
    model_config = ConfigDict(extra="ignore")
    a: int = Field(..., description="The 1st argument")
    b: int = Field(..., description="The 2nd argument")
    c: DictStrIntArg = Field(
        ..., description="Dictionary with 2 keys of any name and any integer value"
    )
    d: str = Field(..., description="The 4th argument")


class ToolCall(BaseModel):
    model_config = ConfigDict(extra="ignore")
    tool_name: str = Field(..., description="The name of the tool to call")
    tool_args: ExampleToolArgs | PascalToolArgs = Field(
        ..., description="The arguments to the tool"
    )


class ToolUsage(BaseModel):
    tool_calls: list[ToolCall] = Field(..., description="The calls to the tools")


lmx = lm.copy()
with g.user():
    lmx += "Please generate 3 tool calls to the pascal tool and 2 to example tool. DO NOT generate ANY argument that is not specified in the schema."
with g.assistant():
    lmx += json("calls", schema=ToolUsage, temperature=0.2)

pprint(j.loads(lmx["calls"]))

and got this,

Image

It never seems to understand the dictionary part for Pascaltool and moreover forgets about the parameter 'd'.

To Reproduce
⬆️

System info (please complete the following information):

  • OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): WSL Ubuntu
  • Guidance Version (guidance.__version__): 0.2.0
@Harsha-Nori
Copy link
Member

Great call @adityaprakash-work. We've taken a stance so far of sticking very strictly to the JSON schema spec, which allows for additional fields unless you explicitly specify "additionalProperties": false in the schema defintion.

You can change this (I think, can't test atm) by creating a ConfigDict and setting extra="forbid":

from pydantic import BaseModel, ConfigDict

class MyModel(BaseModel):
    model_config = ConfigDict(extra="forbid")  # Prevent extra fields
    name: str
    age: int

# Generate JSON Schema
json_schema = MyModel.model_json_schema()
import json
print(json.dumps(json_schema, indent=2))

That said... I think this is bad guidance library behavior, because most users don't know that JSON schema by default allows extra fields. This is a good push for us to discuss changing our default behavior.

Tagging @hudson-ai and @Julian for further thoughts :) I think the field descriptions are another good area to discuss...we really should think about passing them by default. I wonder if we can have a kwarg on the json function to talk about opinionated vs. strict adherence to json schema semantics, defaulted to what is most natural for AI users

@Julian
Copy link

Julian commented Feb 13, 2025

(Maybe surprisingly, I'm not a pydantic expert! so I'd have to look closer to have a real opinion on what seems right) -- but the one thing I can say I'm aware of immediately is that I think pydantic is still working out what it wants its own defaults to be for some cases here -- specifically I've seen pydantic/pydantic#10785 cross my eye before. I think they asked me to offer an opinion somewhere on this, but I can't find it at the minute...

@adityaprakash-work
Copy link
Author

model_config = ConfigDict(extra="forbid") works 👍 (minor issue, but extra="forbid" did not prevent extra keys for Mock model).

Image

Also, is the supplied schema necessarily respected?

Image

c wasn't supposed to be empty I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants