
Weird tool call arguments, resulting in UnexpectedModelBehavior / validation error #81

Open
intellectronica opened this issue Nov 21, 2024 · 4 comments

Comments

@intellectronica

See https://github.com/intellectronica/pydantic-ai-experiments/blob/main/scratch.ipynb

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/_result.py", line 189, in validate
    result = self.type_adapter.validate_json(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic/type_adapter.py", line 425, in validate_json
    return self.validator.validate_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 2 validation errors for Question
reflection
  Field required [type=missing, input_value={'_': {'reflection': "The...n': 'Is it an animal?'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
question
  Field required [type=missing, input_value={'_': {'reflection': "The...n': 'Is it an animal?'}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 654, in _handle_model_response
    result_data = result_tool.validate(call)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/_result.py", line 203, in validate
    raise ToolRetryError(m) from e
pydantic_ai._result.ToolRetryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 181, in run
    either = await self._handle_model_response(model_response, deps)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 657, in _handle_model_response
    self._incr_result_retry()
  File "/usr/local/python/3.12.1/lib/python3.12/site-packages/pydantic_ai/agent.py", line 751, in _incr_result_retry
    raise exceptions.UnexpectedModelBehavior(
pydantic_ai.exceptions.UnexpectedModelBehavior: Exceeded maximum retries (1) for result validation

See the prefixed "_"? It's not there on earlier calls. Possibly a hallucination.
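
A possible defensive workaround is a plain-Pydantic before-validator on the result model that unwraps a single stray wrapper key before field validation runs. Untested sketch, not a pydantic-ai feature:

from pydantic import BaseModel, Field, model_validator


class Question(BaseModel):
    reflection: str = Field(..., description='Considering the questions and answers so far, what are things we can ask next?')
    question: str = Field(..., description='The question to ask the other player')

    @model_validator(mode='before')
    @classmethod
    def unwrap_stray_key(cls, data):
        # If the model nested the real arguments under a single spurious
        # key (here '_'), unwrap them before field validation runs.
        if isinstance(data, dict) and set(data) == {'_'} and isinstance(data['_'], dict):
            return data['_']
        return data

Raising the agent's result_retries (if the installed version supports that option) would also give the model more than one chance to self-correct.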

I think this can be avoided with strict mode. It would be great to have it as an option for OpenAI calls.
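
pydantic-ai doesn't expose this yet, but for reference, a strict tool definition against the raw OpenAI client would look roughly like the sketch below (the tool name and description are illustrative). With 'strict': True the model's arguments are constrained to exactly this JSON Schema, which requires every property to be listed in 'required' and 'additionalProperties' to be false, so a wrapper key like '_' can't appear:

from openai import OpenAI

client = OpenAI()

tools = [
    {
        'type': 'function',
        'function': {
            'name': 'final_result',  # illustrative name
            'description': 'The structured question to ask next',
            'strict': True,
            'parameters': {
                'type': 'object',
                'properties': {
                    'reflection': {'type': 'string'},
                    'question': {'type': 'string'},
                },
                'required': ['reflection', 'question'],
                'additionalProperties': False,  # mandatory in strict mode
            },
        },
    }
]

response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Ask the next question'}],
    tools=tools,
    tool_choice={'type': 'function', 'function': {'name': 'final_result'}},
)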

@intellectronica (Author)

@samuelcolvin ^^^^^

@samuelcolvin (Member)

Weird, not sure what's going on; I ran your code and it worked first time.

I ran it directly as a script, and it worked fine:

from enum import Enum
from textwrap import dedent
from typing import List

from pydantic import BaseModel, Field
from pydantic_ai import Agent, CallContext


class Question(BaseModel):
    reflection: str = Field(..., description='Considering the questions and answers so far, what are things we can ask next?')
    question: str = Field(..., description='The question to ask the other player')


asking_agent = Agent('openai:gpt-4o', result_type=Question)


@asking_agent.system_prompt
async def asking_agent_system_prompt(ctx: CallContext[List[str]]) -> str:
    turns = ctx.deps
    prompt = dedent("""
        You are playing a game of 20 questions.
        You are trying to guess the object the other player is thinking of.
        In each turn, you can ask a yes or no question.
        The other player will answer with "yes", "no".
    """).strip()
    if len(turns) > 0:
        prompt += "\nHere are the questions you have asked so far and the answers you have received:\n"
        prompt += '\n'.join([' * ' + turn for turn in turns])
    return prompt


class Answer(str, Enum):
    YES = 'yes'
    NO = 'no'
    YOU_WIN = 'you win'


class AnswerResponse(BaseModel):
    reflection: str = Field(..., description=(
        'Considering the question, what is the answer? '
        'Is it "yes" or "no"? Or did they guess the '
        'object and the answer is "you win"?'))
    answer: Answer = Field(..., description='The answer to the question - "yes", "no", or "you win"')


answering_agent = Agent('openai:gpt-4o', result_type=AnswerResponse)


@answering_agent.system_prompt
async def answering_agent_system_prompt(ctx: CallContext[str]) -> str:
    prompt = dedent(f"""
        You are playing a game of 20 questions.
        The other player is trying to guess the object you are thinking of.
        The object you are thinking of is: {ctx.deps}.
        Answer with "yes" or "no", or "you win" if the other player has guessed the object.
    """).strip()
    return prompt


def twenty_questions(mystery_object):
    turns = []
    while True:
        question = asking_agent.run_sync('Ask the next question', deps=turns).data.question
        answer = answering_agent.run_sync(question, deps=mystery_object).data.answer.value
        if answer == Answer.YOU_WIN:  # Answer subclasses str, so the raw value compares equal
            print('You Win!')
            break
        elif len(turns) >= 20:
            print('You Lose!')
            break
        else:
            turns.append(f'{question} - {answer}')
            print(f'{len(turns)}. QUESTION: {question}\nANSWER: {answer}\n')

twenty_questions('a cat')

output:

1. QUESTION: Is it something commonly found indoors?
ANSWER: yes

2. QUESTION: Does it use electricity?
ANSWER: no

3. QUESTION: Is it used for storage?
ANSWER: no

4. QUESTION: Is it used for entertainment purposes?
ANSWER: no

5. QUESTION: Is it used for cleaning?
ANSWER: no

6. QUESTION: Is it a piece of furniture?
ANSWER: no

7. QUESTION: Is it used for writing or drawing?
ANSWER: no

8. QUESTION: Is it used for personal grooming or hygiene?
ANSWER: no

9. QUESTION: Is it used in the kitchen?
ANSWER: no

10. QUESTION: Is it related to health or safety?
ANSWER: no

11. QUESTION: Is it used for decoration?
ANSWER: no

12. QUESTION: Is it used for organizing?
ANSWER: no

13. QUESTION: Is it used for communication?
ANSWER: no

14. QUESTION: Is it used for comfort or relaxation?
ANSWER: yes

15. QUESTION: Is it something you can wear indoors?
ANSWER: no

16. QUESTION: Is it something you can sit or lie on?
ANSWER: no

17. QUESTION: Is it something you can hold or carry? 
ANSWER: yes

18. QUESTION: Is it a textile item like a pillow or a blanket?
ANSWER: no

19. QUESTION: Is it used to provide warmth?
ANSWER: no

20. QUESTION: Is it something you use to hold or support things?
ANSWER: no

You Lose!

@intellectronica (Author)

intellectronica commented Nov 21, 2024 via email

@samuelcolvin (Member)

Thanks, yup, I'll look into it.
