
Clarification on the system prompt for custom tool use #36

Open
ricklamers opened this issue Aug 2, 2024 · 17 comments

Comments

@ricklamers

ricklamers commented Aug 2, 2024

Awesome work! Just a quick question about the correct system prompt:

in the docs https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1#user-defined-custom-tool-calling this is used:

If you choose to call a function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where

start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function argument value as value.
end_tag => `</function>`

Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>

Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line
- Always add your sources when using search results to answer the user query

You are a helpful Assistant.

While in the repo this is used:

Think very carefully before calling functions.
If you choose to call a function ONLY reply in the following format with no prefix or suffix:

<function=example_function_name>{{"example_name": "example_value"}}</function>

Reminder:
- If looking for real time information use relevant functions before falling back to brave_search
- Function calls MUST follow the specified format, start with <function= and end with </function>
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line

Furthermore, could you clarify if the "Only call one function at a time" implies parallel tool use is not intended to be used for these instruction tuned models (Llama 3.1 family)?

e.g. "Please get the weather for San Francisco and Tokyo" can't generate:

<|start_header_id|>assistant<|end_header_id|>

<function=get_weather>{"location": "San Francisco"}</function>
<function=get_weather>{"location": "Tokyo"}</function><|eot_id|>

Thanks for clarifying!

Rick Lamers
AI Researcher at Groq

@HamidShojanazeri

cc: @ashwinb

@ashwinb
Contributor

ashwinb commented Aug 5, 2024

@ricklamers thanks for pointing out the discrepancy. Please use the version as specified in the code / this repo. We will update our documentation to match the version from the code.

Re: parallel tool calling, we are doing a couple quick experiments and will get back to you on that ASAP.

@ricklamers
Author

Awesome, thanks!

@ricklamers
Author

@ashwinb FYI in HF's chat template yet another prompt is used:
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/blob/8c22764a7e3675c50d4c7c9a4edb474456022b16/tokenizer_config.json#L2053

Is that wrong? Should it follow the one in this repo?

@ashwinb
Contributor

ashwinb commented Aug 7, 2024

@ricklamers :( not happy with these inconsistencies. It is hard to say one version is outright wrong given the general stochasticity of tool calling, unfortunately.

All I will say is that this is the reason we added llama model template --name <...> to the llama CLI, so that is the definitive source our researchers generally recommend. Given rapid iteration times, sometimes these recommendations don't reach everyone who needs to see them.

@ricklamers
Author

No worries, as long as we know the correct system prompt (this repo) we can all adjust to converge to the same correct version. Any updates on parallel calls?

@ricklamers
Author

I've put out a note for them https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/90

@Rocketknight1

Hey, Matt from Hugging Face here. Just to clarify, the HF template was written following the "JSON based tool calling" template in this doc, and the prompt we used was also copied from the example prompt there.

Based on this discussion, am I correct that the <function> format in this repo is preferred, and we shouldn't use JSON tool calling? If so, I should rewrite the whole template to use that instead, rather than just updating the system prompt.
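As an aside, one way to inspect exactly what the HF template renders is to ask the tokenizer for the untokenized prompt. A rough sketch, assuming a transformers version whose apply_chat_template supports the tools argument; the get_weather function is purely illustrative.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

def get_weather(location: str):
    """Get the current weather for a location.

    Args:
        location: The city to look up.
    """
    ...

messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]

# Render the prompt as a plain string so it can be compared against the
# formats discussed in this thread.
prompt = tokenizer.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, tokenize=False
)
print(prompt)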

@Imbernoulli

So which template is preferred? The function one or the json one? They are both at https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/

@hardikjshah
Contributor

Both the JSON version and the <function> version work reasonably well. We observed that the JSON one tends to over-steer toward using tools even when none is asked for, while with <function> we were able to control that a bit more. The JSON version had higher recall but more false positives, while <function> had lower recall with higher precision. So tbh it's a bit use-case specific, and I'd suggest you try both and see which works best for you.
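For readers following along, the two reply styles being compared look roughly like this (function and parameter names are illustrative only):

<function> format:
<function=get_weather>{"location": "San Francisco"}</function>

JSON format:
{"name": "get_weather", "parameters": {"location": "San Francisco"}}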

@Rocketknight1

Unfortunately, we kind of have to pick one for the template! One thing we noticed is that with the current JSON template, 8B makes tool calls correctly, but sometimes fails to use the results correctly in chat - not sure if this is an issue with the system message we used, since it was all copied from the doc.

My suspicion is that an alternate prompt would fix a lot of this, and we'd prefer to have a clear answer on the best way to do things rather than several options!

@hardikjshah
Contributor

We updated the default tool format to be JSON based and recommend following that.

#45
meta-llama/llama-stack#29
meta-llama/llama-models#110

The code also supports the <function> format and can be extended to support other formats in the future if needed.
We are working with other teams to reconcile and update the website to reflect these changes.

Use this command to get the latest recommended format,

llama model template --name system-custom-tools-only


Some caveats,

  • The model sometimes responds with multiple tool calls albeit inconsistently, so for now we only support one tool call at the system level. We are working on making both the models and system better to support this use case.
  • With the json format, at times the model tends to be overly steered to always make tool calls even when not asked for.
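A minimal sketch of how a caller might handle the JSON-format reply given the single-call restriction above, assuming the {"name": ..., "parameters": ...} shape from the recommended template; the helper name is hypothetical.

import json

def parse_json_tool_call(completion: str):
    """Return (name, parameters) if the completion is a single JSON tool call,
    or None if it should be treated as a plain assistant reply."""
    try:
        payload = json.loads(completion.strip())
    except json.JSONDecodeError:
        return None
    if isinstance(payload, dict) and "name" in payload:
        return payload["name"], payload.get("parameters", {})
    return None

print(parse_json_tool_call('{"name": "get_weather", "parameters": {"location": "Tokyo"}}'))
# -> ('get_weather', {'location': 'Tokyo'})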

Hope this helps resolve the confusion. Again, thanks for raising these issues, it helps us get better and improve with each version.

@Watebear

Hello, I attempted to use the prompt concatenation method mentioned above to test BFCL, but the AST SUMMARY only achieved 50.82%. Below is an example of the input I constructed.
[screenshot of the example input]
Could you provide an input format that can reproduce the evaluation results from the report?
Thanks!

@el-hash-1

el-hash-1 commented Aug 23, 2024

@hardikjshah I think the llama model template --name assistant-custom-tool-call would also need to be updated to the JSON format.

@pcuenca

pcuenca commented Oct 9, 2024

Hello, this is Pedro from Hugging Face. I've been trying today to verify the tool calling template that is in use for the Llama 3.2 models. My approach was to start with the documentation provided by llama model prompt-format -m Llama3.2-1B-Instruct, but also debug the llama stack run server and examine the actual inputs that go into the model. I have a couple of questions:

  1. The output from a tool calling step is (for example) [get_weather(city="San Francisco", metric="celsius")]<|eot_id|>. This is appended to the previous prompt with an ipython role, but the token <|python_tag|> is also prepended to it, even though it was not generated by the model. Is this intended? To be clear, this goes into the model: <|python_tag|>[get_weather(city="San Francisco", metric="celsius")]<|eot_id|>
  2. The system instructions (for example, "You are a helpful assistant") are present in the first turn of the conversation after the tool definitions, but they are skipped in subsequent turns. I used the mesop-based chat UI with a slightly modified version of chat_with_custom_tools.py.
  3. llama model prompt-format -m Llama3.2-1B-Instruct uses this to finalize the custom tool definition in the system prompt and start the user turn: ]<|eot_id|><|start_header_id|>user<|end_header_id|>. However, I'm seeing a newline in actual server use:
]
<|eot_id|><|start_header_id|>user<|end_header_id|>

or the following, when the additional instructions are honored:

]
You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

Should a newline separator be used after the tool definitions?

  4. Based on existing tool examples, I'm assuming the output must be a JSON string. If this is the case, in my tests the model sees this input: "\"25 C\"" rather than the "25 C" indicated in the prompt-format documentation. Are there any other serialization examples?
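On the serialization point, the extra quoting is exactly what comes out when a result that is already a string is passed through JSON encoding; a small illustration, assuming the tool output is run through json.dumps before being inserted into the prompt.

import json

tool_result = "25 C"
serialized = json.dumps(tool_result)
print(serialized)  # prints "25 C" with the quotes included, i.e. the Python string '"25 C"'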

The reason I'm asking these questions is I've found the use of tools fragile in the small 3.2 text models. We'd like to reduce the ambiguity as much as possible, and provide a validated chat template to the community so developers can experiment with confidence.

@zoubaobao

Hello, I attempted to use the prompt concatenation method mentioned above to test BFCL, but the AST SUMMARY only achieved 50.82%. Below is an example of the input I constructed. Could you provide an input format that can reproduce the evaluation results from the report? Thanks!

Same. I tried this model on the BFCL simple test and only got 40% accuracy. Llama-3.1-8B-Instruct's tool-use ability is not as good as they report.

@edmcman

edmcman commented Nov 13, 2024

@hardikjshah It doesn't seem like this command is valid anymore: llama model template --name system-custom-tools-only

See https://colab.research.google.com/drive/1JCPiY8pvP6ZGG2xGvJzrnx7ntfrPeCHX?usp=sharing
