🐛 ToolCall not returned when using client.exec_chat_stream #46

Open
jBernavaPrah opened this issue Mar 5, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@jBernavaPrah
Contributor

Bug description

Using the example c08-tooluse.rs and changing the code to call client.exec_chat_stream instead of client.exec_chat does not return the actual tool call; instead, the stream yields the following (a reproduction sketch follows the output below):

Ok(Start)
Err(JsonValueExt(PropertyNotFound("/delta/reasoning_content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Ok(End(StreamEnd { captured_usage: None, captured_content: None, captured_reasoning_content: None }))
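
For reference, a minimal sketch of the change that reproduces this, adapted from c08-tooluse.rs. The client, chat_req, and MODEL values are assumed to be the same ones set up in that example, and the field/method names here are from memory, so they may not match the current genai API exactly:

```rust
use futures::StreamExt;

// Same client / chat_req / MODEL setup as in examples/c08-tooluse.rs (omitted here).
// The only change is calling exec_chat_stream instead of exec_chat.
let chat_stream_res = client.exec_chat_stream(MODEL, chat_req.clone(), None).await?;
let mut stream = chat_stream_res.stream;

while let Some(event) = stream.next().await {
    // When the model answers with a tool call, every delta currently surfaces as
    // Err(JsonValueExt(PropertyNotFound("/delta/content"))) instead of a ToolCall.
    println!("{event:?}");
}
```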

Adapter

No response

Model

No response

Suggested Resolution

Proposing:

  • Add an InterStreamEvent::Tool(ToolCall) enum variant and update every place the enum is used accordingly.
  • Add logic to streamer.rs to accumulate the tool-call deltas internally and emit them only once the complete tool call has been assembled, using the InterStreamEvent::Tool variant (a rough sketch follows this list).
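
A rough sketch of what the proposed variant could look like. The existing variants listed here are assumed to mirror the public ChatStreamEvent and may not match the actual internal InterStreamEvent definition:

```rust
// Hypothetical shape of the internal stream event enum with the proposed variant added.
// ToolCall is the existing genai type (call id, function name, function arguments).
pub enum InterStreamEvent {
    Start,
    Chunk(String),
    ReasoningChunk(String),
    // New: emitted by streamer.rs only after all partial tool-call deltas have been
    // accumulated and the complete ToolCall can be built.
    Tool(ToolCall),
    End(InterStreamEnd),
}
```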
jBernavaPrah added the bug (Something isn't working) label on Mar 5, 2025
@jeremychone
Owner

@jBernavaPrah Yes, that's correct. I didn't implement the stream for function calling and structured output.

My reasoning is that streaming is a gimmick to appease users due to the general slowness of LLMs, and in fact, it's quite a wasteful protocol.

I implemented it for the text response because this has become the default for many end-user applications.

That being said, I can see a point in implementing it for function calling and structured output.

And your plan seems decent. Hopefully, this shouldn't break any APIs.

Let me know if this is a high priority.

If the PR is relatively simple to implement and not too intrusive, then I'm okay with merging it.

@jBernavaPrah
Contributor Author

Hi @jeremychone! Thank you for your reply.

I understand your point, and mine is one of those "end-user applications" where every ms matters :)

To give you context:
I use the stream to collect chunks of the response and then use them in different ways. Without streaming, I have to wait 4s to 10s (until the whole response is returned, depending on the provider), whereas with streaming I can process the response in chunks: TTFT is roughly 300ms–600ms (from my Mac at home, and hopefully less in the cloud), and the first chunk that is meaningful to process arrives in under 800ms–1s. This is a huge win over waiting for the whole response. Also, some LLMs, like OpenAI, sometimes return tokens and tool calls in the same stream.

If I may add my point of view, the stream should be the first-party (if not the only) implementation. That way, whoever needs the stream has everything there, and whoever doesn't could call utility functions (like response.tools().await or response.content().await) on the struct returned from client.exec_stream to get the whole response and tool calls without dealing with the stream itself; a rough sketch of that idea follows below.
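
To make that concrete, here is a purely hypothetical sketch of such a stream-first API; none of these methods exist in genai today, and the names are only illustrative:

```rust
// Hypothetical "stream-first" API: exec_stream would be the single entry point, and
// callers that don't want to handle chunks would just await the aggregated pieces.
let res = client.exec_stream(MODEL, chat_req, None).await?;

// Internally drains the stream and returns the complete values:
let content: Option<String> = res.content().await?;  // the full text response
let tool_calls: Vec<ToolCall> = res.tools().await?;  // the complete tool calls
```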

Anyway, I'm already working on a PR for returning tool calls from the stream (for OpenAI, for now). I'm hoping to push it soon.

@jBernavaPrah
Contributor Author

Hi @jeremychone, I pushed the changes to my branch to at least make it work with OpenAI.

Unfortunately, I needed to change some public structs, create new enums, etc.

Your review and opinion would be appreciated before I continue with the other providers.

Thanks! Jure

@jeremychone
Owner

@jBernavaPrah
Thank you for sharing your use case for streams and your point of view.

I definitely agree that GenAI should support streaming for tooling.

Now, regarding whether streaming should be first-party, my opinion may differ. Here are three points:

  1. Use cases - For many use cases, including the ones I am working with, the complete response needs to be actionable. As long as there's no need to appease the user during the wait, streaming adds no value for those use cases.

  2. Resources - In fact, when the complete result is needed for any future steps, streaming is extremely wasteful, probably by a factor of 3x to 5x, if not more.

    • 2.a) For 2s+ response time, this might not really matter, but with very fast models such as Gemini Flash and some of the Groq models, it starts to be unnecessary.

    • 2.b) And it's not only the wire side but also the processing side. In a cloud environment, all those chunks add server work that would otherwise be completely unnecessary (NIO is perfect for long-latency chat responses).

  3. Provider endpoints - Also, all providers expose two separate endpoints, and artificially merging them in genai would be too presumptuous and intrusive for the developer.

So, in short, here is the way I would like to add streaming to genai:

  1. It should be in its own API and implementation (as it is today).
  2. Matching its respective provider endpoint (as it is today).
  3. Have no effect on the chat(...) API because of the ChatResponse augmentation.
  4. Not be involved at all when calling the chat(...).

There may be different valid approaches, and yours might be a reasonable one. However, for genai, I would like to take the one above for the reasons explained.

Side note

Btw, on an unrelated note, I might change the prompt MessageContent to support caching and also add this support to System. This might be an API change, and I'm not sure if it will be a breaking one. But in any case, this might be a bump to 0.2.0.

I'm also not sure if this tool's streaming support will introduce breaking changes, but the addition should be quite limited, so it could make it to 0.1.x.
