🐛 ToolCall not returned when using client.exec_chat_stream #46

Open
jBernavaPrah opened this issue Mar 5, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@jBernavaPrah
Contributor

Bug description

Using the example c08-tooluse.rs and changing the code to call client.exec_chat_stream instead of client.exec_chat does not return the actual tool call; instead, the stream yields the following (a reproduction sketch follows the output below):

Ok(Start)
Err(JsonValueExt(PropertyNotFound("/delta/reasoning_content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Err(JsonValueExt(PropertyNotFound("/delta/content")))
Ok(End(StreamEnd { captured_usage: None, captured_content: None, captured_reasoning_content: None }))
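
For reference, a minimal sketch of the change that reproduces this, adapted from c08-tooluse.rs. The client, chat_req, and MODEL values are assumed to be the same ones set up in that example, and the field/method names here are from memory, so they may not match the current genai API exactly:

```rust
use futures::StreamExt;

// Same client / chat_req / MODEL setup as in examples/c08-tooluse.rs (omitted here).
// The only change is calling exec_chat_stream instead of exec_chat.
let chat_stream_res = client.exec_chat_stream(MODEL, chat_req.clone(), None).await?;
let mut stream = chat_stream_res.stream;

while let Some(event) = stream.next().await {
    // When the model answers with a tool call, every delta currently surfaces as
    // Err(JsonValueExt(PropertyNotFound("/delta/content"))) instead of a ToolCall.
    println!("{event:?}");
}
```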

Adapter

No response

Model

No response

Suggested Resolution

Proposing:

  • Add an InterStreamEvent::Tool(ToolCall) enum variant and update every place the enum is used accordingly.
  • Add logic to streamer.rs to accumulate the tool-call deltas internally and emit them only once the complete tool call has been assembled, using the InterStreamEvent::Tool variant (a rough sketch follows this list).
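
A rough sketch of what the proposed variant could look like. The existing variants listed here are assumed to mirror the public ChatStreamEvent and may not match the actual internal InterStreamEvent definition:

```rust
// Hypothetical shape of the internal stream event enum with the proposed variant added.
// ToolCall is the existing genai type (call id, function name, function arguments).
pub enum InterStreamEvent {
    Start,
    Chunk(String),
    ReasoningChunk(String),
    // New: emitted by streamer.rs only after all partial tool-call deltas have been
    // accumulated and the complete ToolCall can be built.
    Tool(ToolCall),
    End(InterStreamEnd),
}
```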
jBernavaPrah added the bug (Something isn't working) label on Mar 5, 2025
@jeremychone
Owner

@jBernavaPrah Yes, that's correct. I didn't implement the stream for function calling and structured output.

My reasoning is that streaming is a gimmick to appease users due to the general slowness of LLMs, and in fact, it's quite a wasteful protocol.

I implemented it for the text response because this has become the default for many end-user applications.

That being said, I can see a point in implementing it for function calling and structured output.

And your plan seems decent. Hopefully, this shouldn't break any APIs.

Let me know if this is a high priority.

If the PR is relatively simple to implement and not too intrusive, then I'm okay with merging it.

@jBernavaPrah
Contributor Author

Hi @jeremychone! Thank you for your reply.

I understand your point, and mine is one of those "end-user applications" where every ms matters :)

To give you context:
I use the stream to collect chunks of the response and then use them in different ways. Without streaming, I have to wait 4s to 10s (until the whole response is returned, depending on the provider), whereas with streaming I can process the response in chunks: TTFT is roughly 300ms–600ms (from my Mac at home, and hopefully less in the cloud), and the first chunk that is meaningful to process arrives in under 800ms–1s. This is a huge win over waiting for the whole response. Also, some LLMs, like OpenAI, sometimes return tokens and tool calls in the same stream.

If I may add my point of view, the stream should be the first-party (if not the only) implementation. That way, whoever needs the stream has everything there, and whoever doesn't could call utility functions (like response.tools().await or response.content().await) on the struct returned from client.exec_stream to get the whole response and tool calls without dealing with the stream itself; a rough sketch of that idea follows below.
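
To make that concrete, here is a purely hypothetical sketch of such a stream-first API; none of these methods exist in genai today, and the names are only illustrative:

```rust
// Hypothetical "stream-first" API: exec_stream would be the single entry point, and
// callers that don't want to handle chunks would just await the aggregated pieces.
let res = client.exec_stream(MODEL, chat_req, None).await?;

// Internally drains the stream and returns the complete values:
let content: Option<String> = res.content().await?;  // the full text response
let tool_calls: Vec<ToolCall> = res.tools().await?;  // the complete tool calls
```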

Anyway, I'm already working on a PR for returning tool calls from the stream (for OpenAI, for now). I'm hoping to push it soon.

@jBernavaPrah
Contributor Author

Hi @jeremychone, I pushed the changes to my branch to at least make it work with OpenAI.

Unfortunately, I needed to change some public structs, create new enums, etc.

Your review and opinion would be appreciated before I continue with the other providers.

Thanks! Jure

@jeremychone
Owner

@jBernavaPrah
Thank you for sharing your use case for streams and your point of view.

I definitely agree that GenAI should support streaming for tooling.

Now, regarding whether streaming should be first-party, my opinion may differ. Here are three points:

  1. Use cases - For many use cases, including the ones I am working with, the complete response needs to be actionable. As long as there's no need to appease the user during the wait, streaming adds no value for those use cases.

  2. Resources - In fact, when the complete result is needed for any future steps, streaming is extremely wasteful, probably by a factor of 3x to 5x, if not more.

    • 2.a) For 2s+ response time, this might not really matter, but with very fast models such as Gemini Flash and some of the Groq models, it starts to be unnecessary.

    • 2.b) And it's not only the wire side but also the processing side. In a cloud environment, all those chunks add server work that would otherwise be completely unnecessary (NIO is perfect for long-latency chat responses).

  3. Provider endpoints - Also, all providers expose two separate endpoints, and artificially merging them in genai would be too presumptuous and intrusive for the developer.

So, in short, here is the way I would like to add streaming to genai:

  1. It should be in its own API and implementation (as it is today).
  2. Matching its respective provider endpoint (as it is today).
  3. Have no effect on the chat(...) API because of the ChatResponse augmentation.
  4. Not be involved at all when calling the chat(...).

There may be different valid approaches, and yours might be a reasonable one. However, for genai, I would like to take the one above for the reasons explained.

Side note

Btw, on an unrelated note, I might change the prompt MessageContent to support caching and also add this support to System. This might be an API change, and I'm not sure if it will be a breaking one. But in any case, this might be a bump to 0.2.0.

I'm also not sure if this tool's streaming support will introduce breaking changes, but the addition should be quite limited, so it could make it to 0.1.x.
