Add the option of streaming gen_ai.choice events. #1964

Open

Cirilla-zmh opened this issue Mar 6, 2025 · 2 comments

@Cirilla-zmh (Member)

Area(s)

area:gen-ai

What's missing?

There are streaming and non-streaming response modes for LLM calls, which means implementations that capture gen_ai.choice can differ significantly.

I have noticed two approaches so far:

  1. wait until all content has been returned: open-ai instrumentation
  2. capture as chunked events: google-genai instrumentation

Personally, we prefer the latter option, which means less memory usage on the collection side. Collection components normally run alongside production applications (mostly in the same process), so if we follow the first implementation we will keep hearing complaints that collection tooling occupies too much memory.
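To make the trade-off concrete, here is a minimal sketch of the two approaches. The `emit_choice_event` helper is hypothetical and stands in for whatever events API an instrumentation uses, and the chunk shape loosely follows OpenAI-style streaming responses; this is an illustration, not the actual instrumentation code.

```python
# Hypothetical helper standing in for the instrumentation's events API.
def emit_choice_event(body: dict) -> None:
    print("gen_ai.choice", body)

# Approach 1: buffer every chunk until the stream ends, then emit one
# aggregated gen_ai.choice event. Memory grows with the response length.
def capture_aggregated(stream):
    parts = []
    finish_reason = None
    for chunk in stream:
        parts.append(chunk.get("content", ""))
        finish_reason = chunk.get("finish_reason") or finish_reason
    emit_choice_event({
        "index": 0,
        "finish_reason": finish_reason,
        "message": {"content": "".join(parts)},
    })

# Approach 2: emit one event per chunk. Memory stays constant, but the
# backend must reassemble the pieces.
def capture_chunked(stream):
    for sequence_id, chunk in enumerate(stream):
        body = {
            "index": 0,
            "sequence_id": sequence_id,
            "message": {"content": chunk.get("content", "")},
        }
        if chunk.get("finish_reason"):
            body["finish_reason"] = chunk["finish_reason"]
        emit_choice_event(body)
```

The second variant is why a streaming convention needs an explicit ordering field: the backend can only reassemble chunks if their order is recorded.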

However, there are no semantic conventions for capturing streaming responses here. This means that observability backends following the OTel semconv can only recognize choice events that have been aggregated on the collection side. Even if they become aware of the issue above and allow the ingestion of chunked choice events, such an implementation would be non-standardized, leading to a wide variety of final formats and causing confusion for OTel users.

My proposal is: Could we provide an alternative and define a streaming format for the event structure? This would give developers flexibility — they could aggregate the data on the client side, or they could choose to stream the events, with the latter implying that they must rely on a server-side solution that supports aggregation.
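As a sketch of the flexibility this would give, an instrumentation could expose an opt-in switch, reusing the two capture functions from the sketch above. The environment variable name below is purely illustrative and is not an existing OTel setting.

```python
import os

# Purely illustrative knob; no such environment variable is defined by
# the semantic conventions today.
CAPTURE_MODE = os.environ.get("OTEL_GENAI_CHOICE_CAPTURE_MODE", "aggregated")

def capture_choice(stream):
    if CAPTURE_MODE == "chunked":
        capture_chunked(stream)     # stream events; backend must aggregate
    else:
        capture_aggregated(stream)  # buffer client-side; one event per choice
```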

Describe the solution you'd like

P.S. I have to point out that this topic is what I wanted to discuss in today's SIG APAC meeting, but nobody else actually showed up. We really need a notification if the meeting has been cancelled or delayed.

@aabmass (Member) commented Mar 6, 2025

Any proposal for what to actually change with the current gen_ai.choice event?

@Cirilla-zmh (Member, Author) commented Mar 7, 2025

> Any proposal for what to actually change with the current gen_ai.choice event?

I mean that for streaming or chunked responses, we should provide an optional semconv like:

  1. Joining all chunked responses into a complete completion, like:

{"index":0,"finish_reason":"stop","message":{"content":"Why did the developer bring OpenTelemetry to the party? Because it always knows how to trace the fun!"}}

  2. Sending chunked responses separately, like:

{"index":0,"sequence_id":0,"message":{"content":"Why did the developer"}}
{"index":0,"sequence_id":1,"message":{"content":" bring OpenTelemetry"}}
{"index":0,"sequence_id":2,"message":{"content":" to the party?"}}
{"index":0,"sequence_id":3,"message":{"content":" Because it always"}}
{"index":0,"sequence_id":4,"message":{"content":" knows how to"}}
{"index":0,"sequence_id":5,"finish_reason":"stop","message":{"content":" trace the fun!"}}
