Any proposal for what to actually change with the current gen_ai.choice event?
I mean that for streaming or chunked responses, we could define an optional semconv, for example:
Joining all chunked responses into one complete completion, like:

```json
{"index":0,"finish_reason":"stop","message":{"content":"Why did the developer bring OpenTelemetry to the party? Because it always knows how to trace the fun!"}}
```
Sending each chunk as a separate event, like:

```json
{"index":0,"sequence_id":0,"message":{"content":"Why did the developer"}}
{"index":0,"sequence_id":1,"message":{"content":" bring OpenTelemetry"}}
{"index":0,"sequence_id":2,"message":{"content":" to the party?"}}
{"index":0,"sequence_id":3,"message":{"content":" Because it always"}}
{"index":0,"sequence_id":4,"message":{"content":" knows how to"}}
{"index":0,"sequence_id":5,"finish_reason":"stop","message":{"content":" trace the fun!"}}
```
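If chunked choice events were standardized like this, a backend could reassemble them server-side. The following is a minimal sketch (a hypothetical `aggregate_choice_events` helper, not part of any OTel spec or SDK) that groups events by `index`, orders each group by `sequence_id`, and concatenates the message content; the chunk carrying `finish_reason` marks the end of that choice:

```python
def aggregate_choice_events(events):
    """Reassemble chunked gen_ai.choice events into complete choices.

    Groups events by `index`, sorts each group by `sequence_id` (so
    out-of-order delivery is tolerated), joins the content deltas, and
    takes `finish_reason` from whichever chunk carries it.
    """
    by_index = {}
    for ev in events:
        by_index.setdefault(ev["index"], []).append(ev)

    choices = []
    for index, group in sorted(by_index.items()):
        group.sort(key=lambda ev: ev["sequence_id"])
        content = "".join(ev["message"]["content"] for ev in group)
        finish = next((ev["finish_reason"] for ev in group
                       if "finish_reason" in ev), None)
        choices.append({"index": index, "finish_reason": finish,
                        "message": {"content": content}})
    return choices
```

Applied to the six chunk events above, this yields exactly the single aggregated event from the first option.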
Area(s)
area:gen-ai
What's missing?
There are streaming and non-streaming response modes for LLM calls, which means the implementation of capturing gen_ai.choice can vary considerably. I have noticed two approaches so far: aggregating all chunks on the collection side and emitting a single complete gen_ai.choice event, or emitting each chunk as a separate event as it arrives.
Personally, I prefer the latter option, since it uses less memory on the collection side. Collection components normally run alongside production applications (they are mostly in the same process), so if we follow the first implementation we will keep hearing complaints that the collection tooling consumes too much memory.
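To make the memory trade-off concrete, here is a minimal sketch (hypothetical helper names, not any OTel SDK API) contrasting the two capture strategies: aggregation buffers every content delta in the instrumented process until the stream ends, while per-chunk emission holds only one delta at a time.

```python
def capture_aggregated(chunks, emit):
    """Approach 1: buffer every content delta in-process, emit one event.

    Memory grows with the full response length, inside the application
    process, before anything is emitted.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        parts.append(chunk["message"]["content"])  # whole response retained
        finish_reason = chunk.get("finish_reason", finish_reason)
    emit({"index": 0, "finish_reason": finish_reason,
          "message": {"content": "".join(parts)}})


def capture_streamed(chunks, emit):
    """Approach 2: emit each delta immediately; only one chunk is held."""
    for seq, chunk in enumerate(chunks):
        event = {"index": 0, "sequence_id": seq,
                 "message": {"content": chunk["message"]["content"]}}
        if "finish_reason" in chunk:
            event["finish_reason"] = chunk["finish_reason"]
        emit(event)
```

With the streamed variant, the burden of reassembly shifts to whatever receives the events, which is exactly why a standardized chunked format matters.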
However, there are currently no semantic conventions for capturing streaming responses. This means that observability backends following the OTel semconv can only recognize choice events that have been aggregated on the collection side. Even if they become aware of the issue above and allow the ingestion of chunked choice events, such an implementation would be non-standardized, leading to a wide variety of final formats and causing confusion for OTel users.
My proposal is: Could we provide an alternative and define a streaming format for the event structure? This would give developers flexibility — they could aggregate the data on the client side, or they could choose to stream the events, with the latter implying that they must rely on a server-side solution that supports aggregation.
Describe the solution you'd like
P.S. I should point out that this topic is what I wanted to discuss in today's SIG APAC meeting, but nobody else actually showed up. We really need a notification when the meeting is cancelled or delayed.