[processor/tailsampling] decision_wait time and the lifespan of a trace #36291
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi @AliArfan, looking at the docs, there is a
Hi @bacherfl, thank you for your quick response! I just took a look at the docs, and this is what I found about the
As I understand it, this is for saving the trace decision after the trace has been released from memory. My problem is that for some traces, the decision is made too early for the application's edge cases (before we receive an error). I would not like to increase the … If I can use the
Thank you for clarifying, @AliArfan! I see; in that case the
Thank you, @bacherfl! Now I know that increasing
@bacherfl is completely right here. One of the problems with span-based traces is that there is no "trace" per se: a trace is only a collection of spans that happen to share the same trace ID. Therefore, it is impossible for us to determine when a trace has been completed, especially in async use cases. At the moment, what I can recommend is indeed increasing the decision_wait property.
Component(s)
processor/tailsampling
Describe the issue you're reporting
Hi,
I am fairly new to the tail sampling processor, and I would like to ask whether there is a solution for my use case. After reading the documentation and looking at examples online, my only viable option seems to be increasing the decision_wait time.
Problem Statement
We have a gRPC collector that processes each message from Cisco devices, and we have leveraged OpenTelemetry to gain insights into the application's health. However, we noticed that we produce 20GB of data per month. Therefore, we would like to use tail sampling to reduce the sampled data and only sample based on the probabilistic and status_code: ERROR policies.
The problem arises when errors in our application trigger our retry and backoff mechanism: the decision_wait time can be reached before all retries have happened. For example, if publishing a message to RabbitMQ fails, we try to re-publish it with an increasing backoff interval. The decision_wait set in the tail sampling processor would be too short to include all the retry spans.
Is there a way to sample all the error spans on retry, even after the decision_wait time has been reached?
It would be nice if there were a trace_start and trace_end that we could use to process only traces that are complete.
Thank you!