-
Notifications
You must be signed in to change notification settings - Fork 726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve #[instrument(ret, err)] handling for tracing-opentelemetry #2316
base: master
Are you sure you want to change the base?
Conversation
b25fb9b
to
6c90c68
Compare
This addresses an issue with the standard behaviour of #[instrument] on OpenTelemetry installations, which prevents them from recording a useful error event in these scenarios (resulting in `error = false` and no other details in most cases). I've opted to keep the default behaviour the same for backwards compatibility reasons, since most of the test cases assume `err: std::fmt::Display` instead of `err: std::err::Error` and anything else would be a breaking change. This makes the fix opt-in for most users.
This ensures that error events generated by this method will include a useful name for easy consumption in common OpenTelemetry tracing tools.
This ensures that events generated by this instrumentation have a useful name for consumption in common OpenTelemetry tracing tools.
tracing-attributes/src/attr.rs
Outdated
@@ -177,9 +177,10 @@ impl Parse for EventArgs { | |||
match ident.to_string().as_str() { | |||
"Debug" => result.mode = FormatMode::Debug, | |||
"Display" => result.mode = FormatMode::Display, | |||
"Raw" => result.mode = FormatMode::Raw, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love the name "Raw" for this mode --- it's not super obvious to the user what this actually means. Since the error is being recorded as a dyn Error
trait object, and the other two format modes are named "Debug" and "Display", it might make more sense for this to be "Error" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, after taking a closer look, I see that this also controls how ret
values, which are not errors, may be recorded. Maybe Value
is a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like Value
here, but I also wonder whether we need to maybe subdivide this a bit more. For context, this PR is really introducing two things:
- An ability to record the
err
andret
value attributes "in their raw value form" instead of relying onstd::fmt::*
for coercion, which resolves issues with how OpenTelemetry handlesError
internally. - Adding event messages to represent the
err
andret
values so that they appear nicely in practical production UIs like Honeycomb/Jaeger etc.
The second of those points currently is always using Display
semantics, however there are types where that is not implemented and Debug
would be preferable. In those scenarios we may want to be able to specify Value+Debug
and Value+Display
to control these behaviours, or alternatively treat "log the attribute value in its raw form" as an option separate from the formatting of the messages.
I'll update this to use Value
for now and we can discuss any other changes (either here or in a future PR depending on what you'd prefer).
tracing::event!(target: #target, #level_tokens, error = ?e, "{:?}", e) | ||
)), | ||
FormatMode::Raw => Some(quote!( | ||
tracing::event!(target: #target, #level_tokens, error = e, "{}", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some types, such as anyhow::Error
, don't actually implement Error
directly, but have AsRef<&dyn Error>
implementations. is there anything we can do to make this mode also work nicely for events that have to be as_ref
'd, without breaking returns for values which are Error
implementers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting area to think about. Presumably this falls into the problem domain of mapping attribute values to the record_error(...)
hook within the Layer
implementation?
I'm expecting that in this specific scenario we're not going to see too much by way of business logic around value types, though if I've missed something then please shout. As for implementing the business logic for handling AsRef<&dyn Error>
I'm happy to chase that down, though I would propose landing that in a separate PR to keep the concepts neatly segregated.
Co-authored-by: Eliza Weisman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few more notes, one of which is actually important (the test won't actually pass as it's written currently). other than that, this looks good!
Thanks for the review there @hawkw, I think I've covered the docs issues (and some other stuff I noticed going through there) - shout if you spot anything else. |
looks like |
…y adding new options Co-authored-by: Eliza Weisman <[email protected]>
…ors can be easily rendered
Motivation
This PR is intended to help resolve an issue with the way that the
#[instrument]
proc-macro handles return values when working withtracing-opentelemetry
. In particular, this focuses on resolving the empty events generated by#[instrument(err)]
in environments which use the span event's title for rendering purposes (making these events at best useless and at worst a significant issue for on-call engineers attempting to understand the system's behaviour).The heart of this problem comes down to two things:
tracing-opentelemetry
handles thestd::error::Error
type specially, capturing relevant error information for these values in the OpenTelemetry semantic convention event fieldsexception.type
,exception.message
andexception.stacktrace
(if available). To do this, it needs to receive the rawstd::error::Error
reference, however the only behavioural options available in#[instrument]
areDisplay
andDebug
formatting (which both result in aString
).#[instrument]
will cause the instrumented span to have anerror = (bool)
field populated and will emit a span event with anerror = (string)
field and no other metadata. This results in type-aware backends (like Honeycomb) erroneously coalescing the detailed string into atrue/false
value, hiding any useful information. It also means that backends which expect anotel.message
will present an empty event title, making this harder to read for an engineer.Solution
This PR attempts to overcome both of these problems in a relatively backwards compatible manner. To achieve this, it includes two changes:
err
andret
) calledRaw
. This format will pass the value through without mutating it in any way, allowing layers liketracing-opentelemetry
to handle them appropriately. In this case, that allows one to#[instrument(err(Raw))]
to ensure that accurate exception information is captured in the automatically generated span events.err
value (and/orret
value) such that it can be easily consumed by engineers when analyzing the trace events associated with a span.In practice, there are two areas where I'm marginally concerned about what the "best" way forward here is and ensuring backwards compatibility:
Raw
should likely be the default representation, with the option of usingDisplay
orDebug
being available if the type being recorded is not supported bytracing
(in much the same way as the current field annotations work). Perhaps this could even be achieved by using the same?err
and%err
macros we use today for fields, further consolidating the API and making it easier to pick-up. That said, this would be a breaking change for most implementations which currently avail of the transparentDisplay
/Debug
representations baked into#[instrument]
.#[instrument]
API to make this opt-in/opt-out, it's really challenging for someone to "fix" this without forking the project, and I suspect API expansions are likely not the right decision in the long-term.