Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

common: Add a debug mode for tracing #7850

Merged
merged 7 commits into from
Nov 23, 2024

Conversation

cdecker
Copy link
Member

@cdecker cdecker commented Nov 21, 2024

Replaces an earlier attempt in #7847

Closes #7843

@cdecker cdecker force-pushed the 202447-trace-debug branch 2 times, most recently from d4ce0f9 to c3e2e98 Compare November 22, 2024 12:33
@cdecker cdecker changed the title Draft: common: Add a debug mode for tracing common: Add a debug mode for tracing Nov 22, 2024
@cdecker cdecker marked this pull request as ready for review November 22, 2024 12:36
After adding the DB query instrumentation we ran into a couple of
issues, with spans not being resumed correctly, and it was rather hard
to identify the problem. This adds debug statements so we can trace
the tracing (traception if you will).

Changelog-None
This was a bit harder to identify: during an `io_loop` run we suspend
the current span before handing over to `io_loop`, and later when a callback
is called we resume the span again. Depending on how we return from
the `io_loop` instance that is used to drive the startup, we either
have resumed the last span, or we don't. Since we start a span before
`io_loop` and want it to be emitted afterwards, we need to take care
of the case where we returned from a callback that did not resume, and
therefore the current context is empty.

Making `trace_span_resume` idempotent means we can just resume it
manually.

Ideally we'd push the suspend / resume logic down into `io_loop`
itself, and then we'd have just one place. Maybe suspend and resume
callbacks that can be configured in `io_loop`?
We have the space in memory set aside anyway, so let's just copy the
`trace_id` into the span itself, rather than resolving the `root` at
time of emission.
Trace spans form a tree, but we don't actually check that the
structure doesn't break. Breakage can for example come if we use the
same key accidentally, making a new span its own ancestor.
It turns out that under some circumstances we end up clearing the
pointee of `current` but not the pointer. Thus when we select the next
slot we can end up reusing the same slot, making it its own parent.

We forcefull break these cycles by enforcing that `current` should
never be returned and be set as its own parent.

Changelog-None
Just added a couple of compile-time guards and sprinkled the invariant
checking in a couple of places (disabled if compile time guard is
unset).
Copy link
Collaborator

@endothermicdev endothermicdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK cc7a826

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack cc7a826

@rustyrussell rustyrussell merged commit c596550 into ElementsProject:master Nov 23, 2024
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crash: trace_span_tag: Assertion `span' failed.
3 participants