Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/tailsampling] record sampling policy #36312

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

djluck
Copy link

@djluck djluck commented Nov 12, 2024

Description

Adds support for optionally recording the policy (and any composite policy) associated with an inclusive tail processor sampling decision:

image

Link to tracking issue

Fixes
Resolves #35180.

Implementation notes

  • This functionality lives behind a feature flag that is disabled by default
  • The original issue described a solution where we might attach the attribute solely to the root span. I'm not sure I agree with the commenter that we can rely on this (e.g. we might decide to sample halfway through a long-running trace) so I have attached the attributes to all present scope spans. This feels like a decent trade off between complexity + network cost, as finding the highest non-root parent would require multiple passes of the spans and keeping all span ids in a set

Testing

  • Added automated tests to verify enabling the flag both records the expected decision while not impacting existing logic
  • Built a custom version and ran it in our preprod environment to ensure it was stable over a 1h period (still evaluating, will update PR with any further observations)

TODO

Does this require a CHANGELOG entry?

@djluck djluck requested review from jpkrohling and a team as code owners November 12, 2024 07:22
Copy link

linux-foundation-easycla bot commented Nov 12, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions bot added the processor/tailsampling Tail sampling processor label Nov 12, 2024
@djluck djluck force-pushed the tailsampling-record-policy branch from 0fff4e1 to 2ce41df Compare November 18, 2024 03:15
Copy link
Contributor

github-actions bot commented Dec 2, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 2, 2024
@djluck
Copy link
Author

djluck commented Dec 3, 2024

Hey @djaglowski, sorry to bother you but do you think you'd be able to take a look at this some time soon?

@github-actions github-actions bot removed the Stale label Dec 3, 2024
jpkrohling
jpkrohling previously approved these changes Dec 4, 2024
Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer to set the policy at the root span, but I acknowledge that the root span might not be easy to find given the current structure we have. I think we can go with this first implementation and see what real-world effects it has on people.

@jpkrohling jpkrohling force-pushed the tailsampling-record-policy branch from 2ce41df to 08c761c Compare December 4, 2024 15:25
@jpkrohling jpkrohling dismissed their stale review December 4, 2024 15:31

hit "approve" too early :-)

Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change, it goes in the right direction, but it's missing unit tests for the newly introduced function, a changelog entry, and something in the README telling people how to use this.

@djluck
Copy link
Author

djluck commented Dec 5, 2024

Thanks for your feedback @jpkrohling, I'll work on resolving these comments over the next couple of days 👍

@djluck djluck force-pushed the tailsampling-record-policy branch 2 times, most recently from d6ceb1a to 966c3db Compare December 7, 2024 04:43
@djluck
Copy link
Author

djluck commented Dec 7, 2024

@jpkrohling thanks for the review, I've pushed fixes for all your comments. I think I missed Locking the tracedata mutex so I added this additionally.

Unsure why the pipeline is failing, seems unrelated to anything I touched (command missing error).

Copy link
Member

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK merging as is, the comments are nits.

@jpkrohling
Copy link
Member

The code owners check seems to be failing for all jobs at the moment.

@djluck djluck force-pushed the tailsampling-record-policy branch from 07966b6 to 37b15b3 Compare December 9, 2024 20:35
@djluck
Copy link
Author

djluck commented Dec 9, 2024

@jpkrohling thanks for the review, I've pushed fixes for your comments

@jpkrohling
Copy link
Member

Linting seems to be failing:

tailsamplingprocessor/factory.go:12: File is not `gci`-ed with --skip-generated -s standard -s default -s prefix(github.com/open-telemetry/opentelemetry-collector-contrib) (gci)
	"github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor/internal/telemetry"

tailsamplingprocessor/factory.go:18: File is not `gci`-ed with --skip-generated -s standard -s default -s prefix(github.com/open-telemetry/opentelemetry-collector-contrib) (gci)
	"github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor/internal/metadata"
tailsamplingprocessor/processor_decisions_test.go:12: File is not `gci`-ed with --skip-generated -s standard -s default -s prefix(github.com/open-telemetry/opentelemetry-collector-contrib) (gci)

You can run make lint locally to test the changes before going to the CI.

…olicy) associated with an inclusive tail processor sampling decision.

Resolves !35180.

- This functionality lives behind a feature flag that is disabled by default
- The original issue described a solution where we might attach the attribute solely to the root span. I'm not sure I agree with the commenter that we can rely on this (e.g. we might decide to sample halfway through a long-running trace) so I have attached the attributes to all present scope spans. This feels like a decent trade off between complexity + network cost, as finding the highest non-root parent would require multiple passes of the spans and keeping all span ids in a set

- Added automated tests to verify enabling the flag both records the expected decision while not impacting existing logic
- Built a custom version and ran it in our preprod environment to ensure it was stable over a 1h period (still evaluating, will update PR with any further observations)

Does this require a CHANGELOG entry?
- Added README entry for the feature flag
- Added missing mutex lock around reading trace data in `SetAttrOnScopeSpans`
- Added tests + benchmarks for `SetAttrOnScopeSpans`
…olicy) associated with an inclusive tail processor sampling decision.

Resolves !35180.

- This functionality lives behind a feature flag that is disabled by default
- The original issue described a solution where we might attach the attribute solely to the root span. I'm not sure I agree with the commenter that we can rely on this (e.g. we might decide to sample halfway through a long-running trace) so I have attached the attributes to all present scope spans. This feels like a decent trade off between complexity + network cost, as finding the highest non-root parent would require multiple passes of the spans and keeping all span ids in a set

- Added automated tests to verify enabling the flag both records the expected decision while not impacting existing logic
- Built a custom version and ran it in our preprod environment to ensure it was stable over a 1h period (still evaluating, will update PR with any further observations)

Does this require a CHANGELOG entry?
@djluck djluck force-pushed the tailsampling-record-policy branch from 171deff to 0584b17 Compare December 10, 2024 19:55
@djluck
Copy link
Author

djluck commented Dec 10, 2024

@jpkrohling apologies, re-ran make lint, make fmt + make gci- I think we should be good now

@djluck
Copy link
Author

djluck commented Dec 12, 2024

Thanks again @jpkrohling for reviewing this change but I think I need to bother you one last time: I can see you've approved but I'm not authorized to merge. Is that something you could do?

@djluck
Copy link
Author

djluck commented Dec 20, 2024

Hey @jpkrohling! Just a friendly ping- if you've got time before Christmas and you're able to help merge this it would be appreciated. Otherwise I can follow up in the new year.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
processor/tailsampling Tail sampling processor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add policy to spans in sampled traces
3 participants