Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improvement] Don't impose DO_NOT_SAMPLE on downstream services #170

Closed
wants to merge 1 commit into from

Conversation

iamdanfox
Copy link
Contributor

@iamdanfox iamdanfox commented Jun 7, 2019

Before this PR

If a service decides to start sampling a trace, it's important that all downstream service do the same to ensure we get a nice complete set of data in trace-log-viewer.

HOWEVER, if the TraceSampler decides not to sample a trace, it currently imposes this DO_NOT_SAMPLE decision on all downstream services too by setting the X-B3-Sampled header to 0

This means that theoretically if a single service early in a call graph has a low sampling rate, then it may unnecessarily reduce tracing information that all downstream services, even if they had plenty of capacity to record traces.

After this PR

If a service decides not to observe a span, it no longer forces all the downstream services to also drop the span, and instead allows downstream samplers to decide.

Longer-term, I'd also quite like to make our trace sampling to be a target logging rate (e.g. up to 1000 lines/second) rather than just a constant percentage of incoming requests. This should mean that infrequently accessed services could just sample every network request, thereby hopefully making life easier for developers.

Possible downsides

  • Rolling out this change will probably result in a higher proportion of traces getting sampled, as suddenly a request that passes through will have a chance to be sampled in every service, whereas previously the first failure would effectively blacklist that trace forever.
  • I also don't really understand why you would want to mark a trace as explicitly DO_NOT_SAMPLE, so maybe I'm missing something here.

Related to #32

@iamdanfox iamdanfox requested a review from a team as a code owner June 7, 2019 13:11
@ferozco
Copy link
Contributor

ferozco commented Jun 7, 2019

As you noted this change will lead to a dramatic increase in observed traces. I worry that it may have negative impact on the signals processing pipeline since we are now sampling a request with probability p = sum_i=0^n .98^i * .02 where n is the number of hops. For a request with 5 hops that translates to a sampling rate of 11.4%

@carterkozak
Copy link
Contributor

carterkozak commented Jun 7, 2019

I think this makes it more difficult and dangerous to configure the trace rate because of the consequences for all downstream services.

@iamdanfox
Copy link
Contributor Author

Ok fair enough let's park this one

@iamdanfox iamdanfox closed this Jun 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants