Add TBS examples to explain policy nuances #1239

rubvs · 2025-04-22T21:37:13Z

Add two examples to the TBS policy configuration docs:

- How policies relate across service definitions.
- How policies relate between trace and service definitions.

carsonip

thanks, some comments on explaining the implementation

carsonip · 2025-04-25T17:35:26Z

solutions/observability/apm/transaction-sampling.md

+- sample_rate: 1.0  # Fallback: always set a default
+```
+
+- Because Service A is the root of the trace, its policy (0.5) takes precedence over Service B's policy (0.3).


takes precedence

"takes precedence" implies some ordering between the two services. There is no such ordering. The implementation walks the policies from top to bottom and uses the first one that matches. `service.name: B` will never match, so Service B is entirely out of the picture here.

carsonip · 2025-04-25T17:35:53Z

solutions/observability/apm/transaction-sampling.md

@@ -290,6 +290,50 @@ This example defines three tail-based sampling polices:
 2. Samples 1% of traces in `production` with the trace name `"GET /not_important_route"`
 3. Default policy to sample all remaining traces at 10%, e.g. traces in a different environment, like `dev`, or traces with any other name

+### Example configuration B [_example_configuration_b]
+
+When a trace originates in Service A and then calls Service B (without errors), the sampling rate is determined by the service where the trace starts:


(without errors)

Whether or not there are errors is irrelevant here. Removing it may be clearer.

carsonip · 2025-04-25T17:37:16Z

solutions/observability/apm/transaction-sampling.md

@@ -272,7 +272,7 @@ Trace events are matched to policies in the order specified. Each policy list mu
 Note that from version `9.0.0` APM Server has an unlimited storage limit, but will stop writing when the disk where the database resides reaches 80% usage. Due to how the limit is calculated and enforced, the actual disk space may still grow slightly over this disk usage based limit, or any configured storage limit.
 ::::

-### Example configuration [_example_configuration]
+### Example configuration A [_example_configuration_a]


I'm thinking about 1, 2, 3 instead of A, B, C, since A, B, C are already used for service names.

carsonip · 2025-04-25T17:43:20Z

solutions/observability/apm/transaction-sampling.md

+- Because Service A is the root of the trace, its policy (0.5) takes precedence over Service B's policy (0.3).
+- If instead the trace began in Service B (and then passed to Service A), the policy for Service B would apply.
+
+> **Key point**: Tail‑based sampling rules are evaluated at the *trace level* based on where the trace was initiated, not on downstream spans (*service level*).


Suggested change

> **Key point**: Tail‑based sampling rules are evaluated at the *trace level* based on where the trace was initiated, not on downstream spans (*service level*).

> **Key point**: Tail‑based sampling rules are evaluated at the *trace level* based on which service initiated the distributed trace, not the service of the transaction or span.
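
To make the suggested wording concrete, here is a minimal Python sketch of the behavior, assuming (as described in this thread) that `service.name` is compared against the service that initiated the trace and that the first matching policy wins. `Policy` and `pick_sample_rate` are hypothetical names for illustration, not APM Server code; the rates are the ones from the example under review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    sample_rate: float
    # None means "no service condition", i.e. the policy matches any service.
    service_name: Optional[str] = None


def pick_sample_rate(policies: List[Policy], root_service: str) -> Optional[float]:
    """Walk the policies top to bottom and return the first matching rate.

    Assumption: only the service that started the trace is compared;
    downstream services are never consulted.
    """
    for policy in policies:
        if policy.service_name is None or policy.service_name == root_service:
            return policy.sample_rate
    return None  # unreachable when the list ends with a catch-all policy


policies = [
    Policy(service_name="A", sample_rate=0.5),
    Policy(service_name="B", sample_rate=0.3),
    Policy(sample_rate=1.0),  # fallback
]

# Trace starts in A and then calls B: only A is compared.
print(pick_sample_rate(policies, "A"))  # 0.5
# Trace starts in B and then calls A: only B is compared.
print(pick_sample_rate(policies, "B"))  # 0.3
```

In this sketch the downstream service never enters the comparison, which is why "precedence" between the two service policies is not the right framing.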

carsonip · 2025-04-25T17:46:03Z

solutions/observability/apm/transaction-sampling.md

+
+Policies targeting the trace (e.g. `trace.outcome: failure`) apply across all services and should appear before more specific, service‑level rules if you want them to take precedence.
+
+> **Key point**: Define failure policy at the top to ensure capturing all failed traces, then define more specific policies for specific services to capture edge cases.


This is more of an opinion on how to use TBS than an explanation of how Elastic APM picks the policy.

carsonip · 2025-04-25T17:46:34Z

solutions/observability/apm/transaction-sampling.md

+- In Example A, traces from Service A are sampled at 20%, and all other failed traces (regardless of service) are sampled at 50%.
+- In Example B, every failed trace is sampled at 20%, including those originating from Service A.
+
+Policies targeting the trace (e.g. `trace.outcome: failure`) apply across all services and should appear before more specific, service‑level rules if you want them to take precedence.


I wonder if it is just easier to explain how it is done in our implementation: we walk the array and pick the first policy that applies. If none applies, the fallback is used.
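
A rough Python sketch of that walk, for illustration only. This is not the APM Server implementation: `first_match` is a hypothetical helper, policies are modeled as plain dictionaries, the 20% and 50% rates echo the example under review, and the 10% fallback rate is made up.

```python
def first_match(policies, trace):
    """Return the sample rate of the first policy whose conditions all match.

    A policy with no conditions besides `sample_rate` matches every trace,
    so placing one last makes it the fallback.
    """
    for policy in policies:
        conditions = {k: v for k, v in policy.items() if k != "sample_rate"}
        if all(trace.get(key) == value for key, value in conditions.items()):
            return policy["sample_rate"]
    return None  # only reached when the list has no catch-all policy


service_policy = {"service.name": "A", "sample_rate": 0.2}
failure_policy = {"trace.outcome": "failure", "sample_rate": 0.5}
fallback_policy = {"sample_rate": 0.1}  # illustrative rate

service_first = [service_policy, failure_policy, fallback_policy]
outcome_first = [failure_policy, service_policy, fallback_policy]

failed_trace_from_a = {"service.name": "A", "trace.outcome": "failure"}

# Same trace, same policies, different order: the first applicable policy wins.
print(first_match(service_first, failed_trace_from_a))  # 0.2
print(first_match(outcome_first, failed_trace_from_a))  # 0.5
```

There is no special handling for combining service names with outcomes here; the different results come purely from list order.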

carsonip · 2025-04-25T17:49:38Z

solutions/observability/apm/transaction-sampling.md

+
+### Example configuration C [_example_configuration_c]
+
+When you need to combine service‑specific policies with outcomes (e.g. failures), policy order defines specificity:


When you need to combine service‑specific policies with outcomes (e.g. failures), policy order defines specificity

The way I read this sentence, it suggests there is something special about how we pick the policy when both service name and outcome are involved. That is not the case. There is only one rule: we walk from top to bottom and pick the first policy that applies.

I appreciate the example which makes it easier for users to understand how we pick the policy, but not in a way that implies special case handling in this sentence.

carsonip · 2025-04-25T17:53:52Z

also looping in @colleenmcginnis for 👀 on language and style

add TBS policy example to explain service order

4f4e3fa

github-actions bot deployed to docs-preview April 22, 2025 21:37

add TBS policy example to explain trace order

edd9143

github-actions bot deployed to docs-preview April 22, 2025 21:50

simplify and improve language

78f4ffb

github-actions bot deployed to docs-preview April 22, 2025 22:37

rubvs marked this pull request as ready for review April 22, 2025 22:38

rubvs requested review from simitt and carsonip April 22, 2025 22:38

carsonip reviewed Apr 25, 2025

carsonip requested a review from colleenmcginnis April 25, 2025 17:53
