fix(publish-metrics): otel tracing bug #2439

Merged: InesNi merged 5 commits on Jan 24, 2024

InesNi commented Jan 24, 2024

Description

Fixes OTel traces arriving at observability platforms inconsistently (in both number and format) when running on AWS Fargate.

Context

Two different issues were observed; usually at least one of them occurred:

  1. Some traces were missing, although the traces that did arrive were properly formed, with the correct attributes and status
  2. Traces were malformed: missing root (parent) spans and incorrect span status (errors not set on spans)

Note: the issues did not occur on local runs, and issue 2 seems to occur only with Datadog.
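Issue 2 can be checked mechanically on the receiving side. The TypeScript sketch below groups received spans by trace ID and flags a trace if none of its spans is a root, or if any span points at a parent that never arrived. The `SpanRecord` shape and the `findMalformedTraces` helper are hypothetical illustrations; they are not part of Artillery or the OpenTelemetry SDK.

```typescript
// Hypothetical, minimal span shape for illustration; real OTel span data
// carries many more fields (name, status, attributes, timestamps, ...).
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // undefined on a root span
}

// Returns the IDs of traces that arrived malformed: either no span in the
// trace is a root, or some span points at a parent that never arrived.
function findMalformedTraces(spans: SpanRecord[]): string[] {
  // Group the received spans by trace ID.
  const byTrace = new Map<string, SpanRecord[]>();
  for (const span of spans) {
    const group = byTrace.get(span.traceId) ?? [];
    group.push(span);
    byTrace.set(span.traceId, group);
  }

  const malformed: string[] = [];
  byTrace.forEach((group, traceId) => {
    const ids = new Set(group.map((s) => s.spanId));
    const hasRoot = group.some((s) => s.parentSpanId === undefined);
    const hasOrphan = group.some(
      (s) => s.parentSpanId !== undefined && !ids.has(s.parentSpanId),
    );
    if (!hasRoot || hasOrphan) malformed.push(traceId);
  });
  return malformed;
}

// Usage: trace "t2" lost its root span "x" on the way to the backend.
const received: SpanRecord[] = [
  { traceId: "t1", spanId: "a" },
  { traceId: "t1", spanId: "b", parentSpanId: "a" },
  { traceId: "t2", spanId: "c", parentSpanId: "x" },
];
const malformedIds = findMalformedTraces(received); // ["t2"]
```

Checking for orphans as well as for a missing root catches the case where a trace still has some root-less fragment after its real root span was lost.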

The issues appear to result from several factors:

  • Playwright tracing did not wait for pending traces, which could trigger the traceProvider shutdown prematurely.
  • In large tests with many traces, maxExportBatchSize and eventually maxQueueSize could be reached, in which case spans would be dropped.
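To illustrate the second factor, the following TypeScript sketch models a bounded span queue in the spirit of the OpenTelemetry BatchSpanProcessor: spans that end while the queue already holds `maxQueueSize` spans are dropped, and the queue is drained in batches of at most `maxExportBatchSize`. It is a simplified model, not the SDK's implementation. It assumes a whole burst of spans ends before the exporter gets a chance to run (for example, because of a slow network or a busy event loop), whereas the real processor exports asynchronously on a `scheduledDelayMillis` timer and, typically, early once a full batch is queued. The `simulateBurst` helper is hypothetical. The values 2048 and 512 are the OpenTelemetry specification defaults for the queue and batch sizes; 10000 is purely illustrative and is not the new default chosen in this PR.

```typescript
// Hypothetical knobs named after the BatchSpanProcessor options in this PR.
interface QueueConfig {
  maxQueueSize: number;
  maxExportBatchSize: number;
}

interface BurstResult {
  exported: number;
  dropped: number;
  batches: number;
}

// Simplified model: `burst` spans end before the exporter can run, so every
// span beyond maxQueueSize is dropped; the queue is then drained in batches.
function simulateBurst(burst: number, cfg: QueueConfig): BurstResult {
  if (cfg.maxExportBatchSize < 1) {
    throw new RangeError("maxExportBatchSize must be at least 1");
  }

  let queued = 0;
  let dropped = 0;
  for (let i = 0; i < burst; i++) {
    if (queued >= cfg.maxQueueSize) {
      dropped++; // queue full: the span is discarded
    } else {
      queued++;
    }
  }

  let exported = 0;
  let batches = 0;
  while (queued > 0) {
    const batch = Math.min(queued, cfg.maxExportBatchSize);
    queued -= batch;
    exported += batch;
    batches++;
  }
  return { exported, dropped, batches };
}

// 5000 spans against the spec defaults vs. a larger (illustrative) queue.
const withDefaults = simulateBurst(5000, { maxQueueSize: 2048, maxExportBatchSize: 512 });
// { exported: 2048, dropped: 2952, batches: 4 }
const withLargerQueue = simulateBurst(5000, { maxQueueSize: 10000, maxExportBatchSize: 512 });
// { exported: 5000, dropped: 0, batches: 10 }
```

In this model only `maxQueueSize` decides whether spans are dropped during a burst; `maxExportBatchSize` controls how many export calls are needed to drain the queue.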

Solution

This PR addresses the factors above that appear to be causing the issues:

  1. Playwright tracing now waits for pending traces itself.
  2. The defaults for maxExportBatchSize, maxQueueSize and scheduledDelayMillis were increased, and these settings are exposed to users for configuration if needed (this lets users adjust how many traces a batch can hold before they are dropped).
  3. An additional 5-second wait is added before shutting down the traceProvider, to give the exporters time to flush.
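The shutdown ordering from points 1 and 3 can be sketched in TypeScript as follows, assuming in-flight tracing work is represented as promises. `PendingTracker` and `shutdownTracing` are hypothetical names, and the `ShutdownTarget` interface only mirrors the `shutdown(): Promise<void>` method that OpenTelemetry JS tracer providers expose. This is an illustration of the ordering, not the code merged in this PR.

```typescript
// Structural stand-in for a tracer provider; OpenTelemetry JS providers
// expose a compatible `shutdown(): Promise<void>`. No SDK import is needed.
interface ShutdownTarget {
  shutdown(): Promise<void>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical tracker for in-flight tracing work (e.g. spans still being
// recorded for a Playwright step).
class PendingTracker {
  private readonly pending = new Set<Promise<unknown>>();

  track<T>(work: Promise<T>): Promise<T> {
    this.pending.add(work);
    // Forget the promise once it settles, whether it resolved or rejected.
    const forget = () => {
      this.pending.delete(work);
    };
    work.then(forget, forget);
    return work;
  }

  // Resolves once nothing is pending, re-checking in case more work was
  // tracked while earlier promises were settling.
  async waitForAll(): Promise<void> {
    while (this.pending.size > 0) {
      await Promise.allSettled(Array.from(this.pending));
    }
  }
}

// Shutdown order: pending traces first (point 1), then a grace period for
// exporters to flush (point 3; 5000 ms in the PR), then provider shutdown.
async function shutdownTracing(
  tracker: PendingTracker,
  provider: ShutdownTarget,
  flushGraceMs = 5000,
): Promise<void> {
  await tracker.waitForAll();
  await sleep(flushGraceMs);
  await provider.shutdown();
}
```

The wait loop re-checks the pending set because new spans may be tracked while earlier ones are settling; a promise that never settles would keep it waiting. The fixed grace period gives exporters time to flush but does not by itself confirm that every batch was delivered.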

hassy commented Jan 24, 2024

Is there a way to reduce the flush period of the Datadog agent?

InesNi marked this pull request as ready for review on January 24, 2024 14:02
InesNi merged commit c65efbc into main on Jan 24, 2024
16 checks passed
InesNi deleted the ifazlic-art-1575-fix-otel-tracing-bug branch on January 24, 2024 14:21