Limit OTLP metric histogram buckets to 100 #204
Closed
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Per AWS EMF specifications, metric histograms (numeric arrays) may contain no more than 100 values. Exceeding this results in invalid metric records. For the short term, we have decided that we will simply limit the number of histogram buckets to 100, to avoid this issue.
In this pull request, we are applying this limitation in
aws_opentelemetry_configurator
by passing in apreferred_aggregation
to theOTLPMetricExporter
s . However, we are also doing a lot more in this PR. We never used to pass this information to the exporters because in the OTEL Python SDK release we currently use, thepreferred_aggregation
was not used. To work around this, inaws_opentelemetry_distro
, we setOTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION
tobase2_exponential_bucket_histogram
so that ALL created histograms would use base2exponential. However, this would not allow us to modify the max number of buckets. Fortunately, the root issue was fixed in the upstream in this commit. In this PR, we are creating a patch forOTLPMetricExporter
that applies this commit - the diff should be identical except where we had to apply linters (which was separated out in a separate commit to make review easier). Because we no longer require the default histogram config, we have removed it fromaws_opentelemtry_distro
. A few other notes:test_instrumentation_patch.py
had to be significantly re-written for reasons that are made clear in the class-level comments.Testing Done
Modified AwsSpanMetricsProcessor and contract tests to generate data that will reliably produce 160 histogram buckets (essentially in AwsSpanMetricsProcessor, add a loop that adds a bunch of values that we saw trigger this issue, then print everything in contract tests).
Scenario 1: Without any changes (note 161 buckets):
Scenario 2: Without patch applied (note it's using explicit buckets because we removed OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION logic):
Scenario 3: With changes and patch (note 100 buckets):
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.