[exporter/awsemfexporter]Split EMF log with larger than 100 buckets. #36336
Please fix the PR checks
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Adding @Aneurysm9 for help on review.
Where are the benchmarks? I don't see them in the PR. The benchstat output is also missing the after and comparison data.
Sorry, I failed to run the comparison command earlier; I have now added the after.txt output to the description. The benchmark tests are in `exporter/awsemfexporter/datapoint_test.go`, at lines 2075 and 2076.
This leads to test failures; see #36727.
…uckets." (open-telemetry#36763) Reverts open-telemetry#36336, which leads to test failures; see open-telemetry#36727.
… buckets." (#36771)

#### Description
This PR fixes the flaky unit test from the previous PR, #36336, and adds back the implementation of the EMF log splitting logic.

#### Link to tracking issue
#36727

#### Testing
The unit test was updated and passed with a count of 10:
```
go test -run TestAddToGroupedMetric -count 10 -tags=always
PASS
ok      github.com/open-telemetry/opentelemetry-collector-contrib/exporter/awsemfexporter 0.016s
```
Description:
In Application Signals, we use the Base2 Exponential Bucket Histogram to aggregate and send latency data, with a default maximum of 160 buckets. In the EMF exporter, these buckets are mapped to "Target members" in EMF log entries.
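As a rough illustration of why 160 buckets become 160 target members, the Go sketch below computes bucket bounds using the OpenTelemetry data-model definition for base-2 exponential histograms (for scale `s`, `base = 2^(2^-s)`, and positive bucket `i` covers `(base^i, base^(i+1)]`) and flattens non-empty buckets into parallel value/count slices. The helpers `bucketBounds` and `toValuesAndCounts`, the use of the bucket midpoint as the representative value, and the skipping of empty buckets are assumptions for illustration, not the exporter's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// bucketBounds returns the (lower, upper] bounds of the positive bucket at
// the given index, using base = 2^(2^-scale) from the OpenTelemetry spec.
func bucketBounds(scale, index int32) (float64, float64) {
	base := math.Pow(2, math.Pow(2, -float64(scale)))
	return math.Pow(base, float64(index)), math.Pow(base, float64(index+1))
}

// toValuesAndCounts flattens positive buckets into parallel slices; each
// resulting pair corresponds to one EMF target member.
// HYPOTHETICAL: uses the bucket midpoint as the value and skips empty buckets.
func toValuesAndCounts(scale, offset int32, bucketCounts []uint64) ([]float64, []float64) {
	vals := make([]float64, 0, len(bucketCounts))
	cnts := make([]float64, 0, len(bucketCounts))
	for i, c := range bucketCounts {
		if c == 0 {
			continue // ASSUMPTION: empty buckets produce no target member
		}
		lo, hi := bucketBounds(scale, offset+int32(i))
		vals = append(vals, (lo+hi)/2)
		cnts = append(cnts, float64(c))
	}
	return vals, cnts
}

func main() {
	bucketCounts := make([]uint64, 160) // default maximum bucket count
	for i := range bucketCounts {
		bucketCounts[i] = 1
	}
	vals, _ := toValuesAndCounts(3, 0, bucketCounts)
	fmt.Println(len(vals)) // 160, which exceeds the 100-member EMF limit
}
```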
However, CloudWatch EMF logs impose a limit of 100 target members, beyond which EMF processors mark the record as invalid, resulting in missing metrics and customer-facing errors reported via the EMFValidationErrors metric. In this PR, we split such histograms into two sub EMF logs, each containing a maximum of 100 buckets, to comply with CloudWatch EMF log constraints, with the following change:

- Add `metricIndex` to `groupedMetricMetadata`: the EMF exporter currently aggregates incoming metrics into groupedMetrics before converting them to log events, where the groupKey is generated from the groupedMetricMetadata, including the metric namespace, timestamp, log group name, etc. After splitting, the two new metrics would share exactly the same key. Adding an extra piece of metric metadata to key generation prevents the second metric from being dropped.
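A minimal sketch of the split together with the key change, assuming the bucket values and counts have already been flattened into parallel slices of equal length. `groupKey`, `chunk`, and `splitHistogram` are hypothetical names; the real `groupedMetricMetadata` carries more fields, and this sketch generalizes the two-way split to any number of chunks of at most 100.

```go
package main

import "fmt"

// maxTargetMembers is the CloudWatch EMF limit on values per metric.
const maxTargetMembers = 100

// groupKey is a HYPOTHETICAL, simplified stand-in for groupedMetricMetadata.
type groupKey struct {
	namespace   string
	timestampMs int64
	logGroup    string
	metricIndex int // distinguishes chunks split from the same data point
}

// chunk holds one sub-metric with at most maxTargetMembers values.
type chunk struct {
	key    groupKey
	values []float64
	counts []float64
}

// splitHistogram cuts parallel values/counts slices (assumed equal length)
// into chunks of at most maxTargetMembers entries and stamps each chunk
// with its own metricIndex.
func splitHistogram(base groupKey, values, counts []float64) []chunk {
	var chunks []chunk
	for start, idx := 0, 0; start < len(values); start, idx = start+maxTargetMembers, idx+1 {
		end := start + maxTargetMembers
		if end > len(values) {
			end = len(values)
		}
		k := base
		k.metricIndex = idx
		chunks = append(chunks, chunk{key: k, values: values[start:end], counts: counts[start:end]})
	}
	return chunks
}

func main() {
	values := make([]float64, 160) // default Application Signals bucket count
	counts := make([]float64, 160)
	base := groupKey{namespace: "ns", timestampMs: 1, logGroup: "lg"}

	grouped := map[groupKey]chunk{}
	for _, c := range splitHistogram(base, values, counts) {
		// With identical keys, the second chunk would replace the first here.
		grouped[c.key] = c
	}
	fmt.Println(len(grouped)) // 2
}
```

Keeping `metricIndex` in the map key is what lets every chunk survive grouping; without that field, `grouped` would end up with a single entry.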
For each split data point:
Testing:
The change was tested by generating traffic with more than 100 buckets; after the change, EMF logs with more than 100 values no longer appear.
Comparison of the added benchmark tests before vs. after the code change:
Benchmark test with 100 bucket length:
Benchmark test with 200 bucket length:
Benchmark test with 300 bucket length:
Benchmark test with 500 bucket length:
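The benchmark numbers are not reproduced here. As a sketch of how a comparison across these bucket lengths can be driven, the Go program below uses the standard library's `testing.Benchmark` to time a hypothetical `splitIntoChunks` helper at 100, 200, 300, and 500 buckets. It is not the benchmark in `datapoint_test.go`, and its timings will vary by machine.

```go
package main

import (
	"fmt"
	"testing"
)

const maxBuckets = 100

// splitIntoChunks is a HYPOTHETICAL stand-in for the code under test; it
// returns the [start, end) ranges of chunks holding at most maxBuckets items.
func splitIntoChunks(n int) [][2]int {
	chunks := make([][2]int, 0, (n+maxBuckets-1)/maxBuckets)
	for start := 0; start < n; start += maxBuckets {
		end := start + maxBuckets
		if end > n {
			end = n
		}
		chunks = append(chunks, [2]int{start, end})
	}
	return chunks
}

func main() {
	for _, n := range []int{100, 200, 300, 500} {
		// testing.Benchmark runs f with increasing b.N until timing is stable.
		res := testing.Benchmark(func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = splitIntoChunks(n)
			}
		})
		fmt.Printf("buckets=%d\t%s\n", n, res)
	}
}
```

For the PR's real benchmarks, running `go test -bench` before and after the change and passing both outputs to `benchstat` produces the before/after comparison the reviewer asked for.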