Split EMF log with more than 100 buckets. #242
Conversation
Thanks for the contribution. Some high-level comments:
- Intuitively, I think the first split should contain as many datapoints as possible. Say the EMF exporter supports "transactional exporting" in the future, i.e. the second batch is only exported if the first one succeeds; in that case, we want the majority to go out in the first batch to reduce the risk of failures in the second. Switching the number should not affect how much data you traverse in the splitting logic. For now it's not a big concern, because each export is independent.
- Shall we also include histogramDataPointSlice?
- The test cases don't seem to cover enough use cases, e.g. positive buckets empty, negative buckets empty, zero bucket empty, and non-splitting validation when the total size is smaller than 100. Please add more cases to verify the logic (a table-driven sketch follows this comment).
- Regarding the only test case you added: having a large array as the basis for result validation is good, but how do we confirm that the expectedDatapoint is what we really want? It's hard to judge its accuracy.
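A table-driven test along these lines could cover those scenarios; splitBucketCounts here is a hypothetical stand-in for the exporter's real splitting logic, included only so the sketch is self-contained:

```go
package emfsplit

import "testing"

// splitBucketCounts is a hypothetical stand-in for the exporter's splitting
// logic: it chunks an ordered list of bucket counts so no chunk exceeds
// maxMembers, filling the first chunk as much as possible.
func splitBucketCounts(counts []uint64, maxMembers int) [][]uint64 {
	var splits [][]uint64
	for len(counts) > maxMembers {
		splits = append(splits, counts[:maxMembers])
		counts = counts[maxMembers:]
	}
	return append(splits, counts)
}

// The cases mirror the scenarios requested above: positive buckets empty,
// negative buckets empty, zero bucket empty, and no split when the total
// number of members stays within 100.
func TestSplitBucketCounts(t *testing.T) {
	tests := []struct {
		name       string
		positive   int
		negative   int
		zeroBucket int // 1 if the zero-count bucket is present, 0 otherwise
		wantSplits int
	}{
		{"positive buckets empty", 0, 120, 1, 2},
		{"negative buckets empty", 120, 0, 1, 2},
		{"zero bucket empty", 60, 60, 0, 2},
		{"no split when total <= 100", 50, 40, 1, 1},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			counts := make([]uint64, tc.positive+tc.negative+tc.zeroBucket)
			if got := len(splitBucketCounts(counts, 100)); got != tc.wantSplits {
				t.Fatalf("got %d splits, want %d", got, tc.wantSplits)
			}
		})
	}
}
```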
Updated the code so that the first split holds at most 100 values, and to return the data points in the right order to avoid confusion.
Added all these test cases. (The verification principle follows the explanation in 4 below.)
In the test case, we generate a total of 121 different buckets with varying value counts. The total count is 3662 (calculated locally by iterating over all buckets), and the sum is hard-coded to 1000 (to verify that we use the metric sum directly in the first split).
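For reference, the local calculation of the expected total count amounts to summing every generated bucket's count; a tiny hypothetical helper illustrating that principle (the generator producing the 121 buckets is assumed and not shown here):

```go
// totalCount sums the count of every generated bucket. The expected value
// used in the test (3662) is derived this way.
func totalCount(bucketCounts []uint64) uint64 {
	var total uint64
	for _, c := range bucketCounts {
		total += c
	}
	return total
}
```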
LGTM
Merged commit 9c1ddd2 into amazon-contributing:aws-cwa-dev
Description:
In Application Signals, we use the Base2 Exponential Bucket Histogram to aggregate and send latency data, with a default maximum of 160 buckets. In the EMF exporter, these buckets are mapped to "Target members" in EMF log entries.
However, CloudWatch EMF logs impose a limit of 100 target members, beyond which EMF processors mark the record as invalid, resulting in missing metrics and customer-facing errors reported via the EMFValidationErrors metric. In this PR, we split such histograms into two sub EMF logs with the following changes:
- Split the exponential histogram data points into two, each containing a maximum of 100 buckets, to comply with CloudWatch EMF log constraints (see the sketch below).
- Add metricIndex to groupedMetricMetadata: the current EMF exporter aggregates incoming metrics into groupedMetrics before converting them to log events, where the groupKey is generated from the groupedMetricMetadata (metric namespace, timestamp, log group name, etc.). After splitting, the two new metrics would share exactly the same key; adding an extra piece of metric metadata to the key generation prevents the second metric from being dropped.
For each split data point:
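A minimal, self-contained sketch of the splitting and key-generation idea described above; the bucket type, helper names, and key format are assumptions for illustration only, not the exporter's actual code:

```go
package main

import "fmt"

// maxTargetMembers mirrors the CloudWatch EMF limit of 100 target members per log entry.
const maxTargetMembers = 100

// bucket pairs an exponential-histogram bucket value with its count. The real
// exporter works on pmetric data points; this simplified type is illustrative.
type bucket struct {
	value float64
	count uint64
}

// splitBuckets splits an ordered bucket list into sub data points of at most
// maxTargetMembers buckets each, keeping the first split as full as possible
// and preserving the original order.
func splitBuckets(buckets []bucket) [][]bucket {
	var splits [][]bucket
	for len(buckets) > maxTargetMembers {
		splits = append(splits, buckets[:maxTargetMembers])
		buckets = buckets[maxTargetMembers:]
	}
	return append(splits, buckets)
}

// groupKey sketches how an extra metricIndex could be folded into the grouped
// metric key so that the two splits do not collide; the real key also includes
// namespace, timestamp, log group name, and other metadata.
func groupKey(namespace string, timestampMs int64, metricIndex int) string {
	return fmt.Sprintf("%s/%d/%d", namespace, timestampMs, metricIndex)
}

func main() {
	// 160 buckets (the default maximum for Base2 exponential histograms)
	// split into one data point of 100 buckets and one of 60.
	buckets := make([]bucket, 160)
	for i, split := range splitBuckets(buckets) {
		fmt.Printf("split %d: %d buckets, key %s\n", i, len(split), groupKey("ApplicationSignals", 0, i))
	}
}
```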
Testing:
The change is tested by generating traffic with more than 100 buckets; EMF logs with more than 100 values are no longer produced after the change.
Also, in the E2E test, we log the Min/Max/Count to confirm the behaviour is as expected.