Split EMF log with larger than 100 buckets. #242

zzhlogin · 2024-10-18T20:52:21Z

Description:
In Application Signals, we use the Base2 Exponential Bucket Histogram to aggregate and send latency data, with a default maximum of 160 buckets. In the EMF exporter, these buckets are mapped to "target members" in EMF log entries.
However, CloudWatch EMF logs impose a limit of 100 target members. Beyond that limit, EMF processors mark the record as invalid, resulting in missing metrics and customer-facing errors reported via the EMFValidationErrors metric.

In this PR, we split histograms with more than 100 buckets into multiple EMF logs with the following changes:

- Add an extra attribute, metricIndex, to groupedMetricMetadata. The current EMF exporter aggregates incoming metrics into groupedMetrics before converting them to log events, where the groupKey is generated from the groupedMetricMetadata (metric namespace, timestamp, log group name, etc.). After splitting, the new metrics would share exactly the same key. Adding an extra metadata field to the key generation prevents the second metric from being dropped.
- If the total number of buckets exceeds 100, the exponential histogram metric is split into multiple data points as needed, each containing a maximum of 100 buckets, to comply with CloudWatch EMF log constraints. For each split data point:
  - Min and Max values are recalculated based on the bucket boundaries within that specific split.
  - Sum is only assigned to the first split, so that the total sum of the datapoints after aggregation is correct.
  - Count is accumulated from the bucket counts within each split.
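
The splitting rules above can be sketched roughly as follows. This is a simplified illustration and not the code in datapoint.go: the bucket and splitDatapoint types, the splitBuckets function, and the use of bucket midpoints as values are assumptions made for the example. Buckets are assumed to be already flattened and ordered from the highest value range to the lowest.

```go
package main

import "fmt"

// maxBucketsPerSplit mirrors the 100 target member limit described above.
const maxBucketsPerSplit = 100

// bucket is a hypothetical flattened histogram bucket (not the exporter's type).
type bucket struct {
	Lower, Upper float64
	Count        uint64
}

// splitDatapoint is a hypothetical container for one split.
type splitDatapoint struct {
	Values        []float64
	Counts        []float64
	Min, Max, Sum float64
	Count         uint64
}

// splitBuckets chunks buckets into groups of at most maxBucketsPerSplit.
// histMin, histMax and histSum are the stats of the original histogram.
func splitBuckets(buckets []bucket, histMin, histMax, histSum float64) []splitDatapoint {
	var splits []splitDatapoint
	for start := 0; start < len(buckets); start += maxBucketsPerSplit {
		end := start + maxBucketsPerSplit
		if end > len(buckets) {
			end = len(buckets)
		}
		chunk := buckets[start:end]

		// Min and Max default to the bucket boundaries of this chunk.
		dp := splitDatapoint{
			Max: chunk[0].Upper,
			Min: chunk[len(chunk)-1].Lower,
		}
		if start == 0 {
			dp.Max = histMax // the first split keeps the original Max
			dp.Sum = histSum // Sum is assigned to the first split only
		}
		if end == len(buckets) {
			dp.Min = histMin // the last split keeps the original Min
		}
		for _, b := range chunk {
			dp.Values = append(dp.Values, (b.Lower+b.Upper)/2) // midpoint, an assumption
			dp.Counts = append(dp.Counts, float64(b.Count))
			dp.Count += b.Count
		}
		splits = append(splits, dp)
	}
	return splits
}

func main() {
	// 121 synthetic buckets of width 1, highest range first.
	var buckets []bucket
	for i := 121; i > 0; i-- {
		buckets = append(buckets, bucket{Lower: float64(i - 1), Upper: float64(i), Count: 1})
	}
	for i, dp := range splitBuckets(buckets, 0.5, 120.5, 1000) {
		fmt.Printf("split %d: values=%d min=%v max=%v sum=%v count=%d\n",
			i, len(dp.Values), dp.Min, dp.Max, dp.Sum, dp.Count)
	}
}
```

With 121 input buckets this yields one split of 100 values and one of 21. The second split's Max equals the first split's Min because both come from the same bucket boundary.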

Testing:
The change was tested by generating traffic with more than 100 buckets. EMF logs with more than 100 values no longer appear after the change.

In the E2E test, Min/Max/Count were also logged to confirm that the behaviour is as expected.
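
To illustrate why the metricIndex attribute described above matters, here is a minimal sketch of grouping by key. It is not the exporter's implementation: the groupKey and groups types, the add method, and the sample namespace, timestamp, and log group values are hypothetical. The sketch also assumes that a metric whose name already exists in a group is dropped.

```go
package main

import "fmt"

// groupKey is a hypothetical, simplified stand-in for the key derived from
// groupedMetricMetadata. MetricIndex is the extra attribute described above.
type groupKey struct {
	Namespace   string
	TimestampMs int64
	LogGroup    string
	MetricIndex int
}

// groups maps a group key to the metrics collected under it (name -> bucket count).
type groups map[groupKey]map[string]int

// add stores a metric under key k. It returns false when a metric with the
// same name already exists in that group, which this sketch treats as a drop.
func (g groups) add(k groupKey, name string, buckets int) bool {
	if g[k] == nil {
		g[k] = map[string]int{}
	}
	if _, exists := g[k][name]; exists {
		return false
	}
	g[k][name] = buckets
	return true
}

func main() {
	// All field values here are made up for illustration.
	base := groupKey{Namespace: "ExampleNamespace", TimestampMs: 1729284741000, LogGroup: "/example/log-group"}

	// Without an index, both splits map to the same key and the second is dropped.
	noIndex := groups{}
	fmt.Println(noIndex.add(base, "Latency", 100), noIndex.add(base, "Latency", 21)) // true false

	// With an index, each split lands in its own group and both are kept.
	withIndex := groups{}
	first, second := base, base
	second.MetricIndex = 1
	fmt.Println(withIndex.add(first, "Latency", 100), withIndex.add(second, "Latency", 21)) // true true
}
```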

bjrara

Thanks for the contribution. Some high level comments:

1. Intuitively, I think the first split should contain as many datapoints as possible. Let's say that in the future the EMF exporter is gonna support "transactional exporting", i.e. the second batch is only exported if the first one succeeds. In that case, we want the majority to be exported in the first batch to reduce the risk of failures in the second. Switching the numbers should not impact how much data you traverse in the splitting logic. For now, it's not a big concern because each export is independent.
2. Shall we also include histogramDataPointSlice?
3. The test cases don't seem to cover enough use cases, e.g. with positive empty, with negative empty, with 0 empty, and non-splitting validation when the total size is smaller than 100. Please add more cases to verify the logic.
4. Regarding the only test case you added, having a large array as the basis for result validation is good, but how did we confirm that expectedDatapoint is what we really want? It's hard to judge the accuracy.

exporter/awsemfexporter/datapoint.go

exporter/awsemfexporter/metric_translator.go

exporter/awsemfexporter/datapoint_test.go

zzhlogin · 2024-10-23T18:56:43Z

> Thanks for the contribution. Some high level comments:
>
> 1. Intuitively, I think the first split should contain as many datapoints as possible. Let's say that in the future the EMF exporter is gonna support "transactional exporting", i.e. the second batch is only exported if the first one succeeds. In that case, we want the majority to be exported in the first batch to reduce the risk of failures in the second. Switching the numbers should not impact how much data you traverse in the splitting logic. For now, it's not a big concern because each export is independent.

Updated the code so that the first split holds at most 100 values, and the datapoints are returned in the right order to avoid confusion.

> 2. Shall we also include histogramDataPointSlice?

histogramDataPointSlice.CalculateDeltaDatapoints only returns metric stats (count, sum, max, min). No values or counts are returned, so the target members won't exceed 100.

> 3. The test cases don't seem to cover enough use cases, e.g. with positive empty, with negative empty, with 0 empty, and non-splitting validation when the total size is smaller than 100. Please add more cases to verify the logic.

Added all these test cases. (The verification principle follows the explanation in 4. below.)

> 4. Regarding the only test case you added, having a large array as the basis for result validation is good, but how did we confirm that expectedDatapoint is what we really want? It's hard to judge the accuracy.

In the test case, we generate a total of 121 different buckets with varying counts. The total count is 3662 (calculated locally by iterating over all buckets), and the sum is hard-coded to 1000 (to verify that we use the metric sum directly in the first split).

For the first split: Max is taken from the Metric.Max value, and Min is taken from the lower boundary of the 100th bucket (I confirmed this by printing the 100th bucket locally and reading the value). It has exactly 100 values, which is checked in expectedDatapoint1 (values and counts).
Similarly, for the second split: Min is taken from the Metric.Min value, and Max is the same as the first split's Min. It has exactly 21 values, which is checked in expectedDatapoint2 (values and counts).
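
The checks described in this reply could be expressed as a small invariant helper, sketched below. The splitStats type and the checkSplits function are hypothetical and are not part of datapoint_test.go. The totals (121 values split 100 + 21, count 3662, sum 1000) come from the reply above, while the per-split counts and the boundary values are made up for illustration.

```go
package main

import "fmt"

// splitStats is a hypothetical summary of one split datapoint.
type splitStats struct {
	NumValues int // number of values (target members) in the split
	Min, Max  float64
	Sum       float64
	Count     uint64
}

// checkSplits verifies the invariants discussed above for splits ordered
// from the highest value range to the lowest.
func checkSplits(splits []splitStats, wantCount uint64, wantSum float64) error {
	var total uint64
	for i, s := range splits {
		if s.NumValues > 100 {
			return fmt.Errorf("split %d has %d values, limit is 100", i, s.NumValues)
		}
		if i == 0 && s.Sum != wantSum {
			return fmt.Errorf("first split sum = %v, want %v", s.Sum, wantSum)
		}
		if i > 0 && s.Sum != 0 {
			return fmt.Errorf("split %d carries sum %v, want 0", i, s.Sum)
		}
		if i > 0 && s.Max != splits[i-1].Min {
			return fmt.Errorf("split %d max %v does not meet previous min %v", i, s.Max, splits[i-1].Min)
		}
		total += s.Count
	}
	if total != wantCount {
		return fmt.Errorf("total count = %d, want %d", total, wantCount)
	}
	return nil
}

func main() {
	// Per-split counts (3000 + 662) and the boundaries 0.5, 8 and 500 are illustrative only.
	splits := []splitStats{
		{NumValues: 100, Min: 8, Max: 500, Sum: 1000, Count: 3000},
		{NumValues: 21, Min: 0.5, Max: 8, Sum: 0, Count: 662},
	}
	if err := checkSplits(splits, 3662, 1000); err != nil {
		fmt.Println("invalid splits:", err)
		return
	}
	fmt.Println("splits are consistent")
}
```

Checking invariants like these alongside the hard-coded expected arrays makes it easier to judge whether an expected datapoint is plausible, since the arrays alone are hard to inspect by eye.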

exporter/awsemfexporter/datapoint.go

exporter/awsemfexporter/metric_translator.go

exporter/awsemfexporter/datapoint.go

vastin

LGTM

zzhlogin added 2 commits October 18, 2024 19:52

Add test log.

cc3b303

code clean up.

4bb6b06

zzhlogin requested a review from mxiamxia as a code owner October 18, 2024 20:52

bjrara reviewed Oct 22, 2024

Address comments.

55e3ec0

jefchien reviewed Oct 23, 2024

zzhlogin added 3 commits October 29, 2024 21:15

Split to multiple datapoints according to input size.

656c777

Update comments.

2ca8776

Fix counter part.

58b03c3

bjrara reviewed Nov 6, 2024

jefchien reviewed Nov 6, 2024

Reformate code.

fc4d644

zzhlogin force-pushed the test-ecs branch from 032a752 to fc4d644 Compare November 7, 2024 18:55

bjrara reviewed Nov 7, 2024

reformate code.

4938954

zzhlogin force-pushed the test-ecs branch from c687b51 to 4938954 Compare November 7, 2024 23:48

bjrara approved these changes Nov 8, 2024

jefchien previously approved these changes Nov 8, 2024

Address lint.

768ac1a

zzhlogin dismissed jefchien’s stale review via 768ac1a November 9, 2024 00:05

vastin reviewed Nov 9, 2024

vastin approved these changes Nov 9, 2024

jefchien previously approved these changes Nov 11, 2024

Address comments.

a301f8f

zzhlogin dismissed jefchien’s stale review via a301f8f November 12, 2024 18:11

bjrara approved these changes Nov 12, 2024

vastin approved these changes Nov 12, 2024

jefchien previously approved these changes Nov 12, 2024

mxiamxia previously approved these changes Nov 12, 2024

eliminate zero splits.

8aa6806

zzhlogin dismissed stale reviews from mxiamxia and jefchien via 8aa6806 November 13, 2024 04:38

mxiamxia approved these changes Nov 13, 2024

jefchien approved these changes Nov 13, 2024

jefchien merged commit 9c1ddd2 into amazon-contributing:aws-cwa-dev Nov 20, 2024
141 of 146 checks passed
