Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Middleware to containerinsightrecieve/resourcedetection #265

Merged
merged 7 commits into from
Dec 16, 2024

Conversation

Paramadon
Copy link

@Paramadon Paramadon commented Dec 9, 2024


Description of the Issue

The CloudWatch Agent lacked an agenthealth configuration to monitor API health by tracking HTTP status codes for specific responses. Monitoring the status codes (200, 400, 408, 419, and 429) across all APIs is critical for diagnosing issues and ensuring comprehensive observability of API behaviors. Without this configuration, users were unable to quickly identify trends in API health metrics or correlate specific status codes with performance issues.


Changes Made

  • Added agenthealth Configuration:

    • Configured the CloudWatch Agent to track the following status codes across all APIs:
      • 200: Success
      • 400: Bad Request
      • 408: Request Timeout
      • 419: Authentication Timeout
      • 429: Too Many Requests
    • This ensures that health metrics for these common response codes are captured and reported.
  • Updated CloudWatch Agent Configuration JSON:

    • Introduced a new section under the metrics collection configuration to specify agenthealth metrics.
    • Ensured compatibility with existing metrics and logs collection by integrating the new configuration seamlessly.
  • Validated the Configuration:

    • Tested the updated configuration with the CloudWatch Agent to ensure the new metrics are collected and reported correctly.
    • Confirmed that the metrics appear as expected in CloudWatch for all specified APIs.

Impact

  • Provides real-time visibility into API status codes, helping identify anomalies or patterns that indicate issues.
  • Enhances observability for critical services by adding detailed health metrics.

Testing

  • Deployed the updated CloudWatch Agent health configuraion
  • Verified that metrics for status codes 200, 400, 408, 419, and 429 are captured in the header as you can see from the image below:
Screenshot 2024-11-15 at 4 36 21 PM

Tested on EKS by changing agent contrib to this branch.
Ex: replace github.com/open-telemetry/opentelemetry-collector-contrib/processor/resourcedetectionprocessor => github.com/amazon-contributing/opentelemetry-collector-contrib/processor/resourcedetectionprocessor v0.0.0-20241212025412-4bd7a1e9deed

Screenshot 2024-12-16 at 7 17 54 PM

@Paramadon Paramadon requested a review from mxiamxia as a code owner December 9, 2024 00:49
@Paramadon Paramadon changed the title adding handlers to containerinsightreciever and resourcedetectionproc… Adding Middleware to containerinsightrecieve/resourcedetection Dec 10, 2024
Copy link

@dricross dricross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have at least one unit test per client where the configurer is not nil to make sure it gets invoked. You can make the configurer something simple that sets a boolean and the test should check that the boolean is set by the configurer.

ec2MetadataCreator: newEC2Metadata,
ebsVolumeCreator: newEBSVolume,
ec2TagsCreator: newEC2Tags,
ec2MetadataCreator: withConfigurer(configurer, func(c *awsmiddleware.Configurer) func(context.Context, *session.Session, time.Duration, chan bool, chan bool, bool, int, *zap.Logger, ...ec2MetadataOption) ec2MetadataProvider {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what is happening here:
a. The configurer is passed to withConfigurer.

b. A function is defined that takes a *awsmiddleware.Configurer and returns another function. This returned function is the actual EC2 metadata creator function.

c. The inner function (the EC2 metadata creator) now has access to the Configurer (as c) when it's called, allowing it to use the configurer when creating the EC2 metadata provider.

d. The newEC2Metadata function is called with all the parameters plus the Configurer (c).

This pattern allows the Configurer to be injected into the EC2 metadata creation process without changing the signature of the ec2MetadataCreator function that's stored in the Info struct.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to break that line down and explain whats going on using helper functions, because currently you it is difficult to understand whats going on there without coming to this PR and finding this comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing with the return functions down below.

@musa-asad
Copy link

musa-asad commented Dec 11, 2024

Can you update the description to describe how it differs between #265 just for clarity more than just the title?

dricross
dricross previously approved these changes Dec 12, 2024
okankoAMZ
okankoAMZ previously approved these changes Dec 16, 2024
Copy link

@okankoAMZ okankoAMZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

internal/metadataproviders/aws/ec2/metadata.go Outdated Show resolved Hide resolved
processor/resourcedetectionprocessor/go.mod Outdated Show resolved Hide resolved
Comment on lines 127 to 129
func (d *Detector) ExposeHandlers(ctx context.Context) (handlers *request.Handlers) {
return d.metadataProvider.GetHandlers(ctx)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: For consistency and to make it easier to follow, keep the names of the functions used to access the request handlers the same. Would suggest keeping it the same as the internal/aws/cwlogs client.

func (client *Client) Handlers() *request.Handlers {
return &client.svc.(*cloudwatchlogs.CloudWatchLogs).Handlers
}

func (p *ResourceProvider) ConfigureHandlers(ctx context.Context, host component.Host, middlewareId component.ID) {
for _, detector := range p.detectors {
if handlerDetector, ok := detector.(ExposeHandlerDetector); ok {
awsmiddleware.TryConfigure(p.logger, host, middlewareId, awsmiddleware.SDKv1(handlerDetector.ExposeHandlers(ctx)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could grab the Configurer once and pass that into this function instead of the host and middlewareID so it isn't trying to get it each time.

@Paramadon Paramadon merged commit 2dd9562 into aws-cwa-dev Dec 16, 2024
125 of 146 checks passed
Comment on lines +128 to +134
func createEC2MetadataCreator(configurer *awsmiddleware.Configurer) func(context.Context, *session.Session, time.Duration, chan bool, chan bool, bool, int, *zap.Logger, ...ec2MetadataOption) ec2MetadataProvider {
return withConfigurer(configurer, func(c *awsmiddleware.Configurer) func(context.Context, *session.Session, time.Duration, chan bool, chan bool, bool, int, *zap.Logger, ...ec2MetadataOption) ec2MetadataProvider {
return func(ctx context.Context, session *session.Session, refreshInterval time.Duration, instanceIDReadyC, instanceIPReadyC chan bool, localMode bool, imdsRetries int, logger *zap.Logger, options ...ec2MetadataOption) ec2MetadataProvider {
return newEC2Metadata(ctx, session, refreshInterval, instanceIDReadyC, instanceIPReadyC, localMode, imdsRetries, logger, c, options...)
}
})
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need any of this? Can't we just change the functions to take in the configurer?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants