-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement JMX E2E Tests in Terraform Framework #443
base: main
Are you sure you want to change the base?
Conversation
test/e2e/jmx/jmx_test.go
Outdated
if err != nil { | ||
t.Fatalf("Error building kubeconfig: %v", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should use testify's require
/ assert
methods instead of t.Fatal / t.Error
if err != nil { | |
t.Fatalf("Error building kubeconfig: %v", err) | |
} | |
require.NoError(t, err, "Error building kubeconfig") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll make this change
test/e2e/jmx/jmx_test.go
Outdated
fmt.Println("Waiting for metrics to propagate...") | ||
time.Sleep(5 * time.Minute) | ||
|
||
os.Exit(m.Run()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the helm resources ever get cleaned up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the exception of load balancers (which I do remove), they all should be removed upon the deletion of the cluster, but I'll look into handling this more elegantly.
test/e2e/jmx/jmx_test.go
Outdated
helm := []string{ | ||
"helm", "upgrade", "--install", "amazon-cloudwatch-observability", | ||
filepath.Join("..", "..", "..", "terraform", "eks", "e2e", "helm-charts", "charts", "amazon-cloudwatch-observability"), | ||
"--values", filepath.Join("..", "..", "..", "terraform", "eks", "e2e", "helm-charts", "charts", "amazon-cloudwatch-observability", "values.yaml"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need this line? I think helm will just use the values.yaml file from the chart path given above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can remove it. Thanks.
EDIT: When I removed it, the agent configuration didn't apply, so I added it back.
test/e2e/jmx/jmx_test.go
Outdated
} | ||
|
||
fmt.Println("Waiting for CloudWatch Agent Operator to initialize...") | ||
time.Sleep(300 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does opertor need 5m to start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll lower the time.
test/e2e/jmx/jmx_test.go
Outdated
return fmt.Errorf("failed to apply sample app: %w\nOutput: %s", err, output) | ||
} | ||
|
||
wait := exec.Command("kubectl", "wait", "--for=condition=available", "--timeout=300s", fmt.Sprintf("deployment/%s", deploymentName), "-n", "default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably nit but it looks like sample apps will always be type of deployment
then do we need to add -deployment
suffix to sample apps yaml files? They are all deployment types anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can remove that.
test/e2e/jmx/jmx_test.go
Outdated
|
||
func testContainerInsightsMetrics(t *testing.T) { | ||
t.Run("verify_containerinsights_metrics", func(t *testing.T) { | ||
metricsToCheck := []struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this an existing convention? i'm just curious if there will be a case where metrics are under different namespaces
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. If the namespaces within each of these test functions isn't going to change, can we modify the code to just set it once in the function (maybe modifying the struct to get rid of namespace
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is an existing convention, these metrics will report to this namespace.
test/e2e/jmx/jmx_test.go
Outdated
|
||
func testContainerInsightsMetrics(t *testing.T) { | ||
t.Run("verify_containerinsights_metrics", func(t *testing.T) { | ||
metricsToCheck := []struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. If the namespaces within each of these test functions isn't going to change, can we modify the code to just set it once in the function (maybe modifying the struct to get rid of namespace
?)
spec: | ||
containers: | ||
- name: zookeeper | ||
image: wurstmeister/zookeeper:latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using latest, can we pin a version so that it never changes?
Or maybe better yet -- we push an image to our own ECR repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll add it to our own ECR repo.
spec: | ||
containers: | ||
- name: kafka | ||
image: wurstmeister/kafka:latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar comment to above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add.
util/awsservice/cloudwatchmetrics.go
Outdated
@@ -134,6 +134,78 @@ func GetMetricStatistics( | |||
return CwmClient.GetMetricStatistics(ctx, &metricStatsInput) | |||
} | |||
|
|||
func GetMetricMaximum( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious to hear the reasoning behind this func...is it because we don't know the exact value of the metric? But we expect the value to never be above a certain amount?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reasoning behind this function was to get the highest metric value being emitted for tomcat.sessions
/ tomcat.rejected_sessions
in order to check if that value is greater than zero since we need to verify we're able to get these metrics above zero. The reason I went for the maximum was just to be sure we check for a value that is absolutely expected to be above zero.
However, thinking through it, a more efficient solution would be to dedicate the function to check if it's above zero directly and exit out when we find that. I fixed the function to account for that.
As for the exact values for each metric, tomcat.sessions
will max out to 2 since that's what I set it to in the application, but I didn't think it was necessary to directly check for this since the agent doesn't generate the metric, the application does. But it is important to make sure the agent is able to ship a non-zero value to CloudWatch. Another reason is a possible race condition where there are latency issues and it doesn't append to the right value in time compared to when we check the metric value, so it just seems more ideal to check if the value is above 0 instead of a specific value. tomcat.rejected_sessions
is unpredictable since there isn't a max value for that, so it'll just keep increasing after the max sessions are hit.
Description of the issue
The CloudWatch Agent now supports JMX metrics on EKS. We need to add testing to verify the behavior for JMX metrics collection in EKS environments.
Description of changes
Co-PR: aws/amazon-cloudwatch-agent#1463.
test/e2e/jmx/jmx_test.go
with the following functions:ApplyHelm
: Applies a helm release based on the specifiedresources
passed in as variables, which sets up the CloudWatch Agent and deploys an annotated sample application.TestResources
: Tests to see if the appropriate resources have been deployed on the EKS cluster.TestMetrics
: Tests to see if metrics are being emitted properly based on the agent configuration file passed in.test/e2e/jmx/resources/cwagent_configs/
andtest/e2e/jmx/resources/sample_apps/
for custom agent configurations and sample applications to deploy.tomcat.sessions
andtomcat.rejected_sessions
increases above 0 accordingly.GetMetricMaximum
toutil/awsservice/cloudwatchmetrics.go
in order to find the maximum value for a metric being emitted.generator/test_case_generator.go
andgenerator/resources/eks_e2e_test_matrix.json
to be able to use this as a configurable value from the generated matrix.License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
JVM & Tomcat
Kafka
Container Insights