Implement JMX E2E Tests in Terraform Framework #443

musa-asad · 2024-12-09T04:36:33Z

Description of the issue

The CloudWatch Agent now supports JMX metrics on EKS. We need to add testing to verify the behavior for JMX metrics collection in EKS environments.

Description of changes

Co-PR: aws/amazon-cloudwatch-agent#1463.

Implemented test/e2e/jmx/jmx_test.go with the following functions:
- ApplyHelm: Applies a helm release based on the specified resources passed in as variables, which sets up the CloudWatch Agent and deploys an annotated sample application.
- TestResources: Tests to see if the appropriate resources have been deployed on the EKS cluster.
- TestMetrics: Tests to see if metrics are being emitted properly based on the agent configuration file passed in.
Added test/e2e/jmx/resources/cwagent_configs/ and test/e2e/jmx/resources/sample_apps/ for custom agent configurations and sample applications to deploy.
- Added a check that tomcat.sessions and tomcat.rejected_sessions increase above 0.
Added GetMetricMaximum to util/awsservice/cloudwatchmetrics.go in order to find the maximum value for a metric being emitted.
Added nodes to generator/test_case_generator.go and generator/resources/eks_e2e_test_matrix.json to be able to use this as a configurable value from the generated matrix.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

JVM & Tomcat

Kafka

Container Insights

dricross · 2024-12-17T20:43:25Z

test/e2e/jmx/jmx_test.go

+	if err != nil {
+		t.Fatalf("Error building kubeconfig: %v", err)
+	}


nit: should use testify's require / assert methods instead of t.Fatal / t.Error

Suggested change

-	if err != nil {
-		t.Fatalf("Error building kubeconfig: %v", err)
-	}
+	require.NoError(t, err, "Error building kubeconfig")

Thanks, I'll make this change

dricross · 2024-12-17T22:27:01Z

test/e2e/jmx/jmx_test.go

+	fmt.Println("Waiting for metrics to propagate...")
+	time.Sleep(5 * time.Minute)
+
+	os.Exit(m.Run())


Do the helm resources ever get cleaned up?

With the exception of load balancers (which I do remove), they all should be removed upon the deletion of the cluster, but I'll look into handling this more elegantly.

movence · 2024-12-18T03:04:52Z

test/e2e/jmx/jmx_test.go

+	helm := []string{
+		"helm", "upgrade", "--install", "amazon-cloudwatch-observability",
+		filepath.Join("..", "..", "..", "terraform", "eks", "e2e", "helm-charts", "charts", "amazon-cloudwatch-observability"),
+		"--values", filepath.Join("..", "..", "..", "terraform", "eks", "e2e", "helm-charts", "charts", "amazon-cloudwatch-observability", "values.yaml"),


do you need this line? I think helm will just use the values.yaml file from the chart path given above

Yeah, I can remove it. Thanks.

EDIT: When I removed it, the agent configuration didn't apply, so I added it back.

movence · 2024-12-18T03:07:00Z

test/e2e/jmx/jmx_test.go

+	}
+
+	fmt.Println("Waiting for CloudWatch Agent Operator to initialize...")
+	time.Sleep(300 * time.Second)


Does the operator need 5m to start?

I'll lower the time.

movence · 2024-12-18T03:14:04Z

test/e2e/jmx/jmx_test.go

+		return fmt.Errorf("failed to apply sample app: %w\nOutput: %s", err, output)
+	}
+
+	wait := exec.Command("kubectl", "wait", "--for=condition=available", "--timeout=300s", fmt.Sprintf("deployment/%s", deploymentName), "-n", "default")


Probably a nit, but it looks like the sample apps will always be deployments, so do we need the -deployment suffix on the sample app YAML files?

Yeah, I can remove that.

movence · 2024-12-18T03:58:17Z

test/e2e/jmx/jmx_test.go

+
+func testContainerInsightsMetrics(t *testing.T) {
+	t.Run("verify_containerinsights_metrics", func(t *testing.T) {
+		metricsToCheck := []struct {


is this an existing convention? i'm just curious if there will be a case where metrics are under different namespaces

+1. If the namespace within each of these test functions isn't going to change, can we modify the code to just set it once in the function (maybe modifying the struct to get rid of namespace?)

Yeah, this is an existing convention; these metrics will report to this namespace.
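For reference, the alternative raised in this thread (setting the namespace once per test function instead of per struct entry) could look roughly like the sketch below. The type `metricQuery`, the helper `buildQueries`, the namespace value, and the metric names are all illustrative assumptions, not code from this PR.

```go
package main

import "fmt"

// containerInsightsNamespace is an assumed value for illustration;
// the thread does not spell out the namespace string.
const containerInsightsNamespace = "ContainerInsights"

// metricQuery pairs a metric name with the namespace it is looked up in.
type metricQuery struct {
	Namespace string
	Name      string
}

// buildQueries applies one namespace to every metric name, so the list of
// metrics to check no longer repeats the namespace per entry.
func buildQueries(namespace string, metricNames []string) []metricQuery {
	queries := make([]metricQuery, 0, len(metricNames))
	for _, name := range metricNames {
		queries = append(queries, metricQuery{Namespace: namespace, Name: name})
	}
	return queries
}

func main() {
	// Placeholder metric names for the sketch.
	metricsToCheck := []string{"pod_cpu_utilization", "node_memory_utilization"}
	for _, q := range buildQueries(containerInsightsNamespace, metricsToCheck) {
		fmt.Printf("%s/%s\n", q.Namespace, q.Name)
	}
}
```

Keeping the namespace in the struct, as the existing convention does, has the advantage that a future metric under a different namespace needs no structural change.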

varunch77 · 2024-12-18T15:40:02Z

test/e2e/jmx/jmx_test.go

+
+func testContainerInsightsMetrics(t *testing.T) {
+	t.Run("verify_containerinsights_metrics", func(t *testing.T) {
+		metricsToCheck := []struct {


+1. If the namespaces within each of these test functions isn't going to change, can we modify the code to just set it once in the function (maybe modifying the struct to get rid of namespace?)

lisguo · 2024-12-18T19:58:30Z

test/e2e/jmx/resources/sample_apps/kafka-deployment.yaml

+    spec:
+      containers:
+        - name: zookeeper
+          image: wurstmeister/zookeeper:latest


Instead of using latest, can we pin a version so that it never changes?

Or maybe better yet -- we push an image to our own ECR repo?

Good point, I'll add it to our own ECR repo.

lisguo · 2024-12-18T19:58:44Z

test/e2e/jmx/resources/sample_apps/kafka-deployment.yaml

+    spec:
+      containers:
+        - name: kafka
+          image: wurstmeister/kafka:latest


similar comment to above

lisguo · 2024-12-18T20:00:04Z

util/awsservice/cloudwatchmetrics.go

@@ -134,6 +134,78 @@ func GetMetricStatistics(
 	return CwmClient.GetMetricStatistics(ctx, &metricStatsInput)
 }

+func GetMetricMaximum(


Curious to hear the reasoning behind this func...is it because we don't know the exact value of the metric? But we expect the value to never be above a certain amount?

The reasoning behind this function was to get the highest value emitted for tomcat.sessions / tomcat.rejected_sessions and check that it is greater than zero, since we need to verify that these metrics can rise above zero. I went for the maximum to be sure we check a value that is expected to be above zero.

However, thinking it through, a more efficient solution is to have the function check directly for a value above zero and exit as soon as it finds one. I fixed the function to account for that.

As for the exact values for each metric, tomcat.sessions will max out at 2 since that's what I set it to in the application, but I didn't think it was necessary to check for this directly since the application generates the metric, not the agent. What matters is that the agent is able to ship a non-zero value to CloudWatch. Another reason is a possible race condition: with latency, the metric may not have reached its final value by the time we check it, so checking for a value above 0 is more reliable than checking for a specific value. tomcat.rejected_sessions is unpredictable since it has no max value, so it will just keep increasing after the max sessions are hit.

musa-asad added 15 commits November 29, 2024 03:14

Preliminary testing.

6f320b0

Support environment.

5727809

Fix file positioning.

912d98e

Fix file positioning.

5d0362f

Fix values.

6740715

Fix otel and prometheus config set-up.

c401c0e

Remove jmx files.

e8d4901

Add jmx testing.

22186ed

Merge branch 'main' into e2e

7a02bc2

Allow to choose e2e or regular tests.

b5b2fb9

Merge remote-tracking branch 'origin/e2e' into e2e-jmx

7672929

Add tomcat sessions check.

9a830a0

Add TA.

5795103

Update jmx_test.go.

6a5f26e

Port foward and fix kafka.

c967e3d

musa-asad self-assigned this Dec 9, 2024

musa-asad added 14 commits December 8, 2024 23:40

Add CI.

63b6ba8

Add TA.

dfd94e0

Merge branch 'main' into e2e

da21e5a

Merge branch 'e2e' into e2e-jmx

82f7159

Update to 2h.

ad76db4

Remove port forwarding.

d918ffb

Fix providers.

987b005

Use loadbalancer.

9ea9eb2

Fix json.

6f8d7d1

Make 2 hours

ba53b14

Fix providers.tf

f5d2733

Merge branch 'e2e' into e2e-jmx

ee01d1b

Add auth for IAM role.

b6416ae

Verify cluster access.

5714a7e

musa-asad changed the base branch from e2e to main December 16, 2024 21:52

musa-asad added 5 commits December 16, 2024 16:58

Merge branch 'main' into e2e-jmx

a0c088e

Allow node configuration.

37dcf0b

Fix spacing.

cf4a54c

Add license.

58101e3

Check for max metric value.

0db3337

musa-asad requested review from jefchien and okankoAMZ December 17, 2024 04:11

musa-asad added 2 commits December 16, 2024 23:16

Simplify

803bad7

Fix imports.

3611226

musa-asad changed the title ~~JMX E2E Tests~~ Implement JMX E2E Tests in Terraform Framework Dec 17, 2024

Prevent test from running when it doesn't need to.

123f8e5

musa-asad marked this pull request as ready for review December 17, 2024 04:55

musa-asad requested a review from a team as a code owner December 17, 2024 04:55

musa-asad added 3 commits December 17, 2024 03:38

Update logic for max metric value.

9591142

Improve logic to check nodes being ran.

6fcde7e

Delete loadbalancer after complete

0533bfa

varunch77 requested changes Dec 17, 2024

musa-asad added 2 commits December 17, 2024 16:47

constant

c995fc7

constant

5efca7c

musa-asad requested a review from varunch77 December 17, 2024 22:25

dricross reviewed Dec 17, 2024

Merge branch 'main' into e2e-jmx

a10dc57

movence reviewed Dec 18, 2024

varunch77 requested changes Dec 18, 2024

lisguo reviewed Dec 18, 2024

musa-asad added 4 commits December 18, 2024 15:26

Merge branch 'main' into e2e-jmx

b9ae22b

address initial comments

74c44df

addressed comments

5d0af54

change generator

917a606
