Xray performance test #330
Conversation
mockserver/README
Outdated
@@ -0,0 +1,3 @@
# Mocked Servers

The servers themselves are very basic. They listen for a message and tend to set a flag to success upon receiving it. Typically, they are set up with two servers: one to mock the backend and one to report on the status of that mock (if endpoint is called, how many times, etc.). There's a built-in 15ms of latency between each message that's received. The status reporter will always be on port 8080 and support the `:8080/` (returns "healthcheck") and the `:8080/check-data` (returns "success" if mock has received messages) endpoints.
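The description above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the `/put-data` path, the `"failure"` response body, and all identifier names are assumptions; only the port, the 15ms latency, and the `/` and `/check-data` behaviors come from the README.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

var received uint64 // messages the mock backend has accepted

// statusMessage reports "success" once at least one message has arrived.
func statusMessage(n uint64) string {
	if n > 0 {
		return "success"
	}
	return "failure"
}

func main() {
	// Mock backend (path name assumed): record the message, then apply
	// the built-in 15ms latency mentioned in the README.
	http.HandleFunc("/put-data", func(w http.ResponseWriter, r *http.Request) {
		atomic.AddUint64(&received, 1)
		time.Sleep(15 * time.Millisecond)
	})
	// Status-reporter endpoints from the README, always on :8080.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "healthcheck")
	})
	http.HandleFunc("/check-data", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, statusMessage(atomic.LoadUint64(&received)))
	})
	http.ListenAndServe(":8080", nil)
}
```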
nit - I don't think that this documentation is worded in a way that helps people understand how it's used. I think a more procedural guide on running the mocked tests, or a step-by-step guide on what happens would illustrate it better. Avoid words like "typically".
I agree. There's room for improvement with the README.
# Metric that the test needs to validate
metric_namespace: "CloudWatchAgentPerformance"
metric_validation:
So are we just validating that the metrics exist?
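For context, a `metric_validation` list typically names the individual metrics to check. The entry below is purely hypothetical; the field names and metric names are assumptions, not this framework's actual schema:

```yaml
metric_namespace: "CloudWatchAgentPerformance"
metric_validation:
  # Hypothetical entries: each named metric would be looked up in the
  # namespace above, e.g. to confirm it exists and has datapoints.
  - metric_name: "procstat_cpu_usage"
  - metric_name: "procstat_memory_rss"
```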
},
"insecure": true,
"metrics": {
  "namespace": "CloudWatchAgentPerformance",
Shouldn't this be in a different namespace since the metrics are specific to this test?
It wasn't for the other performance tests, but I can change it.
mockserver/main.go
Outdated
atomic.AddUint32(&ts.transactions, 1)

// Built-in latency
fmt.Printf("\033[31m Time: %d | data Received \033[0m \n", time.Now().Unix())
nit: Mixture of `fmt` and `log`.
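As a sketch of that suggestion, both the increment and the message could go through `log` alone, which also timestamps each line for free (the `transactionStore` type and `record` method names here are assumptions for illustration):

```go
package main

import (
	"log"
	"sync/atomic"
)

// transactionStore is a hypothetical stand-in for the PR's counter owner.
type transactionStore struct {
	transactions uint32
}

// record bumps the counter and logs the event via the log package only,
// instead of mixing fmt.Printf with log elsewhere in the file.
func (ts *transactionStore) record() uint32 {
	n := atomic.AddUint32(&ts.transactions, 1)
	log.Printf("data received (transaction %d)", n) // log adds the timestamp
	return n
}
```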
mockserver/main.go
Outdated
if t > 0 {
	message = SuccessMessage
}
fmt.Printf("\033[31m Time: %d | checkData msg: %s | %d\033[0m \n", time.Now().Unix(), message, t)
Is this log necessary?
terraform/performance/main.tf
Outdated
@@ -109,6 +109,13 @@ resource "null_resource" "validator_linux" {
}
provisioner "remote-exec" {
  inline = [
    #mock server dependencies getting transfered.
    "git clone --branch ${var.github_test_repo_branch} ${var.github_test_repo}",
    "cd amazon-cloudwatch-agent-test && git checkout xray-performance-test", #@TODO remove before PR merge
is this TODO relevant?
yup! i am waiting for the PR to be ready before I remove it so I can test the code before the merge
This should use the `github_test_repo_branch` var. You can specify it in your test workflow.
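A sketch of that fix: since the clone already checks out `${var.github_test_repo_branch}`, the hard-coded `git checkout` can simply be dropped (assuming the surrounding resource stays as shown in the diff):

```hcl
provisioner "remote-exec" {
  inline = [
    # Mock server dependencies: clone the branch under test directly,
    # so no hard-coded checkout is needed before merge.
    "git clone --branch ${var.github_test_repo_branch} ${var.github_test_repo}",
    "cd amazon-cloudwatch-agent-test",
  ]
}
```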
util/common/traces/common/common.go
Outdated
We should be cognizant about the file structure we are introducing. Does this need to be `util/common/traces/common/common`? What is the reason we are moving the file to begin with? Maybe a broader question: if it's supposed to be "common", why is it under `/traces`?
Still would like an answer for this
Some suggestions, but nothing blocking.
}

type TransactionPayload struct {
	TransactionsPerMinute float64 `json:"GetNumberOfTransactionsPerMinute"`
nit: Try to keep the tag name and variable name the same.
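A minimal sketch of that suggestion, renaming the field to match its JSON tag (the `toJSON` helper is only for illustration and is not part of the PR):

```go
package main

import "encoding/json"

// Field name now matches the JSON tag, per the review suggestion.
type TransactionPayload struct {
	GetNumberOfTransactionsPerMinute float64 `json:"GetNumberOfTransactionsPerMinute"`
}

// toJSON is a hypothetical helper showing the resulting wire format.
func toJSON(p TransactionPayload) string {
	b, _ := json.Marshal(p)
	return string(b)
}
```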
}

// Starts an HTTP server that receives request from validator only to verify the data ingestion
func StartHttpServer() {
Is this used elsewhere? Why isn't this just main?
I'm not opposed to having the start function separate from main. It's redundant here, though.
I was using that for testing. I also like keeping the server start separate from main and having the argument flags in func main instead.
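The pattern discussed above can be sketched like this: main owns the flags and delegates to a separate start function so the server logic stays testable. The flag name, the `listenAddr` helper, and the handler body are assumptions, not the PR's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
)

// listenAddr builds the listen address from a port flag value.
func listenAddr(port int) string {
	return fmt.Sprintf(":%d", port)
}

// StartHttpServer is kept separate from main so tests can drive it directly.
func StartHttpServer(addr string) error {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "healthcheck")
	})
	return http.ListenAndServe(addr, nil)
}

func main() {
	// Argument flags live in main, per the discussion above.
	port := flag.Int("port", 8080, "port for the status listener")
	flag.Parse()
	StartHttpServer(listenAddr(*port))
}
```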
The verifier component can be accessed via a listener on port 8080. It provides information about the transactions, including:

- **Transactions per Minute:** You can obtain the transactions per minute by making a request to `/tpm`.
Super mild nitpick: it would be nice to have an example cURL request, like

```shell
curl localhost:8080/tpm
# the output is [ something]
```

This is really trivial, imo, but something that is really helpful to document.
Description of the issue
We were missing automated performance measurement for the CloudWatch Agent's new feature, traces.
Description of changes
This PR adds a performance test for X-Ray traces, using a mock HTTP server to minimize wasted cloud resources. It is added to the already built-in performance test and does not require any user input. The test creates a mock server automatically and runs trace generation against it.
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Ran Performance Test and checked on Dynamo
https://github.com/aws/private-amazon-cloudwatch-agent-staging/actions/runs/6223948452/job/17362419385