Result Widget has confusing number of virtual users #206

Open
mcawilcox opened this issue Aug 18, 2024 · 6 comments
Assignees
kamyarz-aws
Labels
bug (Something isn't working)

Comments

@mcawilcox

mcawilcox commented Aug 18, 2024

Describe the bug
After a test run, the results include an image of the main test metrics rendered from a CloudWatch widget (for me, covering just region eu-west-2).
In a calibration run with the concurrency set to 10, I expect the metrics to show a nice steady line at 10, preceded by a steady ramp.
Instead I get a line that jumps around but is of the order of 200.

The logfile from Taurus consistently logs "10 vu" after the initial ramp up, but the logging interval varies from 5s down to 2s.

If I examine CloudWatch directly, I can reproduce the view presented in the results when the "Virtual Users" statistic is set to Sum.
I get the correct graph by changing the statistic to "Average", "Minimum" or "Maximum".

"Sum" is the wrong statistic to use for VU, as there are multiple samples per minute. It is correct for the "Successes" and "Failures" counts.

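To make the arithmetic concrete, here is a minimal illustration (not DLT code) of what Sum does to a steady 10-VU run when Taurus reports roughly every 3 seconds, i.e. about 20 samples per minute:

    // illustration only: 20 samples of a constant 10 VU within one minute
    const samples = Array(20).fill(10);
    const sum = samples.reduce((a, b) => a + b, 0); // 200 -- the jumpy line the widget shows
    const avg = sum / samples.length;               // 10  -- the steady line I expected
    console.log({ sum, avg });

The exact value wanders with the logging interval (5s down to 2s gives 12 to 30 samples per minute), which is why the plotted line jumps around rather than sitting exactly at 200.
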
However, once corrected, the "Virtual Users" count is at the wrong scale to be properly visible on the right-hand y-axis (its value is much smaller than "Successes"). I suggest that the result be split into two graphs, but I'm not sure whether CloudWatch can generate a single widget in this manner.
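
To illustrate, here is a hand-written metric-widget source along those lines (a sketch only, not DLT's actual widget definition; the namespace and the success/failure metric names are placeholders, only numVu comes from the parser code) that keeps the counts on Sum while plotting Virtual Users as an Average on its own axis:

    // sketch of a corrected widget definition; "<namespace>" and the count metric
    // names are placeholders, not necessarily what DLT publishes
    const widget = {
      view: "timeSeries",
      region: "eu-west-2",
      period: 60,
      yAxis: {
        left: { label: "Requests / min", min: 0 },
        right: { label: "Virtual Users", min: 0 },
      },
      metrics: [
        ["<namespace>", "numVu", { stat: "Average", yAxis: "right", label: "Virtual Users" }],
        ["<namespace>", "<successes>", { stat: "Sum", yAxis: "left", label: "Successes" }],
        ["<namespace>", "<failures>", { stat: "Sum", yAxis: "left", label: "Failures" }],
      ],
    };

If the scales still clash, the same metrics array could simply be split across two widget definitions, one per graph.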

I suggest a fix around line 403 in results-parser/lib/parser/index.js from:

key !== "avgRt" && (metricOptions[key].stat = "Sum");

to:

key !== "avgRt" && key !== "numVu" && (metricOptions[key].stat = "Sum");

but I'm not able to test, and I'm not sure of the impact on the other image, which brings me to the final point...

I am testing with a single region, and can only see the results image for that region. I can see that DLT has generated a "total" image as well, and I can see that the code changes some of the metric calculations ... but I can't get the DLT web GUI to display that "total" image.

To Reproduce

  1. Deploy DLT using CloudFormation
  2. Configure a test with a task count of 1, concurrency of 10, region of eu-west-2, ramp of 5, hold of 90, and a task type of Jmeter.
  3. Upload Jmeter script where there is a single thread group, with number of threads 1, ramp up 1 and loop count 1.
    My test script happens to generate 23,000 requests in the 95 minutes (240 requests/min)
  4. Let test run
  5. Observe the Test Result page, especially the image on the lower-right of the panel.

Expected behavior

A steady line at 10 virtual users after the initial ramp-up, matching the configured concurrency.

Please complete the following information about the solution:

  • Version: v3.2.10
  • Region: eu-west-2
  • Was the solution modified from the version published on this repository? No
  • If the answer to the previous question was yes, are the changes available on GitHub? n/a
  • Have you checked your service quotas for the services this solution uses? No, but only running single calibration tests currently
  • Were there any errors in the CloudWatch Logs? No

Screenshots

  1. Original widget image, with virtual users in blue
    01-TestResult-BadVU
  2. The results-parser Lambda logs a widget description into the logfiles, so I used this to create a CloudWatch widget.
    There is nothing visible because it sets the period to 10s
    02-CW-Metrics-ReplicatedFromResultLambda
  3. So I changed the period to 1 minute. Note the statistic for "Virtual Users" is set to Sum, and this graph matches the original.
    03-CW-Metrics-ReplicatedNowVisible
  4. Sum is bad to use when the samples aren't once per minute. This shows the number of samples
    04-CW-Metrics-WithSampleCount
  5. Here, I fixed the statistic, but it is now hard to see as it is the wrong scale for the y-axis
    05-CW-Metrics-FixedButHardToSee
  6. In this graph, I make the "Virtual Users" value more visible by multiplying by 10, but that value depends on the details of the test case.
    06-CW-Metrics-FixedAndVisible
  7. Better would be to display the Users on a third y-axis, or, as here, a separate graph with the y-axis labelled for users
    07-CW-Metrics-FixedGraph-Users
  8. and a graph with y-axis labelled for Requests per Minute
    08-CW-Metrics-FixedGraph-Hits

Additional context

mcawilcox added the bug label Aug 18, 2024
@mcawilcox
Author

I've added a snippet from the Taurus logs:
09-TaurusLogSnippet

@mcawilcox
Author

I found an example in Taurus's reporting of separating the results into two graphs, one for Hits and one for Response Times:
Taurus Reporting Example

I've made some changes to my widgets to emulate these two graphs:

  1. I've used a stacked area graph for this one:
    Taurus-LoadGraph
  2. I've used a line graph for this:
    Taurus-ResponseTimeGraph

@mcawilcox
Author

mcawilcox commented Aug 18, 2024

  1. Inspired by that page, may I suggest an enhancement? Showing the average latency is often enough, but sometimes seeing p90 would be useful (a rough widget sketch follows this list):
    Taurus-ResponseTimeWithp90
  2. Or p95:
    Taurus-ResponseTimeWithp95
  3. If the enhancement can't be done on the static Test Results page, could a customisable widget be left in CloudWatch (perhaps as part of the dashboard) that would allow some of these extra lines to be graphed, so we can generate our own images for test reports?
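
For items 1 and 2, a rough sketch of what the extra percentile line might look like in the widget's metrics array (the namespace is a placeholder and avgRt is just the key name used in the parser; I haven't verified how the metric is actually published):

    // Percentile statistics (p90/p95) are only meaningful if CloudWatch receives raw
    // sample values; on a pre-aggregated average they give the p90 of the averages.
    const responseTimeMetrics = [
      ["<namespace>", "avgRt", { stat: "Average", label: "Average Response Time" }],
      ["<namespace>", "avgRt", { stat: "p90", label: "p90 Response Time" }],
    ];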

@mcawilcox
Author

Addition:
I realised the CloudWatch live dashboard has the same underlying issue - it performs sum(@numVu) too - but this one mostly works because the full Logs Insights query is "stat sum(@numVu) by bin(1s)" ... and most of the time the bin(1s) ensures only a single sample falls in each bin, so sum() == avg().

I do see occasional glitches where the graph doubles ... so sometimes there are 2 samples per second.
10-CW-Live-UsersGlitch

Again, using avg(), min() or max() works.
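
For reference, the same stat line with just the aggregate swapped (the rest of the query kept as quoted above):

    stat avg(@numVu) by bin(1s)

This tracks the true VU count regardless of how many samples land in each one-second bin.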

@kamyarz-aws
Member

This is very comprehensive. Thanks for the analysis. I will go over it and update you on this.

kamyarz-aws self-assigned this Aug 19, 2024
@mcawilcox
Author

Addition:
I did all my original analysis using a single load engine, which meant that use of the "Average" statistic worked out well.
Since then, I started to scale my tests beyond a single load engine, and realised that the "Average" statistic no longer works - there needs to be something that knows how many engines are running in parallel.

As a quick hack in my own metrics, I added a line for "engines" as "TIME_SERIES(4)" when I have 4 tasks, and then defined the Virtual Users to be "AVG([numVu0]) * engines".
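
Expressed as a metric-widget metrics array, the hack looks roughly like this (a sketch; the namespace and the numVu0 id are whatever the real definition uses):

    // "engines" is a constant time series carrying the number of parallel tasks (4 here);
    // numVu0 is the per-engine VU metric, averaged, then scaled up by the engine count
    const metrics = [
      [{ expression: "TIME_SERIES(4)", id: "engines", visible: false, label: "Engines" }],
      [{ expression: "AVG([numVu0]) * engines", label: "Virtual Users" }],
      ["<namespace>", "numVu", { id: "numVu0", stat: "Average", visible: false }],
    ];

Something along these lines, driven by the actual task count of the test, is presumably what the results parser would need to generate.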
