Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improvement(perf): add validation rules for latency decorator #9295

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

soyacz
Copy link
Contributor

@soyacz soyacz commented Nov 20, 2024

Added validation rules for results sent by
latency_calculator_decorator to Argus.
Each workload and result name (nemesis, predefined step) may set own rules.

Current rules were created based on existing results - to pass typical good results.

closes: #9237

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

sdcm/argus_results.py Outdated Show resolved Hide resolved
sdcm/argus_results.py Outdated Show resolved Hide resolved
@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch from 09a3d76 to bfbe9d1 Compare November 21, 2024 16:10
@soyacz soyacz marked this pull request as draft November 21, 2024 16:14
@soyacz
Copy link
Contributor Author

soyacz commented Nov 21, 2024

@fruch I adjusted the code to configure it by config file. Converted to draft as I didn't add error thresholds for OSS yet.

sdcm/sct_config.py Outdated Show resolved Hide resolved
defaults/test_default.yaml Outdated Show resolved Hide resolved
@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch from bfbe9d1 to 2530138 Compare November 22, 2024 09:18
@fruch
Copy link
Contributor

fruch commented Nov 25, 2024

we are missing a configuration for the the upgrade cases

@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch 2 times, most recently from 733c652 to c0a7676 Compare November 25, 2024 17:53
@soyacz
Copy link
Contributor Author

soyacz commented Nov 26, 2024

I verified predefined steps test (with null'ed validation rules for latencies in unthrottled) - all seem to work (except one small issue with Argus: https://argus.scylladb.com/tests/scylla-cluster-tests/9e2af03d-b1a5-4df0-b516-4ce5e624586d)
Besides, due #9294 I think of taking out default throughput verifications - until this one is closed (not to generate false errors).
I'll prepare this PR for final review (and construct rules for tablets scenarios).
Defaults should be good for upgrade, until we want to verify durations for these - @fruch @juliayakovlev let me know if I should add it).

@@ -1,12 +1,12 @@
test_duration: 3000
prepare_write_cmd: ["cassandra-stress write no-warmup cl=ALL n=162500000 -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3)' -mode cql3 native -rate threads=200 -col 'size=FIXED(128) n=FIXED(8)' -pop seq=1..162500000",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminder, taking this out

@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch 3 times, most recently from 588f762 to 5e07743 Compare November 26, 2024 10:25
@soyacz soyacz marked this pull request as ready for review November 26, 2024 10:28
@soyacz
Copy link
Contributor Author

soyacz commented Nov 26, 2024

@fruch @juliayakovlev I think it's ready for review. All duration/latency error thresholds I based on graphs - mostly to make them passing. I think fine tuning them may be done later on perf weekly meetings, when graphs show them.

@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch from 5e07743 to 993c44c Compare November 26, 2024 15:49
fruch
fruch previously approved these changes Nov 27, 2024
Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

but let's wait for @roydahan and @juliayakovlev to cross check the figures

@soyacz
Copy link
Contributor Author

soyacz commented Dec 1, 2024

@roydahan @juliayakovlev ^^

@@ -0,0 +1,98 @@
latency_decorator_error_thresholds:
write:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First step in write test is 200000

@@ -0,0 +1,98 @@
latency_decorator_error_thresholds:
write:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First step in write test is 200000

Copy link
Contributor

@juliayakovlev juliayakovlev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comments

@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch 2 times, most recently from 8a72dfa to 4adf310 Compare December 2, 2024 12:41
@soyacz
Copy link
Contributor Author

soyacz commented Dec 2, 2024

small comments

generally, if something is not provided then defaults are used. But I added it for clarity.
What I'm most concerned if my values are correct - if I wasn't too delicate for scylla and bars should be marked lower at some places (especially for durations).

@soyacz soyacz requested a review from juliayakovlev December 3, 2024 14:09
@fruch
Copy link
Contributor

fruch commented Dec 3, 2024

@soyacz

you have a small conflict here

Added validation rules for results sent by
`latency_calculator_decorator` to Argus.
Each workload and result name (nemesis, predefined step) may set own
rules.

Current rules were created based on existing results - to pass typical
good results.

closes: scylladb#9237
@soyacz soyacz force-pushed the add-limits-to-latency-decorator branch from 4adf310 to ca452ea Compare December 3, 2024 15:54
@soyacz
Copy link
Contributor Author

soyacz commented Dec 3, 2024

@soyacz

you have a small conflict here

fixed

P90 write:
fixed_limit: 1000
P99 write:
fixed_limit: 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What units are these numbers?
What is 1000? ms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it way too high?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sit together and define the numbers we want to define here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Latency decorator error thresholds
4 participants