improvement(perf): add validation rules for latency decorator
Added validation rules for results sent by
`latency_calculator_decorator` to Argus.
Each workload and result name (nemesis or predefined step) may set its
own rules.

The current rules were derived from existing results, so that typical
good results pass.

closes: scylladb#9237
soyacz committed Nov 26, 2024
1 parent 57e5dd0 commit 993c44c
Showing 21 changed files with 406 additions and 25 deletions.
@@ -0,0 +1,51 @@
latency_decorator_error_thresholds:
  write:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 10800
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 2500
    decommission_nodes:
      duration:
        fixed_limit: 1800
    replace_node:
      duration:
        fixed_limit: 3600

  read:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 3600
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 3000
    decommission_nodes:
      duration:
        fixed_limit: 1800
    replace_node:
      duration:
        fixed_limit: 3000

  mixed:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 5000
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 2500
    decommission_nodes:
      duration:
        fixed_limit: 1800
    replace_node:
      duration:
        fixed_limit: 3200
@@ -0,0 +1,51 @@
latency_decorator_error_thresholds:
  write:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 10800  # 3h this one typically will fail
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 7200
    decommission_nodes:
      duration:
        fixed_limit: 8000
    replace_node:
      duration:
        fixed_limit: 3600

  read:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 10800  # 3h this one typically will fail
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 2600
    decommission_nodes:
      duration:
        fixed_limit: 2600
    replace_node:
      duration:
        fixed_limit: 2600

  mixed:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 10800  # 3h this one typically will fail
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 2800
    decommission_nodes:
      duration:
        fixed_limit: 2600
    replace_node:
      duration:
        fixed_limit: 2600
@@ -0,0 +1,51 @@
latency_decorator_error_thresholds:
  write:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 28000
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 10000
    decommission_nodes:
      duration:
        fixed_limit: 10000
    replace_node:
      duration:
        fixed_limit: 2500

  read:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 3000
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 2500
    decommission_nodes:
      duration:
        fixed_limit: 2800
    replace_node:
      duration:
        fixed_limit: 1500

  mixed:
    _mgmt_repair_cli:
      duration:
        fixed_limit: 5000
    _terminate_and_wait:
      duration:
        fixed_limit: 500
    add_new_nodes:
      duration:
        fixed_limit: 3000
    decommission_nodes:
      duration:
        fixed_limit: 3600
    replace_node:
      duration:
        fixed_limit: 1700
@@ -0,0 +1,98 @@
latency_decorator_error_thresholds:
  write:
    "300000":
      P99 write:
        fixed_limit: 1000
    "400000":
      P99 write:
        fixed_limit: 1000
    unthrottled:
      P90 write:
        fixed_limit: null
      P99 write:
        fixed_limit: null
      Throughput write:
        best_pct: 10

  read:
    "150000":
      P90 read:
        fixed_limit: 1
      P99 read:
        fixed_limit: 1
    "300000":
      P90 read:
        fixed_limit: 1
      P99 read:
        fixed_limit: 1
    "450000":
      P90 read:
        fixed_limit: 1
      P99 read:
        fixed_limit: 3
    "600000":
      P90 read:
        fixed_limit: 1.5
      P99 read:
        fixed_limit: 50
    "700000":
      P90 read:
        fixed_limit: 3
      P99 read:
        fixed_limit: 50
    unthrottled:
      P90 read:
        fixed_limit: null
      P99 read:
        fixed_limit: null
      Throughput read:
        best_pct: 10

  mixed:
    "50000":
      P90 write:
        fixed_limit: 1
      P90 read:
        fixed_limit: 1
      P99 write:
        fixed_limit: 2.5
      P99 read:
        fixed_limit: 2.5
    "150000":
      P90 write:
        fixed_limit: 1
      P90 read:
        fixed_limit: 1.7
      P99 write:
        fixed_limit: 3
      P99 read:
        fixed_limit: 3
    "300000":
      P90 write:
        fixed_limit: 3
      P90 read:
        fixed_limit: 3
      P99 write:
        fixed_limit: 5
      P99 read:
        fixed_limit: 5
    "450000":
      P90 write:
        fixed_limit: 3
      P90 read:
        fixed_limit: 4
      P99 write:
        fixed_limit: 15
      P99 read:
        fixed_limit: 15
    unthrottled:
      P90 write:
        fixed_limit: null
      P90 read:
        fixed_limit: null
      P99 write:
        fixed_limit: null
      P99 read:
        fixed_limit: null
      Throughput write:
        best_pct: 10
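The thresholds above use two rule kinds, `fixed_limit` and `best_pct`. The commit does not show how they are evaluated (that happens on the Argus side), so the following is only a minimal sketch under stated assumptions: `fixed_limit` is treated as an absolute bound, `best_pct` as an allowed percentage deviation from the best recorded result, and `null` as "no check". The function names and the `higher_is_better` flag are hypothetical and not taken from SCT or Argus.

```python
from typing import Optional


def violates_fixed_limit(value: float, fixed_limit: Optional[float],
                         higher_is_better: bool = False) -> bool:
    """Return True when `value` breaks the absolute bound.

    Assumption: a limit of None (YAML `null`) disables the check.
    """
    if fixed_limit is None:
        return False
    if higher_is_better:
        return value < fixed_limit
    return value > fixed_limit


def violates_best_pct(value: float, best: float, best_pct: Optional[float],
                      higher_is_better: bool = False) -> bool:
    """Return True when `value` is worse than `best` by more than `best_pct` percent.

    Assumption: `best` is the best result recorded so far for the same metric.
    """
    if best_pct is None:
        return False
    tolerance = abs(best) * best_pct / 100.0
    if higher_is_better:
        return value < best - tolerance
    return value > best + tolerance


# Latency-style metric (lower is better) checked against `fixed_limit: 3`.
p99_ok = not violates_fixed_limit(2.5, 3)

# Throughput-style metric (higher is better) checked against `best_pct: 10`.
throughput_regressed = violates_best_pct(89_000, 100_000, 10, higher_is_better=True)
```

Under these assumptions, an `unthrottled` entry with `fixed_limit: null` would never fail on latency, and only the `Throughput` rule would flag a regression.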
@@ -0,0 +1,98 @@
latency_decorator_error_thresholds:
  write:
    "300000":
      P99 write:
        fixed_limit: 1000
    "400000":
      P99 write:
        fixed_limit: 1000
    unthrottled:
      P90 write:
        fixed_limit: null
      P99 write:
        fixed_limit: null
      Throughput write:
        best_pct: 10

  read:
    "150000":
      P90 read:
        fixed_limit: 1
      P99 read:
        fixed_limit: 1
    "300000":
      P90 read:
        fixed_limit: 1
      P99 read:
        fixed_limit: 1
    "450000":
      P90 read:
        fixed_limit: 1
      P99 read:
        fixed_limit: 3
    "600000":
      P90 read:
        fixed_limit: 1.5
      P99 read:
        fixed_limit: 50
    "700000":
      P90 read:
        fixed_limit: 3
      P99 read:
        fixed_limit: 50
    unthrottled:
      P90 read:
        fixed_limit: null
      P99 read:
        fixed_limit: null
      Throughput read:
        best_pct: 10

  mixed:
    "50000":
      P90 write:
        fixed_limit: 1
      P90 read:
        fixed_limit: 1
      P99 write:
        fixed_limit: 2.5
      P99 read:
        fixed_limit: 2.5
    "150000":
      P90 write:
        fixed_limit: 1
      P90 read:
        fixed_limit: 1.7
      P99 write:
        fixed_limit: 3
      P99 read:
        fixed_limit: 3
    "300000":
      P90 write:
        fixed_limit: 3
      P90 read:
        fixed_limit: 3
      P99 write:
        fixed_limit: 5
      P99 read:
        fixed_limit: 5
    "450000":
      P90 write:
        fixed_limit: 3
      P90 read:
        fixed_limit: 4
      P99 write:
        fixed_limit: 15
      P99 read:
        fixed_limit: 15
    unthrottled:
      P90 write:
        fixed_limit: null
      P90 read:
        fixed_limit: null
      P99 write:
        fixed_limit: null
      P99 read:
        fixed_limit: null
      Throughput write:
        best_pct: 10
24 changes: 24 additions & 0 deletions defaults/test_default.yaml
@@ -264,3 +264,27 @@ skip_test_stages: {}
n_db_zero_token_nodes: 0
zero_token_instance_type_db: 'i4i.large'
use_zero_nodes: false

latency_decorator_error_thresholds:
  write:
    default:
      P90 write:
        fixed_limit: 5
      P99 write:
        fixed_limit: 10
  read:
    default:
      P90 read:
        fixed_limit: 5
      P99 read:
        fixed_limit: 10
  mixed:
    default:
      P90 write:
        fixed_limit: 5
      P90 read:
        fixed_limit: 5
      P99 write:
        fixed_limit: 10
      P99 read:
        fixed_limit: 10
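The defaults above define a `default` entry per workload, while the per-test files define entries keyed by nemesis or step name. How SCT combines the two is not shown in this commit. The sketch below assumes that rules for a named result are layered on top of the workload's `default` rules; `resolve_rules` and the sample dictionary are hypothetical illustrations, not SCT code.

```python
from copy import deepcopy


def resolve_rules(thresholds: dict, workload: str, result_name: str) -> dict:
    """Return the metric rules for one result.

    Assumption: rules under `result_name` override or extend the
    workload's `default` rules, metric by metric.
    """
    workload_rules = thresholds.get(workload, {})
    rules = deepcopy(workload_rules.get("default", {}))
    for metric, metric_rules in workload_rules.get(result_name, {}).items():
        rules.setdefault(metric, {}).update(metric_rules)
    return rules


# Sample data mirroring values from this commit (defaults plus one nemesis entry).
sample_thresholds = {
    "write": {
        "default": {
            "P90 write": {"fixed_limit": 5},
            "P99 write": {"fixed_limit": 10},
        },
        "replace_node": {
            "duration": {"fixed_limit": 3600},
        },
    },
}

replace_node_rules = resolve_rules(sample_thresholds, "write", "replace_node")
unknown_step_rules = resolve_rules(sample_thresholds, "write", "some_other_step")
```

With this layering, a nemesis that only sets a `duration` limit would still be checked against the default P90/P99 latency limits, and a result name without its own entry would fall back to the defaults alone.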
1 change: 1 addition & 0 deletions docs/configuration_options.md
@@ -387,3 +387,4 @@
| **<a href="#user-content-use_zero_nodes" name="use_zero_nodes">use_zero_nodes</a>** | If True, enable support in sct of zero nodes(configuration, nemesis) | N/A | SCT_USE_ZERO_NODES
| **<a href="#user-content-n_db_zero_token_nodes" name="n_db_zero_token_nodes">n_db_zero_token_nodes</a>** | Number of zero token nodes in cluster. Value should be set as "0 1 1"<br>for multidc configuration in same manner as 'n_db_nodes' and should be equal<br>number of regions | N/A | SCT_N_DB_ZERO_TOKEN_NODES
| **<a href="#user-content-zero_token_instance_type_db" name="zero_token_instance_type_db">zero_token_instance_type_db</a>** | Instance type for zero token node | i4i.large | SCT_ZERO_TOKEN_INSTANCE_TYPE_DB
| **<a href="#user-content-latency_decorator_error_thresholds" name="latency_decorator_error_thresholds">latency_decorator_error_thresholds</a>** | Error thresholds for latency decorator. Defined by dict: {<write, read, mixed>: {<default|nemesis_name>:{<metric_name>: {<rule>: <value>}}} | {'write': {'default': {'P90 write': {'fixed_limit': 5}, 'P99 write': {'fixed_limit': 10}}}, 'read': {'default': {'P90 read': {'fixed_limit': 5}, 'P99 read': {'fixed_limit': 10}}}, 'mixed': {'default': {'P90 write': {'fixed_limit': 5}, 'P90 read': {'fixed_limit': 5}, 'P99 write': {'fixed_limit': 10}, 'P99 read': {'fixed_limit': 10}}}} | SCT_LATENCY_DECORATOR_ERROR_THRESHOLDS
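The documented shape is workload, then `default` or a nemesis name, then metric name, then rule and value. A small check of that shape can catch typos before a run. The sketch below is an assumption-laden illustration: `find_shape_errors` is a hypothetical helper, and the set of known rules contains only the two rule names that appear in this commit (`fixed_limit`, `best_pct`); other rule names may exist.

```python
from typing import List

WORKLOADS = {"write", "read", "mixed"}
# Only the rule names that appear in this commit; the real set may be larger.
KNOWN_RULES = {"fixed_limit", "best_pct"}


def find_shape_errors(thresholds: dict) -> List[str]:
    """Return human-readable problems found in a thresholds dict."""
    errors: List[str] = []
    for workload, by_name in thresholds.items():
        if workload not in WORKLOADS:
            errors.append(f"unknown workload: {workload}")
            continue
        for name, metrics in by_name.items():
            for metric, rules in metrics.items():
                if not isinstance(rules, dict):
                    errors.append(f"{workload}/{name}/{metric}: expected a mapping of rules")
                    continue
                for rule in rules:
                    if rule not in KNOWN_RULES:
                        errors.append(f"{workload}/{name}/{metric}: unknown rule {rule}")
    return errors


valid = {"write": {"default": {"P90 write": {"fixed_limit": 5}}}}
with_typo = {"read": {"default": {"P99 read": {"fixed_limt": 10}}}}

no_errors = find_shape_errors(valid)
typo_errors = find_shape_errors(with_typo)
```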
@@ -6,7 +6,7 @@ def lib = library identifier: 'sct@snapshot', retriever: legacySCM(scm)
perfRegressionParallelPipeline(
backend: "aws",
test_name: "performance_regression_test.PerformanceRegressionTest",
- test_config: """["test-cases/performance/perf-regression-latency-650gb-with-nemesis.yaml", "configurations/disable_kms.yaml"]""",
+ test_config: """["test-cases/performance/perf-regression-latency-650gb-with-nemesis.yaml", "configurations/disable_kms.yaml", "configurations/performance/latency-decorator-error-thresholds-nemesis-ent-tablets.yaml"]""",
sub_tests: ["test_latency_write_with_nemesis", "test_latency_read_with_nemesis", "test_latency_mixed_with_nemesis"],
test_email_title: "latency during operations / tablets",
perf_extra_jobs_to_compare: "scylla-master/perf-regression/scylla-master-perf-regression-latency-650gb-with-nemesis-tablets",