Cilium API Rate Limiting Test #414

Open
wants to merge 92 commits into
base: main
dad0003
Add API rate limiting performance evaluation pipeline
mushiboy Nov 26, 2024
8eeca43
Update API rate-limiting pipeline configuration and create a new conf…
mushiboy Nov 27, 2024
aa5a43f
Add Cilium API Rate Limiter metric to existing cilium-metrics
mushiboy Nov 27, 2024
5697495
Reduce the number of repeats in the API rate limiting performance eva…
mushiboy Nov 27, 2024
c7647d1
Fix metric queries and add cilium agent metrics
mushiboy Nov 28, 2024
74bc855
Refactor API rate limiting pipeline configuration to streamline param…
mushiboy Nov 28, 2024
a4b7631
Merge API Rate Limiting test with load-config
mushiboy Nov 29, 2024
b5143d1
Update API rate limiting pipeline topology from service-churn to slo
mushiboy Dec 2, 2024
f8b1108
Add API rate limiting topology and related configuration files
mushiboy Dec 2, 2024
d51a937
Changed API rate limiting configuration
mushiboy Dec 2, 2024
b4fa439
Updated default -> DefaultParam at line 44
mushiboy Dec 2, 2024
5df6b17
Fix YAML syntax in load-config.yaml by closing the conditional block
mushiboy Dec 2, 2024
7c907f0
Fix YAML syntax in load-config.yaml by correcting the closing conditi…
mushiboy Dec 2, 2024
e96a074
Update default function calls to DefaultParam in reconcile-objects.yaml
mushiboy Dec 2, 2024
df79618
Fix YAML syntax by ensuring proper closure of conditional block in re…
mushiboy Dec 2, 2024
96f8ac3
Remove unused burst parameter from BurstLoad configuration in load-co…
mushiboy Dec 2, 2024
b4bfd9b
Fix: Rename burstLoad to qpsLoad in BurstLoad tuning set in load-conf…
mushiboy Dec 2, 2024
981e8d9
Fix: Update pods_per_node assignment for api_rate_limiting_test in sl…
mushiboy Dec 2, 2024
277c89e
Fix: Add api_rate_limiting_test and pods argument to main function in…
mushiboy Dec 2, 2024
b91b8bb
Fix: Update argument parsing for api_rate_limiting_test and pods in s…
mushiboy Dec 2, 2024
bbd0b4b
Fix: Increase cpu_per_node from 1 to 4 in api-rate-limiting.yml for i…
mushiboy Dec 2, 2024
adaa583
Fix: Reorder API rate limiting test argument in execute.yml for impro…
mushiboy Dec 2, 2024
b007ed0
Revert "Fix: Update pods_per_node assignment for api_rate_limiting_te…
mushiboy Dec 2, 2024
31183f8
Fix: Update pods_per_node assignment for api_rate_limiting_test in sl…
mushiboy Dec 3, 2024
c6549fa
Fix: Remove api_rate_limiting_test and pods arguments from argument p…
mushiboy Dec 3, 2024
54b6b1d
Fix: Correct spacing in collect.yml and execute.yml
mushiboy Dec 3, 2024
254bbf1
Fix: Add api_rate_limiting_test argument to configuration and update …
mushiboy Dec 3, 2024
88468c1
Fix: Add api_rate_limiting_test argument to main function in slo.py f…
mushiboy Dec 3, 2024
42b41f0
Fix: Reorder API_RATE_LIMITING_TEST argument in execute.yml for corre…
mushiboy Dec 3, 2024
369eaea
Fix: Add API_RATE_LIMITING_TEST default value and conditional logic i…
mushiboy Dec 3, 2024
fd4be3c
Fix: Set default value for bigDeploymentsPerNamespace to 1 and remove…
mushiboy Dec 3, 2024
7675891
Fix: Add smallDeploymentSize and smallDeploymentsPerNamespace to load…
mushiboy Dec 3, 2024
db3960b
Fix: Set CL2_SERVICE_TEST and CL2_API_RATE_LIMITING_TEST to true in s…
mushiboy Dec 3, 2024
934b660
Fix: Set CL2_SERVICE_TEST to false in slo.py for accurate test config…
mushiboy Dec 3, 2024
3d106b3
Testing with lower pod numbers
mushiboy Dec 3, 2024
8b57d2f
Fix: Update API_RATE_LIMITING_PODS_PER_NODE to 250 and set default po…
mushiboy Dec 3, 2024
884ec89
Fix: Update bigDeploymentSize variable to use pods parameter in load-…
mushiboy Dec 3, 2024
0200416
Fix: Update pods parameter in load-config.yaml to default to 0 for AP…
mushiboy Dec 3, 2024
7b4d0b9
Fix: Add pods argument to API Rate Limiting Test in slo.py and update…
mushiboy Dec 3, 2024
e597133
Fix: Update SCENARIO_VERSION to 'test' in api-rate-limiting.yml
mushiboy Dec 3, 2024
55bc92f
Update pods parameter in api-rate-limiting.yml to 134 for testing par…
mushiboy Dec 3, 2024
6c281f5
Fix: Remove default value for pods argument in slo.py and update rela…
mushiboy Dec 3, 2024
bac5d49
Fix: Set default value for pods argument in configure_clusterloader2 …
mushiboy Dec 3, 2024
c8b85d4
Fix: Remove api-rate-limiting-config.yaml and set default value for p…
mushiboy Dec 4, 2024
ad66384
Enhance Cilium API Rate Limiter metrics in cilium-measurements.yaml b…
mushiboy Dec 4, 2024
242bd49
Testing with 50 pods
mushiboy Dec 4, 2024
1933a46
Fix indentation in cilium-measurements.yaml
mushiboy Dec 4, 2024
7c45374
Fix: Cilium Measurements
mushiboy Dec 4, 2024
15027b9
Fix
mushiboy Dec 4, 2024
54dcde3
Fix: CiliumAPIRateLimiterProcessedRequestsCount
mushiboy Dec 4, 2024
6768722
Measurement fix
mushiboy Dec 4, 2024
2b67a94
Fix: CiliumAPIRateLimiterRequestsInFlight
mushiboy Dec 4, 2024
67a4188
Remove ByAction query from Cilium measurements configuration
mushiboy Dec 5, 2024
0433f51
Update Cilium measurements to include time window in queries
mushiboy Dec 5, 2024
cff58ed
Update Cilium measurements to use a 15-minute time window for queries
mushiboy Dec 5, 2024
3dd245c
Commented out requests in flight
mushiboy Dec 5, 2024
034bf6b
Comment out NetworkProgrammingLatency configuration in measurements.yaml
mushiboy Dec 5, 2024
b62d83a
Remove API rate limiting test from clusterloader2 collect script
mushiboy Dec 5, 2024
d876166
Add pods argument for API Rate Limiting Test in slo.py and update col…
mushiboy Dec 5, 2024
7a3267b
Fix
mushiboy Dec 5, 2024
369e135
Add pods argument for API Rate Limiting Test in slo.py and update col…
mushiboy Dec 5, 2024
41f7a87
Update API rate limiting scenario version and increase pod count to 200
mushiboy Dec 5, 2024
a3731f5
Testing the rate limit config
mushiboy Dec 5, 2024
a539959
Update scenario version to test 200 pods with low rate limit and adju…
mushiboy Dec 5, 2024
bf20a88
Increase CPU allocation per node to 4 for API rate limiting test
mushiboy Dec 5, 2024
c0e6155
Test-Run Changes
mushiboy Dec 5, 2024
40cc708
Increase desired nodes to 7 in API rate limiting resource validation
mushiboy Dec 6, 2024
c26ccfc
200 pods test
mushiboy Dec 6, 2024
5e3e4cd
Decrease desired nodes to 3 in API rate limiting resource validation
mushiboy Dec 6, 2024
70f380d
Update Cilium measurements to use variable time intervals for rate qu…
mushiboy Dec 9, 2024
c784b85
Refactor Cilium measurements to remove unnecessary sum in rate queries
mushiboy Dec 9, 2024
faeeb6f
Add script to set a unique Run ID before publish in clusterloader2 co…
mushiboy Dec 9, 2024
d3f1381
Update scenario version to benchmark in API rate limiting configuration
mushiboy Dec 10, 2024
bc9f3b8
Remove unnecessary blank lines in load-config.yaml
mushiboy Dec 10, 2024
2389b37
Add parameter for configurable number of pods in API rate limiting pi…
mushiboy Dec 20, 2024
d6e721f
Update scenario version to main in API rate limiting configuration
mushiboy Dec 20, 2024
d508e7b
Fix query in Cilium measurements to sum processed requests rate
mushiboy Dec 20, 2024
9f36264
Refactor Cilium measurements queries to use sum for accurate metrics …
mushiboy Dec 20, 2024
85a3a40
Refactor indentation
mushiboy Dec 20, 2024
f0d6935
Fix Cilium measurements query to use rate for accurate processed requ…
mushiboy Dec 20, 2024
27b9259
CiliumAPIRateLimiterRequestsInFlight
mushiboy Dec 20, 2024
f607a60
Refactor Cilium measurements queries to use sum for accurate metrics …
mushiboy Dec 23, 2024
0491879
Refactor Cilium measurements queries and update pod actions to use va…
mushiboy Dec 23, 2024
6bdd1bd
Update scenario version to v1 in API rate limiting pipeline
mushiboy Dec 23, 2024
fef9aad
Update scenario version to v1.1 in API rate limiting pipeline
mushiboy Dec 24, 2024
8ef35be
Refactor Cilium measurements queries
mushiboy Jan 5, 2025
aa93862
Testing API Process time metric
mushiboy Jan 5, 2025
e5f2b66
Comment out Cilium API Rate Limiter metrics in configuration
mushiboy Jan 5, 2025
09982df
Update scenario version to 'test' in API rate limiting pipeline
mushiboy Jan 5, 2025
2162965
Add 'Requests' metric to Cilium API process time measurements
mushiboy Jan 6, 2025
d5ffea7
Add 'TotalTime' metric to Cilium API process time measurements
mushiboy Jan 6, 2025
1658670
Update 'Requests' metric to use 'increase' function for accurate API …
mushiboy Jan 6, 2025
94 changes: 94 additions & 0 deletions modules/python/clusterloader2/slo/config/load-config.yaml
Expand Up @@ -2,6 +2,9 @@ name: load-config

# Config options for test type
{{$SERVICE_TEST := DefaultParam .CL2_SERVICE_TEST true}}
# Mugesh - API Rate Limiting Test config
{{$API_RATE_LIMITING_TEST := DefaultParam .CL2_API_RATE_LIMITING_TEST false}}


# Config options for test parameters
{{$nodesPerNamespace := DefaultParam .CL2_NODES_PER_NAMESPACE 100}}
Expand All @@ -11,6 +14,7 @@ name: load-config
{{$repeats := DefaultParam .CL2_REPEATS 1}}
{{$groupName := DefaultParam .CL2_GROUP_NAME "service-discovery"}}


# TODO(jshr-w): This should eventually use >1 namespace.
{{$namespaces := 1}}
{{$nodes := DefaultParam .CL2_NODES 1000}}
Expand All @@ -34,6 +38,13 @@ name: load-config
{{$smallDeploymentPods := SubtractInt $podsPerNamespace (MultiplyInt $bigDeploymentsPerNamespace $BIG_GROUP_SIZE)}}
{{$smallDeploymentsPerNamespace := DivideInt $smallDeploymentPods $SMALL_GROUP_SIZE}}


# Mugesh - API Rate Limiting Test - Added necessary config for API Rate Limiting Test
{{$PODS := DefaultParam .CL2_PODS 0}}




namespace:
number: {{$namespaces}}
prefix: slo
Expand All @@ -51,6 +62,10 @@ tuningSets:
- name: DeploymentDeleteQps
qpsLoad:
qps: {{$deploymentQPS}}
# Mugesh - API Rate Limiting Test - Added BurstLoad tuning set
- name: BurstLoad
qpsLoad:
qps: 1000

steps:
- name: Log - namespaces={{$namespaces}}, nodesPerNamespace={{$nodesPerNamespace}}, podsPerNode={{$podsPerNode}}, totalPods={{$totalPods}}, podsPerNamespace={{$podsPerNamespace}}, deploymentsPerNamespace={{$deploymentsPerNamespace}}, deploymentSize={{$deploymentSize}}, deploymentQPS={{$deploymentQPS}}
Expand Down Expand Up @@ -85,9 +100,15 @@ steps:
bigServicesPerNamespace: {{$bigDeploymentsPerNamespace}}
{{end}}

# Mugesh - Run the default load test steps only when the API Rate Limiting Test is disabled.
{{if not $API_RATE_LIMITING_TEST}}
- module:
path: /modules/reconcile-objects.yaml
params:




actionName: "create"
namespaces: {{$namespaces}}
tuningSet: DeploymentCreateQps
Expand Down Expand Up @@ -131,6 +152,77 @@ steps:
deploymentLabel: restart
Group: {{$groupName}}

{{end}}

# Mugesh - API Rate Limiting Test - create CL2_PODS pods, delete them, recreate them, then delete them again

{{if $API_RATE_LIMITING_TEST}}

- module:
path: /modules/reconcile-objects.yaml
params:
actionName: "Create {{$PODS}} pods"
namespaces: {{$namespaces}}
tuningSet: BurstLoad
operationTimeout: {{$operationTimeout}}
bigDeploymentSize: {{$PODS}}
bigDeploymentsPerNamespace: 1
smallDeploymentSize: 0
smallDeploymentsPerNamespace: 0
CpuRequest: {{$latencyPodCpu}}m
MemoryRequest: {{$latencyPodMemory}}M
Group: {{$groupName}}
deploymentLabel: start

- module:
path: /modules/reconcile-objects.yaml
params:
actionName: "delete {{$PODS}} pods"
namespaces: {{$namespaces}}
tuningSet: BurstLoad
operationTimeout: {{$operationTimeout}}
bigDeploymentSize: {{$PODS}}
bigDeploymentsPerNamespace: 0
smallDeploymentSize: 0
smallDeploymentsPerNamespace: 0
deploymentLabel: delete
Group: {{$groupName}}
waitForPods: false

- module:
path: /modules/reconcile-objects.yaml
params:
actionName: "recreate {{$PODS}} pods"
namespaces: {{$namespaces}}
tuningSet: BurstLoad
operationTimeout: {{$operationTimeout}}
bigDeploymentSize: {{$PODS}}
bigDeploymentsPerNamespace: 1
smallDeploymentSize: 0
smallDeploymentsPerNamespace: 0
CpuRequest: {{$latencyPodCpu}}m
MemoryRequest: {{$latencyPodMemory}}M
Group: {{$groupName}}
deploymentLabel: restart

- module:
path: /modules/reconcile-objects.yaml
params:
actionName: "delete {{$PODS}} pods"
namespaces: {{$namespaces}}
tuningSet: BurstLoad
operationTimeout: {{$operationTimeout}}
bigDeploymentSize: {{$PODS}}
bigDeploymentsPerNamespace: 0
smallDeploymentSize: 0
smallDeploymentsPerNamespace: 0
deploymentLabel: delete
Group: {{$groupName}}

{{end}}

# Mugesh - Added conditional statement to delete services (Previously not there)
{{if $SERVICE_TEST}}
- module:
path: /modules/services.yaml
params:
Expand All @@ -140,6 +232,8 @@ steps:
bigServicesPerNamespace: 0
{{end}}

{{end}}

{{if $CILIUM_METRICS_ENABLED}}
- module:
path: /modules/cilium-measurements.yaml
Expand Down
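The API rate limiting branch added to load-config.yaml above can be summarised as a short plan. The following is only a sketch of what the template appears to render, not code from this PR: `Step` and `api_rate_limiting_steps` are hypothetical names, and the `wait_for_pods` values assume that `waitForPods` falls back to `true` when a step does not set it, as the `DefaultParam .waitForPods true` line in reconcile-objects.yaml suggests.

```python
from typing import TypedDict


class Step(TypedDict):
    """One reconcile-objects.yaml invocation (hypothetical summary type)."""
    action_name: str
    big_deployments_per_namespace: int
    wait_for_pods: bool


def api_rate_limiting_steps(pods: int) -> list[Step]:
    """List the four steps rendered when CL2_API_RATE_LIMITING_TEST is true.

    Every step uses the BurstLoad tuning set and a single big deployment of
    `pods` replicas; deletes are expressed as zero deployments per namespace.
    """
    return [
        {"action_name": f"Create {pods} pods",
         "big_deployments_per_namespace": 1, "wait_for_pods": True},
        # Only the first delete sets waitForPods: false in the diff.
        {"action_name": f"delete {pods} pods",
         "big_deployments_per_namespace": 0, "wait_for_pods": False},
        {"action_name": f"recreate {pods} pods",
         "big_deployments_per_namespace": 1, "wait_for_pods": True},
        # The final delete leaves waitForPods unset, so the default applies.
        {"action_name": f"delete {pods} pods",
         "big_deployments_per_namespace": 0, "wait_for_pods": True},
    ]


if __name__ == "__main__":
    for step in api_rate_limiting_steps(200):
        print(step)
```

Written out this way, the asymmetry between the two delete steps is easy to see: the first delete skips the wait, so the recreate step starts while pods may still be terminating, whereas the last delete waits for completion.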
Original file line number Diff line number Diff line change
Expand Up @@ -150,4 +150,118 @@ steps:
- name: Perc90
query: quantile(0.90, avg_over_time(cilium_operator_process_resident_memory_bytes[%v:]) / 1024 / 1024)
- name: Perc50
query: quantile(0.5, avg_over_time(cilium_operator_process_resident_memory_bytes[%v:]) / 1024 / 1024)

# Cilium API Rate Limiter Metrics
- Identifier: CiliumAgentAPIProcessTime
Method: GenericPrometheusQuery
Params:
action: {{$action}}
metricName: Cilium Agent API Process Time
metricVersion: v1
unit: s
enableViolations: true
queries:
- name: Perc99
query: histogram_quantile(0.99, sum(rate(cilium_agent_api_process_time_seconds_bucket[%v:])) by (le))
- name: Perc95
query: histogram_quantile(0.95, sum(rate(cilium_agent_api_process_time_seconds_bucket[%v:])) by (le))
- name: Perc50
query: histogram_quantile(0.50, sum(rate(cilium_agent_api_process_time_seconds_bucket[%v:])) by (le))
- name: Avg
query: avg(rate(cilium_agent_api_process_time_seconds_sum[%v:])) / avg(rate(cilium_agent_api_process_time_seconds_count[%v:]))
- name: Total
query: sum(rate(cilium_agent_api_process_time_seconds_sum[%v:]))
- name: Max
query: max(histogram_quantile(1.0, sum(rate(cilium_agent_api_process_time_seconds_bucket[%v:])) by (le)))
- name: Requests
query: sum(increase(cilium_agent_api_process_time_seconds_count[%v:]))
- name: TotalTime
query: sum(increase(cilium_agent_api_process_time_seconds_sum[%v:]))
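A note on reading the percentile and Max values from the CiliumAgentAPIProcessTime queries above: `histogram_quantile` estimates a quantile by interpolating linearly inside the bucket that contains the target rank, so the result depends on the bucket boundaries rather than on individual observations. The sketch below is a simplified Python illustration of that estimation, not the Prometheus implementation itself; the bucket bounds and counts in the usage example are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound, contain at least one finite
    bucket, and end with the (math.inf, total_count) bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            if count == prev_count:
                # Empty bucket (only reachable when rank is 0): nothing to interpolate.
                return prev_bound
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


if __name__ == "__main__":
    # Illustrative buckets: 50 requests <= 0.1s, 90 <= 0.5s, 100 <= 1.0s.
    example = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
    for q in (0.50, 0.99, 1.0):
        print(q, histogram_quantile(q, example))
```

With these example buckets, `q = 1.0` yields the upper bound of the last populated bucket (1.0 s) even if the slowest real request took, say, 0.6 s. The `Max` query should therefore probably be read as an upper bound on the maximum rather than an observed value. Similarly, `TotalTime / Requests` gives the mean processing time over the window.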


# - Identifier: CiliumAPIRateLimiterProcessedRequests
# Method: GenericPrometheusQuery
# Params:
# action: {{$action}}
# metricName: Cilium API Rate Limiter Processed Requests
# metricVersion: v1
# unit: requests/second
# enableViolations: true
# queries:
# - name: Total
# query: sum(cilium_api_limiter_processed_requests_count)
# - name: Rate
# query: sum(rate(cilium_api_limiter_processed_requests_count[%v:]))


# - Identifier: CiliumAPIRateLimiterProcessingDuration
# Method: GenericPrometheusQuery
# Params:
# action: {{$action}}
# metricName: Cilium API Rate Limiter Processing Duration in Seconds
# metricVersion: v1
# unit: s
# enableViolations: true
# queries:
# - name: Avg
# query: avg(sum_over_time(cilium_api_limiter_processing_duration_seconds[0][%v]))
# - name: Max
# query: max_over_time(max(cilium_api_limiter_processing_duration_seconds[0])[%v])


# - Identifier: CiliumAPIRateLimiterRequestsInFlight
# Method: GenericPrometheusQuery
# Params:
# action: {{$action}}
# metricName: Cilium API Rate Limiter Requests In Flight
# metricVersion: v1
# unit: count
# enableViolations: true
# queries:
# - name: Instantaneous
# query: sum(cilium_api_limiter_requests_in_flight)
# - name: MaxOverTimeSum
# query: max_over_time(cilium_api_limiter_requests_in_flight[0][%v:])
# - name: AvgOverTimeSum
# query: avg_over_time(cilium_api_limiter_requests_in_flight[0][%v:])
# - name: SumOverTime
# query: sum_over_time(cilium_api_limiter_requests_in_flight[0][%v:])
# - name: MinOverTimeSum
# query: min_over_time(cilium_api_limiter_requests_in_flight[0][%v:])


# - Identifier: CiliumAPIRateLimiterWaitDuration
# Method: GenericPrometheusQuery
# Params:
# action: {{$action}}
# metricName: Cilium API Rate Limiter Wait Duration in Seconds
# metricVersion: v1
# unit: s
# enableViolations: true
# queries:
# - name: Avg
# query: avg(cilium_api_limiter_wait_duration_seconds{value="mean", api_call="endpoint-create"}[%v])
# - name: Max
# query: max_over_time(cilium_api_limiter_wait_duration_seconds{value="max", api_call="endpoint-create"}[%v])
# - name: Min
# query: min_over_time(cilium_api_limiter_wait_duration_seconds{value="min", api_call="endpoint-create"}[%v])


# - Identifier: CiliumAPIRateLimiterRateLimit
# Method: GenericPrometheusQuery
# Params:
# action: {{$action}}
# metricName: Cilium API Rate Limiter Rate Limit
# metricVersion: v1
# unit: count
# enableViolations: true
# queries:
# - name: Avg
# query: avg(avg_over_time(cilium_api_limiter_rate_limit{value="limit", api_call="endpoint-create"}[%v]))
# - name: Max
# query: max_over_time(cilium_api_limiter_rate_limit{value="limit", api_call="endpoint-create"}[%v])
# - name: Min
# query: min_over_time(cilium_api_limiter_rate_limit{value="limit", api_call="endpoint-create"}[%v])


Original file line number Diff line number Diff line change
Expand Up @@ -28,14 +28,14 @@ steps:
action: {{$action}}
labelSelector: group = {{.group}}
threshold: {{$podStartupLatencyThreshold}}
{{if $PROMETHEUS_SCRAPE_KUBE_PROXY}}
- Identifier: NetworkProgrammingLatency
Method: NetworkProgrammingLatency
Params:
action: {{$action}}
enableViolations: {{$ENABLE_VIOLATIONS_FOR_NETWORK_PROGRAMMING_LATENCIES}}
threshold: {{$NETWORK_PROGRAMMING_LATENCY_THRESHOLD}}
{{end}}
# {{if $PROMETHEUS_SCRAPE_KUBE_PROXY}}
# - Identifier: NetworkProgrammingLatency
# Method: NetworkProgrammingLatency
# Params:
# action: {{$action}}
# enableViolations: {{$ENABLE_VIOLATIONS_FOR_NETWORK_PROGRAMMING_LATENCIES}}
# threshold: {{$NETWORK_PROGRAMMING_LATENCY_THRESHOLD}}
# {{end}}
{{if $ENABLE_IN_CLUSTER_NETWORK_LATENCY}}
- Identifier: InClusterNetworkLatency
Method: InClusterNetworkLatency
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,20 @@
{{$namespaces := .namespaces}}
{{$tuningSet := .tuningSet}}

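# Flag controlling whether to wait for the action to complete (defaults to true); set waitForPods: false to skip the wait, e.g. after a delete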
{{$waitToDelete := DefaultParam .waitForPods true}}
Review comment (Contributor): What is this parameter for?


# Derivative variables
{{$is_deleting := (eq .actionName "delete")}}
{{$operationTimeout := .operationTimeout}}

# Deployments
{{$bigDeploymentSize := .bigDeploymentSize}}
{{$bigDeploymentsPerNamespace := .bigDeploymentsPerNamespace}}
{{$smallDeploymentSize := .smallDeploymentSize}}
{{$smallDeploymentsPerNamespace := .smallDeploymentsPerNamespace}}

# Mugesh - Set Default values to 0 if not provided
{{$smallDeploymentSize := DefaultParam .smallDeploymentSize 0}}
{{$smallDeploymentsPerNamespace := DefaultParam .smallDeploymentsPerNamespace 0}}

steps:
- name: Starting measurement for '{{$actionName}}'
Expand Down Expand Up @@ -59,11 +64,14 @@ steps:
Group: {{.Group}}
deploymentLabel: {{.deploymentLabel}}

# Mugesh - Wait for pods to be deleted
{{if $waitToDelete}}
- name: Waiting for '{{$actionName}}' to be completed
measurements:
- Method: WaitForControlledPodsRunning
Instances:
- Identifier: WaitForRunningDeployments
Params:
action: gather
refreshInterval: 15s
{{end}}
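On the reviewer's question about `waitForPods`: as far as the diff shows, the parameter only gates the final "Waiting for ... to be completed" step, and it defaults to `true` so existing callers keep their behaviour. The sketch below mimics the two derived flags in Python. It assumes that `DefaultParam` substitutes the default only when the parameter is unset and that the template's `eq` is an exact string comparison; `default_param` and `module_flags` are hypothetical helper names, not part of ClusterLoader2.

```python
from typing import Any, Optional


def default_param(value: Optional[Any], default: Any) -> Any:
    """Return `default` only when the parameter was not provided."""
    return default if value is None else value


def module_flags(params: dict) -> dict:
    """Derive the flags computed at the top of reconcile-objects.yaml."""
    return {
        # {{$waitToDelete := DefaultParam .waitForPods true}}
        "wait_to_delete": default_param(params.get("waitForPods"), True),
        # {{$is_deleting := (eq .actionName "delete")}} is an exact match.
        "is_deleting": params.get("actionName") == "delete",
    }


if __name__ == "__main__":
    print(module_flags({"actionName": "delete"}))
    print(module_flags({"actionName": "delete 200 pods", "waitForPods": False}))
```

One thing that may be worth double-checking: under the exact-match assumption, the new action names of the form `delete {{$PODS}} pods` would leave `$is_deleting` false. Any logic in the module that depends on `$is_deleting` would then not treat those steps as deletions.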