Add Comprehensive Circuit Breaker User Guide for KMesh Kernel-Native Implementation #110

Open
wants to merge 3 commits into
base: main
213 changes: 213 additions & 0 deletions content/en/docs/userguide/circuit_breaker.md
@@ -0,0 +1,213 @@
---
draft: false
linktitle: Circuit Breaker
menu:
  docs:
    parent: user guide
    weight: 22
title: Circuit Breaker
toc: true
type: docs
---

This task shows you how to configure circuit breakers in KMesh using Fortio for load testing.

### Before you begin

- Install KMesh.
  Refer to the [quickstart](https://kmesh.net/en/docs/setup/quickstart/) and switch KMesh to ads mode.


### Deploy the test applications

1. Create the test service:
```yaml
# sample-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-service
  template:
    metadata:
      labels:
        app: test-service
    spec:
      containers:
      - name: test-service
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    app: test-service
  ports:
  - port: 80
```

2. Deploy Fortio load tester:
```yaml
# fortio.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fortio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortio
  template:
    metadata:
      labels:
        app: fortio
    spec:
      containers:
      - name: fortio
        image: fortio/fortio
        ports:
        - containerPort: 8080
```

### Configure circuit breaker

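Create the circuit breaker configuration. The commands later in this guide query a `DestinationRule` named `test-circuit-breaker`, so the circuit breaker is defined as one:
```yaml
# circuit-breaker.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: test-circuit-breaker
spec:
  host: test-service
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 5s
      baseEjectionTime: 30s
```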

Apply the configurations:
```bash
kubectl apply -f sample-app.yaml
kubectl apply -f fortio.yaml
kubectl apply -f circuit-breaker.yaml
```



### Monitor circuit breaker behavior

Watch the Fortio logs and pod status while the tests run:
```bash
# View Fortio logs
kubectl logs deploy/fortio

# Check service status
kubectl get pods -w
```

### Testing Scenarios

#### 4.1 Basic Load Test
```bash
# Normal load
kubectl exec -it deploy/fortio -- \
  fortio load -c 1 -qps 10 -t 30s http://test-service
```

#### 4.2 Circuit Breaker Test
```bash
# Heavy load to trigger circuit breaker
kubectl exec -it deploy/fortio -- \
  fortio load -c 5 -qps 100 -t 30s http://test-service

# Verify circuit breaker status
kubectl get destinationrule test-circuit-breaker -o yaml

# Simulate service failure
kubectl scale deployment test-service --replicas=0
```

#### 4.3 Recovery Test
```bash
# Restore service
kubectl scale deployment test-service --replicas=1

# Test recovery
kubectl exec -it deploy/fortio -- \
  fortio load -c 2 -qps 20 -t 30s http://test-service
```
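
To judge whether a run was throttled, it helps to know how many calls Fortio should make when nothing limits it. The sketch below works out that arithmetic for the three scenarios above. The function name `expected_calls` is made up for illustration, and it assumes integer QPS and duration values and a run that keeps up with its target rate.

```bash
# expected_calls QPS SECONDS CONNECTIONS
# Prints "<total calls> <calls per connection>" for an unthrottled run.
# Hypothetical helper: integer arithmetic only, no Fortio involved.
expected_calls() {
  qps=$1
  seconds=$2
  conns=$3
  total=$((qps * seconds))
  echo "$total $((total / conns))"
}

# Scenario 4.2 (-c 5 -qps 100 -t 30s):
expected_calls 100 30 5
```

For the heavy-load scenario this gives 3000 calls in total and 600 per connection, which matches the `600 calls each (total 3000)` line Fortio prints at startup.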

### Analyzing Results
Contributor

Can you show the results as well?
Logs or metrics are fine

Contributor Author

@LiZhenCheng9527 Thank you for your response. Could you kindly clarify what you mean by "results"? Are you referring to the terminal result logs to include in the user guide?

For your convenience, I have attached the terminal log below for reference. Please let me know if this aligns with what you were looking for or if there are any additional details or metrics you'd like me to provide.

desh@pop-os:~/kt$ # Heavy load to trigger circuit breaker
kubectl exec -it deploy/fortio -- \
fortio load -c 5 -qps 100 -t 30s http://test-service

# Verify circuit breaker status
kubectl get destinationrule test-circuit-breaker -o yaml

# Simulate service failure
kubectl scale deployment test-service --replicas=0
15:15:39.852 r1 [INF] scli.go:122> Starting, command="Φορτίο", version="1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux", go-max-procs=12
Fortio 1.66.5 running at 100 queries per second, 12->12 procs, for 30s: http://test-service
15:15:39.853 r1 [INF] httprunner.go:121> Starting http test, run=0, url="http://test-service", threads=5, qps="100.0", warmup="parallel", conn-reuse=""
Starting at 100 qps with 5 thread(s) [gomax 12] for 30s : 600 calls each (total 3000)
15:16:09.858 r84 [INF] periodic.go:851> T003 ended after 30.000992951s : 600 calls. qps=19.99933805457598
15:16:09.858 r83 [INF] periodic.go:851> T002 ended after 30.000983273s : 600 calls. qps=19.99934450615098
15:16:09.858 r82 [INF] periodic.go:851> T001 ended after 30.001020274s : 600 calls. qps=19.99931984046497
15:16:09.858 r85 [INF] periodic.go:851> T004 ended after 30.001037727s : 600 calls. qps=19.99930820592978
15:16:09.858 r81 [INF] periodic.go:851> T000 ended after 30.001090238s : 600 calls. qps=19.99927320107946
Ended after 30.001156064s : 3000 calls. qps=99.996
15:16:09.858 r1 [INF] periodic.go:581> Run ended, run=0, elapsed=30001156064, calls=3000, qps=99.99614660182583
Sleep times : count 2995 avg 0.049068316 +/- 0.0003238 min 0.048044947 max 0.049896243 sum 146.959607
Aggregated Function Time : count 3000 avg 0.00034710756 +/- 9.911e-05 min 0.000126072 max 0.000813633 sum 1.04132267
# range, mid point, percentile, count
>= 0.000126072 <= 0.000813633 , 0.000469852 , 100.00, 3000
# target 50% 0.000469738
# target 75% 0.000641685
# target 90% 0.000744854
# target 99% 0.000806755
# target 99.9% 0.000812945
Error cases : no data
# Socket and IP used for each connection:
[0]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000159577 +/- 0 min 0.000159577 max 0.000159577 sum 0.000159577
[1]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000138426 +/- 0 min 0.000138426 max 0.000138426 sum 0.000138426
[2]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000130691 +/- 0 min 0.000130691 max 0.000130691 sum 0.000130691
[3]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000233499 +/- 0 min 0.000233499 max 0.000233499 sum 0.000233499
[4]   1 socket used, resolved to 10.96.230.153:80, connection timing : count 1 avg 0.000172922 +/- 0 min 0.000172922 max 0.000172922 sum 0.000172922
Connection time histogram (s) : count 5 avg 0.000167023 +/- 3.646e-05 min 0.000130691 max 0.000233499 sum 0.000835115
# range, mid point, percentile, count
>= 0.000130691 <= 0.000233499 , 0.000182095 , 100.00, 5
# target 50% 0.000169244
# target 75% 0.000201372
# target 90% 0.000220648
# target 99% 0.000232214
# target 99.9% 0.00023337
Sockets used: 5 (for perfect keepalive, would be 5)
Uniform: false, Jitter: false, Catchup allowed: true
IP addresses distribution:
10.96.230.153:80: 5
Code 200 : 3000 (100.0 %)
Response Header Sizes : count 3000 avg 238 +/- 0 min 238 max 238 sum 714000
Response Body/Total Sizes : count 3000 avg 853 +/- 0 min 853 max 853 sum 2559000
All done 3000 calls (plus 5 warmup) 0.347 ms avg, 100.0 qps
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"networking.istio.io/v1alpha3","kind":"DestinationRule","metadata":{"annotations":{},"name":"test-circuit-breaker","namespace":"default"},"spec":{"host":"test-service","trafficPolicy":{"connectionPool":{"http":{"http1MaxPendingRequests":1,"maxRequestsPerConnection":1}},"outlierDetection":{"baseEjectionTime":"30s","consecutive5xxErrors":3,"interval":"5s"}}}}
  creationTimestamp: "2025-01-23T13:22:14Z"
  generation: 1
  name: test-circuit-breaker
  namespace: default
  resourceVersion: "64439"
  uid: 07ed1da0-79c7-45a8-81b8-a7912e6d1568
spec:
  host: test-service
  trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      baseEjectionTime: 30s
      consecutive5xxErrors: 3
      interval: 5s
deployment.apps/test-service scaled
desh@pop-os:~/kt$ # Restore service
kubectl scale deployment test-service --replicas=1

# Test recovery
kubectl exec -it deploy/fortio -- \
fortio load -c 2 -qps 20 -t 30s http://test-service
deployment.apps/test-service scaled
15:16:13.095 r1 [INF] scli.go:122> Starting, command="Φορτίο", version="1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux", go-max-procs=12
Fortio 1.66.5 running at 20 queries per second, 12->12 procs, for 30s: http://test-service
15:16:13.095 r1 [INF] httprunner.go:121> Starting http test, run=0, url="http://test-service", threads=2, qps="20.0", warmup="parallel", conn-reuse=""
15:16:13.097 r52 [ERR] http_client.go:954> Unable to connect, dest={"IP":"10.96.230.153","Port":80,"Zone":""}, err="dial tcp 10.96.230.153:80: connect: connection refused", numfd=7, thread=1, run=0
15:16:13.097 r51 [ERR] http_client.go:954> Unable to connect, dest={"IP":"10.96.230.153","Port":80,"Zone":""}, err="dial tcp 10.96.230.153:80: connect: connection refused", numfd=6, thread=0, run=0
Aborting because of error -1 for http://test-service (0 bytes)
command terminated with exit code 1
desh@pop-os:~/kt$ # View test results
kubectl logs deploy/fortio
Found 2 pods, using pod/fortio-deploy-5669d4866b-rwlzj
{"ts":1737903211.076985,"level":"info","r":1,"file":"updater.go","line":50,"msg":"Configmap flag value watching on /etc/fortio"}
{"ts":1737903211.077518,"level":"crit","r":1,"file":"scli.go","line":83,"msg":"Unable to watch config/flag changes in /etc/fortio: dflag: error initializing fsnotify watcher"}
{"ts":1737903211.077641,"level":"info","r":1,"file":"scli.go","line":122,"msg":"Starting","command":"Φορτίο","version":"1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux","go-max-procs":12}
{"ts":1737903211.079770,"level":"info","r":1,"msg":"Fortio 1.66.5 tcp-echo server listening on tcp [::]:8078"}
{"ts":1737903211.079867,"level":"info","r":1,"msg":"Fortio 1.66.5 udp-echo server listening on udp [::]:8078"}
{"ts":1737903211.079908,"level":"info","r":1,"msg":"Fortio 1.66.5 grpc 'ping' server listening on tcp [::]:8079"}
{"ts":1737903211.080657,"level":"info","r":1,"msg":"Fortio 1.66.5 https redirector server listening on tcp [::]:8081"}
{"ts":1737903211.082276,"level":"info","r":1,"msg":"Fortio 1.66.5 http-echo server listening on tcp [::]:8080"}
{"ts":1737903211.082363,"level":"info","r":1,"msg":"Data directory is /var/lib/fortio"}
{"ts":1737903211.082392,"level":"info","r":1,"msg":"REST API on /fortio/rest/run, /fortio/rest/status, /fortio/rest/stop, /fortio/rest/dns"}
	 UI started - visit:
		http://localhost:8080/fortio/
	 (or any host/ip reachable on this server)
{"ts":1737903211.083216,"level":"info","r":1,"msg":"Debug endpoint on /debug, Additional Echo on /debug/echo/, Flags on /fortio/flags, and Metrics on /debug/metrics"}
{"ts":1737903211.083302,"level":"info","r":1,"file":"fortio_main.go","line":307,"msg":"All fortio 1.66.5 h1:WTJzTGOA12YWZSM5g43602lH+GOsmP3eKHXLnuRW4vs= go1.22.7 amd64 linux servers started!"}

Contributor

If a circuit breaker is configured, some requests should fail when multiple connections access the service at once.
You can start by providing the Fortio results without a circuit breaker configured.

IP addresses distribution: 10.96.230.153:80: 5 
Code 200 : 3000 (100.0 %)

Then provide the Fortio results with a circuit breaker configured.

IP addresses distribution: 10.96.230.153:80: 5 
Code 200 : 1914 (63.8%) 
Code 503 : 1086 (36.2%)


#### 6.1 Fortio Results
```bash
# View test results
kubectl logs deploy/fortio

# Get detailed metrics
kubectl exec -it deploy/fortio -- /usr/bin/fortio report
```
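
The quickest signal in a Fortio report is the split between `Code 200` lines and everything else. The sketch below adds up those lines and prints the share of non-200 responses. The function name `error_rate` is an assumption for illustration, and the parsing assumes the `Code <status> : <count> (...)` layout seen in the logs in this thread.

```bash
# error_rate: read Fortio output on stdin and print the percentage of
# responses whose status code is not 200. Hypothetical helper.
error_rate() {
  awk '
    /^Code [0-9]+ *:/ {
      # Example line: "Code 503 : 1086 (36.2 %)"
      # Field 2 is the status code, field 4 is the call count.
      total += $4
      if ($2 != "200") errors += $4
    }
    END {
      if (total == 0) { print "no data"; exit }
      printf "%.1f\n", 100 * errors / total
    }
  '
}

# Illustrative 200/503 split quoted in the review discussion:
printf 'Code 200 : 1914 (63.8%%)\nCode 503 : 1086 (36.2%%)\n' | error_rate
```

Against a live cluster, the same function could be fed from a pipe, for example `kubectl exec deploy/fortio -- fortio load -c 5 -qps 100 -t 30s http://test-service | error_rate`, dropping `-it` so the output is not tied to a terminal.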

#### 6.2 System Metrics
```bash
# Check pod status
kubectl get pods -w

# View circuit breaker configuration
kubectl get destinationrule test-circuit-breaker -o yaml
```

### Understanding what happened

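The circuit breaker configuration:
- Limits pending HTTP requests and requests per connection (`connectionPool`)
- Ejects a host after 3 consecutive 5xx errors (`consecutive5xxErrors`)
- Keeps an ejected host out of the load-balancing pool for 30 seconds (`baseEjectionTime`)
- Checks host health every 5 seconds (`interval`)

When the service is overloaded:
1. The circuit breaker trips once a threshold is exceeded
2. Subsequent requests are rejected instead of reaching the service, and Fortio reports them as non-200 codes such as 503
3. The ejected host is readmitted after `baseEjectionTime`
4. Normal traffic flow resumes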
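
The consecutive-error rule can be illustrated with a toy model. This is only a sketch of the counting logic, not KMesh's kernel implementation. It ignores the 5-second interval and the 30-second ejection time, and the function name `first_ejection` is made up for illustration.

```bash
# first_ejection THRESHOLD CODE...
# Prints the 1-based position of the response at which THRESHOLD
# consecutive 5xx codes have been seen, or 0 if that never happens.
# Simplified model: any non-5xx response resets the streak.
first_ejection() {
  threshold=$1
  shift
  streak=0
  index=0
  for code in "$@"; do
    index=$((index + 1))
    case $code in
      5??) streak=$((streak + 1)) ;;
      *)   streak=0 ;;
    esac
    if [ "$streak" -ge "$threshold" ]; then
      echo "$index"
      return 0
    fi
  done
  echo 0
}

# Two 503s, a success that resets the streak, then three 503s in a row:
first_ejection 3 200 503 503 200 503 503 503 200
```

With a threshold of 3, the example prints 7: the two early errors do not count because the success between them and the later run resets the streak.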

### Clean up

Remove test components:
```bash
kubectl delete -f sample-app.yaml
kubectl delete -f fortio.yaml
kubectl delete -f circuit-breaker.yaml
```

### Troubleshooting

If you encounter issues:
1. Check pod status:
```bash
kubectl get pods
kubectl describe pod <pod-name>
```

2. Check circuit breaker configuration:
```bash
kubectl get destinationrule
kubectl describe destinationrule test-circuit-breaker
```

3. View application logs:
```bash
kubectl logs deploy/test-service
kubectl logs deploy/fortio
```
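
The recovery log in the review thread shows Fortio aborting with `connection refused` because the load started before the scaled-up pod was ready. A small filter such as the sketch below can count those failures in saved output. The function name `count_connect_errors` is an assumption, and it matches the `Unable to connect` wording from that log.

```bash
# count_connect_errors: read Fortio log output on stdin and print how many
# lines report a failed connection attempt. Hypothetical helper.
count_connect_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches, so the
  # "|| true" keeps the function's exit status clean in that case.
  grep -c 'Unable to connect' || true
}

printf '%s\n' \
  'r52 [ERR] http_client.go:954> Unable to connect, dest=10.96.230.153:80' \
  'r51 [ERR] http_client.go:954> Unable to connect, dest=10.96.230.153:80' \
  'Aborting because of error -1 for http://test-service (0 bytes)' \
  | count_connect_errors
```

If the count is non-zero right after scaling up, one option is to run `kubectl rollout status deployment/test-service` first and start the load only once it reports the rollout as complete.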