-
-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High CPU usage #193
Comments
It's hard to resolve such claim without specifics. You can start by reading this page to extract CPU profiles from Caddy. Share the profiles with us. I've attached a template below that will help make this easier and faster! This will require some effort on your part -- please understand that we will be dedicating time to fix the bug you are reporting if you can just help us understand it and reproduce it easily. This template will ask for some information you've already provided; that's OK, just fill it out the best you can. 👍 I've also included some helpful tips below the template. Feel free to let me know if you have any questions! Thank you again for your report, we look forward to resolving it! Template
Helpful tips
Example of a tutorial: Create a config file: |
1. EnvironmentUbuntu 22.04.2 LTSCaddy version:v2.7.6 h1:w0NymbG2m9PcvKWsrXO6EEkY9Ru4FJK8uQbYcev1p3A=config......
{
"listen": [
":5000",
":2087"
],
"routes": [
{
"match": [
{
"tls": {}
}
],
"handle": [
{
"handler": "tls",
"connection_policies": [
{
"client_authentication": {
"mode": "request"
}
}
]
},
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//sockets/r-{l4.tls.server_name}.socket"
]
}
]
}
]
},
{
"match": [
{
"http": []
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//sockets/r-{l4.http.host}.socket"
]
}
]
}
]
}
]}
.... profile.svg |
We need the full file, not just a cropped image. Also share the full config. |
Okay my test enviroment
OSubuntu 22.04 add this line to /etc/hosts
Haproxy config:frontend http_front
bind /tmp/r-sub.dom.com:2087.socket accept-proxy
default_backend static_backend
backend static_backend
mode http
http-request return status 200 content-type text/plain string "Hello World"
Caddy full Config{
"logging": {
"logs": {
"default": {
"level": "DEBUG"
}
}
},
"apps": {
"tls": {
"cache": {
"capacity": 2000
},
"certificates": {
"automate": [
"sub.dom.com"
]
},
"automation": {
"policies": [
{
"issuers": [
{
"module": "internal",
"lifetime": "600d"
}
]
}
]
}
},
"layer4": {
"servers": {
"ws": {
"listen": [
":5000",
":2087"
],
"routes": [
{
"match": [
{
"tls": {}
}
],
"handle": [
{
"handler": "tls",
"connection_policies": [
{
"client_authentication": {
"mode": "request"
}
}
]
},
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//tmp/r-{l4.tls.server_name}:2087.socket"
]
}
]
}
]
},
{
"match": [
{
"http": []
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//tmp/r-{l4.http.host}.socket"
]
}
]
}
]
}
]
}
}
}
}
} Test steps
with ssl:echo "GET https://sub.dom.com:2087" | ./vegeta attack -output /dev/null -rate=0 -max-workers 1000 -insecure without ssl:echo "GET http://sub.dom.com:2087" | ./vegeta attack -output /dev/null -rate=0 -max-workers 1000 profilesNOTE: .gz is fake for upload to github remove this from file nameprofile with sslprofile without ssl |
its not cropped image its svg file open in other tab |
I noticed the GitHub app has a hard time with SVGs. looks like a LOT of time is being spent in syscalls... reading and writing file descriptors (sockets) in the proxy module. I assume these are TCP packets from the client, based on the listen address? @smaznet If you remove the PROXY protocol config does the CPU usage improve? |
i'll try |
@smaznet What exactly do you mean with high CPU usage? What do you expect as normal usage? Since I was afraid this had something to do with the recent matching rewrite (see #192) I did some own benchmarking. With my favorite testing config 😅 : prefetch.json{
"admin": {
"disabled": true
},
"logging": {
"logs": {
"default": {"level":"DEBUG", "encoder": {"format":"console"}}
}
},
"apps": {
"tls": {
"certificates": {
"automate": ["localhost"]
},
"automation": {
"policies": [{
"subjects": ["localhost"],
"issuers": [{
"module": "internal"
}]
}]
}
},
"layer4": {
"servers": {
"https": {
"listen": ["0.0.0.0:10443"],
"routes": [
{
"match": [
{"proxy_protocol": {}}
],
"handle": [
{
"handler": "proxy_protocol",
"timeout": "2s",
"allow": ["127.0.0.1/32"]
}
]
},
{
"match": [
{"tls": {"sni": ["localhost"]}}
],
"handle": [
{"handler": "tls"}
]
},
{
"match": [
{"http": [{"host": ["localhost"]}]}
],
"handle": [
{
"handler": "proxy",
"upstreams": [{"dial": ["127.0.0.1:10080"]}]
}
]
}
]
}
}
},
"http": {
"servers": {
"backend": {
"protocols": ["h1","h2","h2c"],
"listen": ["127.0.0.1:10080"],
"routes": [
{
"handle": [{
"handler": "static_response",
"status_code": "200",
"body": "Hello World\n",
"headers": {
"Content-Type": ["text/plain"]
}
}]
}
]
}
}
}
}
} These are the numbers I get on my machine for the commit 145ec36 (before the matching rewrite) and f049165 is the current master: PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output 145ec3-http-1.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output 145ec3-http-2.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output 145ec3-http-3.bin
PS G:\Temp> .\vegeta.exe report .\145ec3-http-1.bin
Requests [total, rate, throughput] 9411901, 156865.33, 156855.21
Duration [total, attack, wait] 1m0s, 1m0s, 3.87ms
Latencies [min, mean, 50, 90, 95, 99, max] 1.102ms, 3.036ms, 1.614ms, 7.674ms, 10.589ms, 17.863ms, 1.149s
Bytes In [total, mean] 112942812, 12.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9411901
Error Set:
PS G:\Temp> .\vegeta.exe report .\145ec3-http-2.bin
Requests [total, rate, throughput] 9527806, 158794.31, 158792.91
Duration [total, attack, wait] 1m0s, 1m0s, 527µs
Latencies [min, mean, 50, 90, 95, 99, max] 534.7µs, 1.688ms, 595.513µs, 4.06ms, 6.213ms, 12.558ms, 1.119s
Bytes In [total, mean] 114333672, 12.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9527806
Error Set:
PS G:\Temp> .\vegeta.exe report .\145ec3-http-3.bin
Requests [total, rate, throughput] 9404961, 156749.46, 156749.24
Duration [total, attack, wait] 1m0s, 1m0s, 84.3µs
Latencies [min, mean, 50, 90, 95, 99, max] 84.3µs, 3.381ms, 1.777ms, 8.625ms, 11.849ms, 19.253ms, 972.624ms
Bytes In [total, mean] 112859532, 12.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9404961
Error Set: PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output f04916-http-1.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output f04916-http-2.bin
PS G:\Temp> echo "GET http://localhost:10443" | ./vegeta.exe attack -rate=0 -max-workers 1000 -duration=60s -output f04916-http-3.bin
PS G:\Temp> .\vegeta.exe report .\f04916-http-1.bin
Requests [total, rate, throughput] 9356041, 155928.36, 155920.48
Duration [total, attack, wait] 1m0s, 1m0s, 3.035ms
Latencies [min, mean, 50, 90, 95, 99, max] 22.8µs, 1.815ms, 684.44µs, 4.331ms, 6.731ms, 12.387ms, 1.376s
Bytes In [total, mean] 112272492, 12.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9356041
Error Set:
PS G:\Temp> .\vegeta.exe report .\f04916-http-2.bin
Requests [total, rate, throughput] 9228292, 153804.43, 153799.90
Duration [total, attack, wait] 1m0s, 1m0s, 1.767ms
Latencies [min, mean, 50, 90, 95, 99, max] 110.4µs, 3.416ms, 1.819ms, 8.547ms, 11.795ms, 19.691ms, 1.148s
Bytes In [total, mean] 110739504, 12.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9228292
Error Set:
PS G:\Temp> .\vegeta.exe report .\f04916-http-3.bin
Requests [total, rate, throughput] 9500202, 158335.41, 158329.55
Duration [total, attack, wait] 1m0s, 1m0s, 2.22ms
Latencies [min, mean, 50, 90, 95, 99, max] 606.6µs, 1.762ms, 649.258µs, 4.178ms, 6.483ms, 12.463ms, 1.213s
Bytes In [total, mean] 114002424, 12.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:9500202
Error Set:
PS G:\Temp> So just a <1% difference in average throughput. But running caddy and vegeta on the same machine is not very sensible anyways since they fight over cpu time. Adding sending and receiving of proxy_protocol to the config did not change the numbers significantly. Running with TLS was like expected a bit slower with an average throughput of 130590.14 |
FWIW I'm also guessing normal operation. Lots of traffic = lots of CPU. 🤷♂️ By far the VAST majority of time is in syscalls anyway. Syscalls are out of Caddy's realm. |
Hey there! We have also seen this issue, and when grabbing a profile of the process, it seems a lot of time is being wasted in userspace in the layer4 module. CPU utilization went from ~1% to ~600%. We have isolated this to be a change bewteen commit c5d815d (good, pre 19:00) and ce9789f (awful, post 19:00) Please find attached cpu profiles captured (no GC, binary, 60s duration each) during regular traffic periods with Caddy at version 2.8.4 and either commit of the module. The following configuration was used: (redacted) We need to use unix sockets as an intermediate because of a use-case we have. {
"logging": {
"logs": {
"default": {
"level": "info",
"writer": {
"output": "stdout"
},
"encoder": {
"format": "json"
}
}
}
},
"apps": {
"layer4": {
"servers": {
"http": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"http": [
{
"host": [
"gitlab.REDACTED"
]
}
]
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.gitlab"
]
}
]
}
]
},
{
"match": [
{
"http": [
{
"host": [
"registry.REDACTED"
]
}
]
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.registry"
]
}
]
}
]
}
]
},
"https": {
"listen": [
":443"
],
"routes": [
{
"match": [
{
"tls": {}
}
],
"handle": [
{
"handler": "subroute",
"routes": [
{
"match": [
{
"tls": {
"sni": [
"gitlab.REDACTED"
]
}
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.gitlab"
]
}
]
}
]
},
{
"match": [
{
"tls": {
"sni": [
"registry.REDACTED"
]
}
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.registry"
]
}
]
}
]
}
]
}
]
}
]
},
"ssh": {
"listen": [
":22"
],
"routes": [
{
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"10.0.1.1:2022"
]
}
]
}
]
}
]
}
}
},
"http": {
"http_port": 80,
"https_port": 443,
"servers": {
"gitlab": {
"strict_sni_host": true,
"listen": [
"unix//var/opt/caddy/caddy.gitlab"
],
"listener_wrappers": [
{
"wrapper": "proxy_protocol",
"timeout": 0
},
{
"wrapper": "http_redirect"
},
{
"wrapper": "tls"
}
],
"metrics": {},
"routes": [
{
"match": [
{
"host": [
"gitlab.REDACTED"
]
}
],
"handle": [
{
"handler": "headers",
"response": {
"set": {
"Server": [
"REDACTED"
]
}
}
},
{
"encodings": {
"gzip": {},
"zstd": {}
},
"handler": "encode",
"prefer": [
"zstd",
"gzip"
]
},
{
"handler": "reverse_proxy",
"load_balancing": {
"selection_policy": {
"policy": "round_robin"
}
},
"upstreams": [
{
"dial": "10.0.1.1:8081"
}
]
}
],
"terminal": true
}
]
},
"registry": {
"strict_sni_host": true,
"listen": [
"unix//var/opt/caddy/caddy.registry"
],
"listener_wrappers": [
{
"wrapper": "proxy_protocol",
"timeout": 0
},
{
"wrapper": "http_redirect"
},
{
"wrapper": "tls"
}
],
"metrics": {},
"routes": [
{
"match": [
{
"host": [
"registry.REDACTED"
]
}
],
"handle": [
{
"handler": "headers",
"response": {
"set": {
"Server": [
"REDACTED"
]
}
}
},
{
"encodings": {
"gzip": {},
"zstd": {}
},
"handler": "encode",
"prefer": [
"zstd",
"gzip"
]
},
{
"handler": "reverse_proxy",
"load_balancing": {
"selection_policy": {
"policy": "round_robin"
}
},
"upstreams": [
{
"dial": "10.0.1.1:18081"
}
]
}
],
"terminal": true
}
]
}
}
},
"tls": {
"certificates": {
"load_folders": [
"/var/opt/caddy/certificates"
]
},
"automation": {
"policies": [
{
"on_demand": true,
"issuers": [
{
"email": "REDACTED",
"module": "acme",
"challenges": {
"http": {
"disabled": true
},
"tls-alpn": {
"disabled": false
}
}
}
]
}
],
"on_demand": {
"permission": {
"endpoint": "http://127.0.0.1:2223",
"module": "http"
}
}
}
}
}
}
|
Thanks for the details. So lots of time spent matching TLS in your case. Does the following PR help? Can you try this branch? #199 |
i used caddy-l4 for forwarding data to unix sockets but i have cpu issue with caddy
handler reverse_proxy with proxy_protocol enabled
The text was updated successfully, but these errors were encountered: