Non blocking matchers & matching timeout #192
Conversation
500ms is too low for the TLS handshake of far-away servers.
…ly consume buf

Not sure if a partially consumed buf after a handler can happen in real life, because most non-terminal handlers do some kind of handshake and thus would also write to the connection. But with this it should work.
Oh wow! This is quite the change 😅 I'm impressed. It is a big change, but I can see the crux of the changes is in about 2-3 places; the rest are just minor edits to accommodate different APIs. I do have some questions which I'll post in-line or as a follow-up comment. But overall I think I like where this is going and it's likely to be merged. 👍 Thank you, this is really awesome!
Ok, I might do this in chunks/iterations. Starting with the first things I notice, these are just questions I have -- overall I think this is a very thoughtful change, so it's looking quite good. I just want to make sure I understand it and that we eke out the best possible design. 💯
Ok did another quick scan -- looking better! I love that we don't have IsTerminal() anymore. I'm glad to see you were able to make that work 😃
I still need to fully grok the "freeze/unfreeze" stuff (as opposed to recording?) but I'll do another pass soon :) Thank you!
I renamed them because "record" felt wrong since there is no recording happening during matching anymore. Instead the connection is frozen and can only read data from the prefetched buffer. Technically a matcher could still write, but that would very likely result in an invalid state. We could return an error if writes are attempted during matching to prevent misuse.
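For illustration only, the freeze behavior could be sketched roughly like the hypothetical wrapper below. The type and field names are made up for this example and are not the actual caddy-l4 implementation: while frozen, reads are served solely from the prefetched buffer and writes are rejected.

package sketch

import (
	"errors"
	"net"
)

// frozenConn is a hypothetical illustration of the freeze/unfreeze idea:
// while frozen, Read only returns bytes that were already prefetched, and
// Write is rejected so a matcher cannot put the connection into an invalid state.
type frozenConn struct {
	net.Conn
	prefetched []byte // data read from the client before matching started
	frozen     bool
}

func (c *frozenConn) Read(p []byte) (int, error) {
	if !c.frozen {
		return c.Conn.Read(p)
	}
	if len(c.prefetched) == 0 {
		// A matcher asked for more data than has been prefetched so far.
		return 0, errors.New("no more prefetched data during matching")
	}
	n := copy(p, c.prefetched)
	c.prefetched = c.prefetched[n:]
	return n, nil
}

func (c *frozenConn) Write(p []byte) (int, error) {
	if c.frozen {
		return 0, errors.New("writes are not allowed during matching")
	}
	return c.Conn.Write(p)
}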
That makes sense. Thanks for the explanation!
Ok I think we're close. I understand things a little better. Still a few more questions but nothing too major. Almost there!! Thanks for hanging in there with me!
I think this is looking good 😃 Are you able to help with it if anyone reports an issue related to this? I'll be happy to merge this and we can try it out! Excellent work.
Yes, I can help with bug fixing it, especially as long as I use it myself 😄
Awesome, thanks!!
This PR breaks our configuration. We use caddy-l4 to forward to a Unix socket depending on the TLS handshake, then have a secondary listener on a Unix socket with on-demand TLS provisioning. (We are doing it this way to be able to use different TLS provisioning policies, among other things.) Setting the new timeout parameters to something like '15m' does not help. The config:

{
"logging": {
"logs": {
"default": {
"level": "info",
"writer": {
"output": "stdout"
},
"encoder": {
"format": "json"
}
}
}
},
"apps": {
"layer4": {
"servers": {
"http": {
"listen": [
":80"
],
"routes": [
{
"match": [
{
"http": [
{
"host": [
"gitlab.REDACTED"
]
}
]
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.gitlab"
]
}
]
}
]
},
{
"match": [
{
"http": [
{
"host": [
"registry.REDACTED"
]
}
]
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.registry"
]
}
]
}
]
}
]
},
"https": {
"listen": [
":443"
],
"routes": [
{
"match": [
{
"tls": {}
}
],
"handle": [
{
"handler": "subroute",
"routes": [
{
"match": [
{
"tls": {
"sni": [
"gitlab.REDACTED"
]
}
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.gitlab"
]
}
]
}
]
},
{
"match": [
{
"tls": {
"sni": [
"registry.REDACTED"
]
}
}
],
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"unix//var/opt/caddy/caddy.registry"
]
}
]
}
]
}
]
}
]
}
]
},
"ssh": {
"listen": [
":22"
],
"routes": [
{
"handle": [
{
"handler": "proxy",
"proxy_protocol": "v2",
"upstreams": [
{
"dial": [
"10.0.1.1:2022"
]
}
]
}
]
}
]
}
}
},
"http": {
"http_port": 80,
"https_port": 443,
"servers": {
"gitlab": {
"strict_sni_host": true,
"listen": [
"unix//var/opt/caddy/caddy.gitlab"
],
"listener_wrappers": [
{
"wrapper": "proxy_protocol",
"timeout": 0
},
{
"wrapper": "http_redirect"
},
{
"wrapper": "tls"
}
],
"metrics": {},
"routes": [
{
"match": [
{
"host": [
"gitlab.REDACTED"
]
}
],
"handle": [
{
"handler": "headers",
"response": {
"set": {
"Server": [
"REDACTED"
]
}
}
},
{
"encodings": {
"gzip": {},
"zstd": {}
},
"handler": "encode",
"prefer": [
"zstd",
"gzip"
]
},
{
"handler": "reverse_proxy",
"load_balancing": {
"selection_policy": {
"policy": "round_robin"
}
},
"upstreams": [
{
"dial": "10.0.1.1:8081"
}
]
}
],
"terminal": true
}
]
},
"registry": {
"strict_sni_host": true,
"listen": [
"unix//var/opt/caddy/caddy.registry"
],
"listener_wrappers": [
{
"wrapper": "proxy_protocol",
"timeout": 0
},
{
"wrapper": "http_redirect"
},
{
"wrapper": "tls"
}
],
"metrics": {},
"routes": [
{
"match": [
{
"host": [
"registry.REDACTED"
]
}
],
"handle": [
{
"handler": "headers",
"response": {
"set": {
"Server": [
"REDACTED"
]
}
}
},
{
"encodings": {
"gzip": {},
"zstd": {}
},
"handler": "encode",
"prefer": [
"zstd",
"gzip"
]
},
{
"handler": "reverse_proxy",
"load_balancing": {
"selection_policy": {
"policy": "round_robin"
}
},
"upstreams": [
{
"dial": "10.0.1.1:18081"
}
]
}
],
"terminal": true
}
]
}
}
},
"tls": {
"certificates": {
"load_folders": [
"/var/opt/caddy/certificates"
]
},
"automation": {
"policies": [
{
"on_demand": true,
"issuers": [
{
"email": "REDACTED",
"module": "acme"
},
{
"email": "REDACTED",
"module": "zerossl"
}
]
}
],
"on_demand": {
"ask": "http://127.0.0.1:2223"
}
}
}
}
}
@T4cC0re Is there any particular error message? "A TLS connection is never established successfully" is not a lot to go on...
I think I know what's happening. The subroute is blocking on its prefetch call because the root route already fetched all bytes from the client. The compiled route in the subroute handler should probably not call prefetch if there is already some data available (at least not for the first matcher loop iteration). Sorry I overlooked this; I never actually tested the subroute handler. @T4cC0re You can work around this by not using the subroute handler. You should be able to just copy your subroute "routes" to the root "routes".
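The guard described above could look roughly like the hypothetical sketch below; the interface and method names are illustrative and are not the actual caddy-l4 API or the #194 patch.

package sketch

// prefetcher is a hypothetical stand-in for the connection type, used here
// only to illustrate the idea.
type prefetcher interface {
	Buffered() int   // bytes already fetched from the client
	Prefetch() error // non-blocking read of whatever else is available
}

// prefetchIfEmpty only prefetches when nothing is buffered yet, so a
// subroute's matching loop does not block waiting for bytes the root
// route has already consumed from the client.
func prefetchIfEmpty(conn prefetcher) error {
	if conn.Buffered() > 0 {
		return nil
	}
	return conn.Prefetch()
}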
This is exactly what happens. #194 fixes it.
Can confirm. It works now :) Thanks for the quick turnaround!
Really appreciate your help @ydylla -- thank you!
@ydylla One more potential bug with this commit: have you considered the case where a TLS connection is wrapped? i.e.:

{
"apps": {
"layer4": {
"servers": {
"signal_proxy": {
"listen": [
":443"
],
"routes": [
{
"handle": [
{
"handler": "tls"
}
]
},
{
"match": [
{
"tls": {
"sni": [
"chat.signal.org",
"ud-chat.signal.org"
]
}
}
],
"handle": [
{
"handler": "proxy",
"upstreams": [
{
"dial": [
"chat.signal.org:443"
]
}
]
}
]
},
{
"match": [
{
"tls": {
"sni": [
"storage.signal.org",
"cdn.signal.org",
"cdn2.signal.org",
"cdn3.signal.org",
"cdsi.signal.org",
"contentproxy.signal.org",
"uptime.signal.org",
"sfu.voip.signal.org",
"svr2.signal.org",
"updates.signal.org",
"updates2.signal.org",
"backend1.svr3.signal.org",
"backend2.svr3.signal.org",
"backend3.svr3.signal.org"
]
}
}
],
"handle": [
{
"handler": "proxy",
"upstreams": [
{
"dial": [
"{l4.tls.server_name}:443"
]
}
]
}
]
}
]
}
}
},
"tls": {
"certificates": {
"automate": [
"example.com"
]
}
}
}
}

A Signal proxy first terminates the "outer" layer of TLS, then, through that encrypted tunnel, it sends an inner TLS connection. This config used to be able to match the ServerName (SNI) of the inner TLS connection without terminating it, but I noticed that with this change, it seems to try to terminate the inner connection. Do you think you could help fix this? It appears to be a regression. (Sorry!)
@mholt I don't think the wrapping is the problem. It's likely my rather arbitrary choice of jumping back to the first route after a match instead of continuing. It was easier to code and seemed reasonable. If you add an "example.com" SNI matcher to the first route with the tls handler, it should work.
Oh you're good! I super appreciate your contributions (I've been stretched a bit thin lately). Let me know if I can help at all. Thanks so much 😅
This is done to be backwards compatible with the old matching behavior, see #192 (comment).
@mholt #196 recreates the old matching behavior.

package main

import (
	"crypto/tls"
	"log"
)

func main() {
	// Establish the outer TLS connection to the proxy.
	outerConn, err := tls.Dial("tcp", "127.0.0.1:10443", &tls.Config{
		ServerName:         "outer",
		InsecureSkipVerify: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	// Start an inner TLS session through the outer tunnel.
	innerConn := tls.Client(outerConn, &tls.Config{
		ServerName:         "inner",
		InsecureSkipVerify: true,
	})
	// Send a plain HTTP request over the inner connection and print the response.
	_, err = innerConn.Write([]byte("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n"))
	if err != nil {
		log.Fatal(err)
	}
	buf := make([]byte, 1024)
	n, err := innerConn.Read(buf)
	if err != nil {
		log.Fatal(err)
	}
	println(string(buf[:n]))
}
Ah 😅 Thanks for working on that and posting your test program. I will review the patch soon!! Thank you!
This is a better version of #72 that actually works 😄
Instead of the wrapped route chain, routing/matching is now done in a loop until a timeout is reached. Each connection is prefetched without blocking. Then all routes and their matchers are tried on the available data. If nothing matched, more data is prefetched and all routes are tried again. This is repeated until the timeout is reached. The timeout can be configured per layer4 server and defaults to 3s. I first had it set to 500ms, but since the matching phase may contain TLS handshakes, that was too short. I saw some failed connections from far-away servers where TLS handshaking apparently takes about 1000ms.
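Roughly sketched, the loop looks like the following. This is a hypothetical outline with made-up interface names, not the actual caddy-l4 types or signatures.

package sketch

import (
	"errors"
	"time"
)

// Conn and Route are placeholders standing in for caddy-l4's internal
// types; the names and signatures are illustrative only.
type Conn interface {
	Prefetch() error // read whatever is currently available without blocking
}

type Route interface {
	Match(Conn) (bool, error)
	Handle(Conn) error
}

var errMatchingTimeout = errors.New("no route matched before the matching timeout")

// serve sketches the loop: prefetch, try every route on the data seen so
// far, and repeat until a route matches or the timeout expires.
func serve(conn Conn, routes []Route, matchingTimeout time.Duration) error {
	deadline := time.Now().Add(matchingTimeout)
	for time.Now().Before(deadline) {
		if err := conn.Prefetch(); err != nil {
			return err
		}
		for _, route := range routes {
			matched, err := route.Match(conn)
			if err != nil {
				return err
			}
			if matched {
				return route.Handle(conn)
			}
		}
		// Nothing matched yet; loop around and prefetch more bytes.
		// (A real implementation would wait for new data instead of busy-looping.)
	}
	return errMatchingTimeout
}

The key property is that prefetching never blocks: a matcher that needs more bytes simply does not match in this iteration and gets another chance once more data has arrived, until the configurable timeout expires.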
If you want to test or compare both prefetching branches, you can use this caddy config:
prefetch.json
The following curl commands should all work with this branch, while some do not work with the old version:
I hope this is not too big of a rewrite and you are still interested in merging it. I am currently also testing it with real browser traffic, and so far it works fine.