remove unnecessary routes matching and restore old fallback next hand… #208
Conversation
fix embedded route list stuck
This is indeed a difficult problem to solve well, it seems. @ydylla You have a lot of experience with this issue by now. What do you think of this change? Maybe this could be a good patch.
I am not a huge fan of all the matcher state tracking, since it makes things even more complex, but there is probably no other way. Also, I don't think skipping matchers is compatible with the current version/behavior of the prefetch function, because it is theoretically possible that invocations of prefetch return just 1 byte each time. So we have to check matchers multiple times. Or we must take much more care with how matchers are written, so that they return false only if a match is definitely impossible even when more data becomes available, which is currently not the case for the http matcher (I think it's the only one right now). But I like that matcher authors currently don't have to worry much about it.
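To make that constraint concrete, here is a minimal sketch (not code from this repository) of a matcher written the careful way described above: it answers "no match" only when the bytes seen so far already rule the protocol out, and otherwise propagates the read error so it is called again once more data has been prefetched. The `Match` signature follows the one visible in the dns matcher snippet later in this thread; `hypotheticalMagic` and the package name are invented for illustration.

```go
package example

import (
	"bytes"
	"io"

	"github.com/mholt/caddy-l4/layer4"
)

// hypotheticalMagic is an illustrative 4-byte preamble, not a real protocol.
var hypotheticalMagic = []byte{0xCA, 0xFE, 0xBA, 0xBE}

// MatchPreamble returns false only when a match is definitely impossible,
// even if more data were to arrive later.
type MatchPreamble struct{}

func (MatchPreamble) Match(cx *layer4.Connection) (bool, error) {
	buf := make([]byte, len(hypotheticalMagic))
	n, err := io.ReadFull(cx, buf)
	if err != nil {
		if bytes.HasPrefix(hypotheticalMagic, buf[:n]) {
			// Short read, but what we saw is still a valid prefix:
			// undecided, so propagate the error and get called again.
			return false, err
		}
		// The bytes already contradict the preamble: definitely no match.
		return false, nil
	}
	return bytes.Equal(buf, hypotheticalMagic), nil
}
```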
Thanks @ydylla. Sorry @WeidiDeng, I'm finally getting around to this, and while I'd need more time to fully digest what's going on, I am open to merging this and trying it out. Think it is ready to go? Thank you both for working on solving this in different ways 🙏
To give some more reasons why I prefer my version: this PR could cause problems when clients produce short prefetch reads (not sending all data for a matcher in one go), because it could happen that the correct next route was already marked unmatchable based on old/consumed data that was intended for the previous route.

fool-tls.json:

```json
{
"admin": {
"disabled": true
},
"logging": {
"logs": {
"default": {"level":"DEBUG", "encoder": {"format":"console"}}
}
},
"apps": {
"tls": {
"certificates": {
"automate": ["localhost"]
},
"automation": {
"policies": [{
"subjects": ["localhost"],
"issuers": [{
"module": "internal"
}]
}]
}
},
"layer4": {
"servers": {
"https": {
"listen": ["0.0.0.0:10443"],
"routes": [
{
"match": [
{"proxy_protocol": {}}
],
"handle": [
{
"handler": "proxy_protocol",
"allow": ["0.0.0.0/0"]
}
]
},
{
"match": [
{"tls": {"sni": ["localhost"]}}
],
"handle": [
{"handler": "tls"},
{
"handler": "proxy",
"upstreams": [
{"dial": ["127.0.0.1:10080"]}
]
}
]
}
]
}
}
},
"http": {
"servers": {
"backend": {
"protocols": ["h1","h2","h2c"],
"listen": ["127.0.0.1:10080"],
"routes": [
{
"handle": [{
"handler": "static_response",
"status_code": "200",
"body": "Hello World\n",
"headers": {
"Content-Type": ["text/plain"]
}
}]
}
]
}
}
}
}
}
```

and this test program fool-tls.go:

```go
package main
import (
"crypto/tls"
"log"
"net"
"time"
)
func main() {
conn, err := net.Dial("tcp", "127.0.0.1:10443")
if err != nil {
log.Fatal(err)
}
// trigger short prefetch so proxy proto matcher reports it needs more data
// but send >= 5 bytes to fool the tls matcher
_, err = conn.Write([]byte("PROXY "))
if err != nil {
log.Fatal(err)
}
// send rest of proxy proto after sleep
time.Sleep(500 * time.Millisecond)
_, err = conn.Write([]byte("TCP4 127.0.0.1 127.0.0.1 55498 10443\r\n"))
if err != nil {
log.Fatal(err)
}
tlsConn := tls.Client(conn, &tls.Config{
ServerName: "localhost",
InsecureSkipVerify: true,
})
_, err = tlsConn.Write([]byte("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n"))
if err != nil {
log.Fatal(err)
}
buf := make([]byte, 1024)
n, err := tlsConn.Read(buf)
if err != nil {
log.Fatal(err)
}
println(string(buf[:n]))
}
```

This should print an HTTP response with a "Hello World" body, but with this PR the connection is closed (EOF error).
This error no longer happens with the latest commits. You were using an old build; keep in mind to test using the latest commits 😅. And the reason the problem happened is not because of slow clients, but because of how the matcher handled the matching status: if the previous matcher couldn't handle the connection because of a lack of data, the next handler in the chain should be given a second chance to retry.
Your logic of making matcher order less relevant is causing more problems than it solves. For example, if we swap the two handlers in your example config, the code will still work. I'll let you work out what problems this behavior implies before this patch is merged, and then I'll post the infinite loop exploit I mentioned 😅. In general, code is cleaner when it's doing the right thing; being cleaner and easier to understand is no excuse for buggy behavior. And that's why I think order matters. Async matching is a nice thing because users can worry less about matcher order, since different protocols require different amounts of data, and matchers that need more data will block matchers that need less, not because matchers can run in a loop and cause problems for them.
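For illustration, here is a hypothetical sketch (not taken from either branch) of the tri-state bookkeeping under discussion: a route is skipped only once its match is known to be impossible, while routes that merely lacked data get a second chance on the next prefetch round. All names are invented.

```go
package example

// matchState is an illustrative tri-state for a route's matcher result.
type matchState int

const (
	stateUndecided   matchState = iota // not enough data yet; retry after the next prefetch
	stateMatched                       // the route matched and can handle the connection
	stateUnmatchable                   // a match is impossible; skip this route from now on
)

// needAnotherRound reports whether any route is still undecided and
// therefore deserves a second chance once more bytes arrive.
func needAnotherRound(states []matchState) bool {
	for _, s := range states {
		if s == stateUndecided {
			return true
		}
	}
	return false
}
```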
I rechecked, you are right. Apparently I was using an old build yesterday, sorry for the confusion. I think I now also understand how your PR handles these situations: after a match it tries the remaining routes without any data, so all matchers return
Logs from your version:
Logs from mine:
Yes, I know. I still don't see the problem with this and think it is a neat feature. If there is a route that matches, why not use it?
Yes, please do. If it's the same issue I already mentioned in the initial post of #210 (comment), then it has been fixed since yesterday by 768c9f7.
The pastebin is already unavailable; can you please send it to [email protected]?
@mholt I don't use Slack for anything else and prefer async, which means I will never be online anyway. But if you feel it is necessary for the decision, you can invite [email protected]; the alias is still valid.
@ydylla Ah, no worries. Thanks for being willing, but I totally respect it if you don't care for Slack/sync chat. 👍 We can keep it async here if you prefer. I went ahead and sent the code snippet that Weidi shared with me to your email, so if you haven't received it already, you can take a look. 👍 I have not looked closely myself yet... still been so busy.
Thanks, I have looked at the code, and I do not really see a problem with it. It's not an endless loop, since the nesting is bounded by the for loop counter at Line 115 in 154bf6f
In practice, the client will also run into the matching timeout far before the upper bound is reached.
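A simplified sketch of how those two safeguards fit together, assuming nothing about the repository's actual loop: a read deadline bounds wall-clock time, and an iteration cap bounds the number of matching rounds. The constant and function names here are invented; the real bound is the Line 115 counter mentioned above.

```go
package example

import (
	"errors"
	"net"
	"time"
)

// maxMatchingRounds stands in for the loop counter referenced above;
// this value is only illustrative.
const maxMatchingRounds = 10000

// matchWithSafeguards shows the two limits working together: a read
// deadline that cuts off slow or malicious clients, and an iteration cap
// so the matching loop cannot run forever even without a timeout.
func matchWithSafeguards(conn net.Conn, timeout time.Duration, tryRoutes func() (bool, error)) error {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	for i := 0; i < maxMatchingRounds; i++ {
		done, err := tryRoutes()
		if err != nil {
			return err // includes deadline errors from reads inside tryRoutes
		}
		if done {
			return nil
		}
	}
	return errors.New("matching aborted: round limit reached")
}
```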
It's not the loop that's endless, it's resource exhaustion. Imagine several thousand clients doing this to your server. Also, the data it sends makes no sense and should be rejected right away. Imagine what users will say when they find their Caddy uses excessive resources due to this attack. In their config, they never specified that they want this looping behavior.
With a modest timeout of 10 seconds, I was able to spike the CPU usage from the normal 1% to 20% using only one client. A timeout only works when clients are well-behaved; in practice, if others want to attack your server, a timeout alone won't help you. A successful attack usually combines several attack surfaces, and you don't want this to be part of one. As you said in the comments, it's a mitigation, not a solution.
Force-pushed from 887eb90 to 3670820
It would have been helpful if you had actually said what you meant instead of vaguely hinting at some infinite loop problems multiple times (#210 (comment), #208 (comment)). On the other hand, I can accept that we just don't want to support routes being matched multiple times, because it's confusing and often probably not what the user wants. I don't have a real use case for it myself; it was just an artifact of the new implementation. I will experiment with preventing this in my latest version again, but it will probably look very similar to your solution.
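As a rough illustration of that restriction (invented names, not code from either PR), one way to prevent a route from matching twice is to remember which routes already handled the connection and skip them in later rounds:

```go
package example

// selectRoute walks the routes once per prefetch round, never revisiting a
// route that already matched, so no route can match twice per connection.
func selectRoute(alreadyMatched map[int]bool, tryMatch func(i int) bool, numRoutes int) int {
	for i := 0; i < numRoutes; i++ {
		if alreadyMatched[i] {
			continue // this route already handled the connection once
		}
		if tryMatch(i) {
			alreadyMatched[i] = true
			return i
		}
	}
	return -1 // no (further) route matched this round
}
```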
This is indeed a tough problem. I've learned a lot from both of these approaches, each of which has its pros and cons. I think both are valuable to have, to compare and discuss, so thank you for working on them and pouring so much thought and energy into them! 🙏 I guess, based on the discussion, this is the PR that we'll merge. We can continue to iterate and improve on things. I think this was only possible thanks to both of you tackling this problem! @WeidiDeng Want to rebase and then we'll merge it in?
I tried some different approaches and ended up here: 905fa81. Both versions should now behave similarly. But we can go ahead with this PR, since I will not have much time in the next days/weeks and don't want to delay this any longer.
…, this match is eligible for retest.
Force-pushed from 55fff1a to 8259243
@mholt I did a rebase, but it completely messed up the changed files (from 10 to 45), some of which I didn't change. For now, this branch has no conflicts with the base branch.
That happens to me sometimes too. Let's give it a shot. Thank you both so much for helping and maintaining this! 🙏
Hey guys, just a heads up: I believe this PR is causing issues specifically with the dns matcher:

```go
// Stolen directly from https://github.com/mholt/caddy-l4/blob/afa78d72257b949486b24fa6f0351381c786a4b3/modules/l4dns/matcher.go#L66
var (
msgBuf []byte
msgBytes uint16
)
// Detect the connection protocol: TCP or UDP.
// Note: all non-TCP connections are treated as UDP, so no TCP packets could be matched while testing
// with net.Pipe() unless a valid cx.LocalAddr() response is provided using a fakeTCPConn wrapper.
if _, ok := cx.LocalAddr().(*net.TCPAddr); ok {
// Read the first 2 bytes, validate them and adjust the DNS message length
// Note: these 2 bytes represent the length of the remaining part of the packet
// as a big endian uint16 number.
err := binary.Read(cx, binary.BigEndian, &msgBytes)
if err != nil || msgBytes < dnsHeaderBytes || msgBytes > dns.MaxMsgSize {
return false, err
}
// Read the remaining bytes
msgBuf = make([]byte, msgBytes)
_, err = io.ReadFull(cx, msgBuf)
if err != nil {
return false, nil
}
// Validate the remaining connection buffer
// Note: if at least 1 byte remains, we can technically be sure, the protocol isn't DNS.
// This behaviour may be changed in the future if there are many false negative matches.
extraBuf := make([]byte, 1)
_, err = io.ReadFull(cx, extraBuf)
if err == nil {
return false, nil
}
} else {
// Read a minimum number of bytes
msgBuf = make([]byte, dnsHeaderBytes)
n, err := io.ReadAtLeast(cx, msgBuf, int(dnsHeaderBytes))
if err != nil {
return false, nil
}
// Read the remaining bytes and validate their length
var nn int
tmpBuf := make([]byte, dns.MinMsgSize)
for err == nil {
nn, err = io.ReadAtLeast(cx, tmpBuf, 1)
msgBuf = append(msgBuf, tmpBuf[:nn]...)
n += nn
}
if n > dns.MaxMsgSize {
return false, nil
}
msgBytes = uint16(n)
}
```

Appreciate the hard work you guys are putting in!
This issue was discovered in Slack. You should just return the error whenever it's encountered instead of returning nil.
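For clarity, here is a sketch of what that suggestion would look like in the TCP branch of the snippet above (an unverified adaptation, not a committed patch):

```go
// Read the remaining bytes
msgBuf = make([]byte, msgBytes)
_, err = io.ReadFull(cx, msgBuf)
if err != nil {
	// Propagate the error so the matching loop can prefetch more bytes
	// and retry, instead of declaring a definitive non-match.
	return false, err // was: return false, nil
}
```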
…ler behavior
The new non-blocking matcher breaks several things, some of which can be avoided.
The behavior causes unnecessary reads and loops, as some of the matchers have already reached a conclusion about the status of the match. These matchers will be skipped to avoid the loop.
The fallback handler is called correctly, as before.