rfc: route service mesh egress over l7 #143
Thank you for updating the rfc. I think the others should take a look to ensure our plan has no obvious flaws.
Note that this is not secure by default. If the user doesn't configure the
endpoints in their application, traffic is send out unencrypted and without
authentication.
The egress proxing works on Layer 7. All of the workload's TCP traffic is
- The egress proxing works on Layer 7. All of the workload's TCP traffic is
+ The egress proxy works on Layer 7. All of the workload's TCP traffic is
I don't quite get where this is going:
redirected via tproxy iptable rules to Envoy. By default, all traffic is
wrapped inside TLS. The user can provide an allowlist for endpoints to just
transparently forward.
Is the allowlist also restricted to HTTP? If not, how do exemptions work?
Yes, I think in the first step the allowlist might not be implemented at all. After that, it could accept only HTTP endpoints, and later arbitrary entries (domain names, IPs, HTTP endpoints).
For this, we would prefix each allowlist entry with its "layer", like domain:tcp-svc,ip:10.10.10.10,http:google.com
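To make the prefixed format concrete, here is a minimal Go sketch of how such an allowlist string could be parsed. This is only an illustration under assumptions: the names `Layer`, `Entry`, and `ParseAllowlist` are hypothetical, the comma-separated syntax is taken from the example above, and the validation rules (for example, rejecting unknown prefixes) are not something the RFC has decided.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Layer is the (hypothetical) prefix of an allowlist entry.
type Layer string

const (
	LayerDomain Layer = "domain"
	LayerIP     Layer = "ip"
	LayerHTTP   Layer = "http"
)

// Entry is one parsed allowlist entry, e.g. {Layer: "ip", Target: "10.10.10.10"}.
type Entry struct {
	Layer  Layer
	Target string
}

// ParseAllowlist splits a comma-separated allowlist such as
// "domain:tcp-svc,ip:10.10.10.10,http:google.com" into entries.
// Unknown layers and malformed IPs are rejected (an assumption of this sketch).
func ParseAllowlist(s string) ([]Entry, error) {
	var entries []Entry
	for _, raw := range strings.Split(s, ",") {
		raw = strings.TrimSpace(raw)
		if raw == "" {
			continue // tolerate empty segments such as a trailing comma
		}
		// Cut at the first colon only, so targets may contain colons (ports, IPv6).
		prefix, target, ok := strings.Cut(raw, ":")
		if !ok || target == "" {
			return nil, fmt.Errorf("allowlist entry %q: want <layer>:<target>", raw)
		}
		switch Layer(prefix) {
		case LayerDomain, LayerHTTP:
			// Accepted as-is; no further validation in this sketch.
		case LayerIP:
			if net.ParseIP(target) == nil {
				return nil, fmt.Errorf("allowlist entry %q: invalid IP address", raw)
			}
		default:
			return nil, fmt.Errorf("allowlist entry %q: unknown layer %q", raw, prefix)
		}
		entries = append(entries, Entry{Layer: Layer(prefix), Target: target})
	}
	return entries, nil
}

func main() {
	entries, err := ParseAllowlist("domain:tcp-svc,ip:10.10.10.10,http:google.com")
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s -> %s\n", e.Layer, e.Target)
	}
}
```

Making the layer explicit in each entry means the proxy configuration would not have to guess whether a value like google.com is meant as a DNS name to pass through or as an HTTP host to match.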
Note that this is not secure by default. If the user doesn't configure the
endpoints in their application, traffic is send out unencrypted and without
authentication.
The egress proxing works on Layer 7. All of the workload's TCP traffic is
Please describe what the benefit of going to L7 is. I believe the motivation is touched on in the PR description, but I don't quite understand why we'd want to choose this tradeoff.
- Since TCP service meshes are ubiquitously used, only supporting TCP for now is
- fine.
+ Since HTTP service meshes are ubiquitously used, only supporting HTTP for now is
+ fine. Note that supporting HTTP also supports gRPC since it uses HTTP/2.
Why can we just swap out protocols here and the sentence remains true? I think there are enough TCP-based protocols that are neither HTTP nor encrypted, which all the other meshes support.
Coming from a Go/K8s world I think this sentence remains true since gRPC is the default way any Go application does things in K8s. I'm not as sure about other languages though.
Can you give examples of protocols typically used in K8s?
I think people choose non-HTTP protocols over TCP when they want to avoid parsing, for example in latency-sensitive or memory-restricted applications. Off the top of my head:
While gRPC may often be a good fit for this type (e.g. Arrow), I guess it's just too new to have significant market share.
Furthermore, I'd say that the lift-and-shift promise is most interesting to users with either proprietary or ancient workloads that are hard to secure - DICOM was mentioned in the past.
Ok that is a very good point. Then I'll close this PR and go with the steps in the original PR.
I also thought about using the L7 routing proposed here for Step 3, but I think the DNS way is better since it also solves the use cases you mentioned.
### Step 3: Secure by default egress

Ideally, we also want to also have secure by default egress. But this comes with
additional tradeoffs. If we assume that the workload does _NOT_ talk to any
other endpoints outside of the service mesh, then we can redirect all traffic
through the proxy. Since we cannot assume this to be true for all workloads,
we still need the explicit configuration method described above.

Since we need to allow DNS for Kubernetes service lookups, we can only redirect
all TCP traffic via the proxy.
Isn't the proposal here step 3, but only for HTTP? Why not switch to step 3 right away then, and implement step 2 later?
Good point. I always assumed that step would be implemented by hijacking DNS, but the step itself doesn't mention that, so this proposal fits it exactly.
Closing because of #143 (comment)
@malt3 suggested moving the routing to L7. This has some nice properties. For one, the user doesn't need to be able to edit the endpoints in the deployment. It is also quite easy to make this secure by default.
On the other hand, we lose the ability to wrap arbitrary TCP connections and can only wrap HTTP traffic.
Also, we have to solve the problem described in the outlook for egress.