rfc: route service mesh egress over l7 #143

Closed
wants to merge 1 commit into from

3u13r
Member

@3u13r 3u13r commented Feb 9, 2024

@malt3 suggested moving the routing to L7. This has some nice properties. For one, we don't need to be able to edit the endpoints in the deployment. Also, it's quite easy to make this secure by default.
On the other hand, we lose the ability to wrap arbitrary TCP connections and can only wrap HTTP traffic.

Also, we have to solve the problem described in the outlook for egress.

@3u13r 3u13r requested a review from katexochen as a code owner February 9, 2024 12:35
@3u13r 3u13r requested review from malt3 and burgerdev February 9, 2024 12:35
Contributor

@malt3 malt3 left a comment

Thank you for updating the rfc. I think the others should take a look to ensure our plan has no obvious flaws.

Note that this is not secure by default. If the user doesn't configure the
endpoints in their application, traffic is sent out unencrypted and without
authentication.
The egress proxing works on Layer 7. All of the workload's TCP traffic is
Contributor

Suggested change
The egress proxing works on Layer 7. All of the workload's TCP traffic is
The egress proxy works on Layer 7. All of the workload's TCP traffic is

Contributor

@burgerdev burgerdev left a comment

I don't quite get where this is going:

Comment on lines +126 to +128
redirected via tproxy iptables rules to Envoy. By default, all traffic is
wrapped inside TLS. The user can provide an allowlist of endpoints that are
forwarded transparently.
Contributor

Is the allowlist also restricted to HTTP? If not, how do exemptions work?

Member Author

Yes. I think in the first step it might not even be implemented at all. Initially we could support only HTTP endpoints, but later we can also allow arbitrary entries in the allowlist (domain names, IPs, HTTP endpoints).

For this, we would prefix each allowlist entry with its "layer", e.g. domain:tcp-svc,ip:10.10.10.10,http:google.com
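A minimal Go sketch of the prefixed allowlist format described above, assuming comma-separated <layer>:<value> entries using only the three layers named in the comment (domain, ip, http). The names Entry, ParseAllowlist and Forward are hypothetical and not part of the RFC; matching is plain exact string comparison, and any destination that is not allowlisted would fall back to the RFC's default of being wrapped in TLS.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Entry is one allowlist item of the form "<layer>:<value>".
// Hypothetical type; the layer names follow the example in the comment.
type Entry struct {
	Layer string
	Value string
}

// ParseAllowlist splits a comma-separated allowlist such as
// "domain:tcp-svc,ip:10.10.10.10,http:google.com" into entries.
// Unknown layers and malformed IPs are rejected.
func ParseAllowlist(s string) ([]Entry, error) {
	var entries []Entry
	for _, raw := range strings.Split(s, ",") {
		raw = strings.TrimSpace(raw)
		if raw == "" {
			continue // tolerate empty items, e.g. a trailing comma
		}
		layer, value, ok := strings.Cut(raw, ":")
		if !ok || value == "" {
			return nil, fmt.Errorf("entry %q: expected <layer>:<value>", raw)
		}
		switch layer {
		case "domain", "http":
			// No further validation in this sketch.
		case "ip":
			if net.ParseIP(value) == nil {
				return nil, fmt.Errorf("entry %q: invalid IP address", raw)
			}
		default:
			return nil, fmt.Errorf("entry %q: unknown layer %q", raw, layer)
		}
		entries = append(entries, Entry{Layer: layer, Value: value})
	}
	return entries, nil
}

// Forward reports whether traffic to the (layer, value) destination is
// allowlisted and should be forwarded transparently. Everything else is
// assumed to be wrapped in TLS by the proxy (the RFC's default).
func Forward(entries []Entry, layer, value string) bool {
	for _, e := range entries {
		if e.Layer == layer && e.Value == value {
			return true
		}
	}
	return false
}

func main() {
	entries, err := ParseAllowlist("domain:tcp-svc,ip:10.10.10.10,http:google.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(entries))                            // 3
	fmt.Println(Forward(entries, "ip", "10.10.10.10"))   // true
	fmt.Println(Forward(entries, "http", "example.com")) // false
}
```

Keeping the layer as an explicit prefix means a first version could accept only http: entries and reject the others, then widen the switch later without changing the format.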

Note that this is not secure by default. If the user doesn't configure the
endpoints in their application, traffic is sent out unencrypted and without
authentication.
The egress proxing works on Layer 7. All of the workload's TCP traffic is
Contributor

Please describe what the benefit of going to L7 is. I believe the motivation is touched on in the PR description, but I don't quite understand why we'd want to choose this tradeoff.

Comment on lines -102 to +103
Since TCP service meshes are ubiquitously used, only supporting TCP for now is
fine.
Since HTTP service meshes are ubiquitously used, only supporting HTTP for now is
fine. Note that supporting HTTP also supports gRPC since it uses HTTP/2.
Contributor

Why can we just swap out the protocol here and have the sentence remain true? I think there are enough TCP-based protocols that are neither HTTP nor encrypted, and all the other meshes support them.

Member Author

Coming from the Go/K8s world, I think this sentence remains true, since gRPC is the default way Go applications communicate in K8s. I'm less sure about other languages, though.
Can you give examples of protocols typically used in K8s?

Contributor

I think people choose non-HTTP protocols over TCP when they want to avoid parsing, for example in latency-sensitive or memory-restricted applications. Off the top of my head:

While gRPC may often be a good fit for this type (e.g. Arrow), I guess it's just too new to have significant market share.

Furthermore, I'd say that the lift-and-shift promise is most interesting to users with either proprietary or ancient workloads that are hard to secure - DICOM was mentioned in the past.

Member Author

OK, that is a very good point. Then I'll close this PR and go with the steps in the original PR.
I also considered using the L7 routing proposed here for Step 3, but I think the DNS approach is better since it also solves the use cases you mentioned.

Comment on lines -153 to -162
### Step 3: Secure by default egress

Ideally, we also want secure-by-default egress. But this comes with
additional tradeoffs. If we assume that the workload does _NOT_ talk to any
other endpoints outside of the service mesh, then we can redirect all traffic
through the proxy. Since we cannot assume this to be true for all workloads,
we still need the explicit configuration method described above.

Since we need to allow DNS for Kubernetes service lookups, we can only redirect
all TCP traffic via the proxy.
Contributor

Isn't the proposal here step 3, but only for HTTP? Why not switch to step 3 right away then, and implement step 2 later?

Member Author

Good point. I always assumed this step would be implemented by hijacking DNS, but that is not mentioned in the step, so the proposal fits it exactly.

Member Author

3u13r commented Feb 12, 2024

Closing because of #143 (comment)

@3u13r 3u13r closed this Feb 12, 2024
@katexochen katexochen deleted the rfc/001/move-egress-to-l7 branch July 1, 2024 14:27

3 participants