rfc: add service mesh #115
Conversation
> Additionally, the proxy must be configured on which ingress endpoints to enforce
> client authentication.
Either egress is missing, or this assumes that the question at the very end is answered with "no".
Yeah, egress is missing. It should be something like: "The proxy must also be configured on which egress traffic to enforce mTLS."
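For illustration, the combined static configuration could then look something like the sketch below. `ProxyConfig` and its fields are invented names for this discussion, not the RFC's actual schema:

```go
// Package config sketches the static proxy configuration discussed above.
// All type and field names are hypothetical.
package config

// IngressEndpoint is a local port on which the proxy terminates TLS and,
// if ClientAuth is set, additionally enforces client certificate authentication.
type IngressEndpoint struct {
	Port       uint16
	ClientAuth bool // enforce mTLS client authentication on this endpoint
}

// EgressTarget maps a local listen address to a remote service; traffic
// forwarded to Remote is wrapped in mTLS by the proxy.
type EgressTarget struct {
	LocalPort uint16 // proxy listens on localhost:LocalPort
	Remote    string // e.g. "emoji-svc.namespace:8080"
}

// ProxyConfig covers both directions, addressing the missing egress part.
type ProxyConfig struct {
	Ingress []IngressEndpoint
	Egress  []EgressTarget
}
```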
> Additionally, this solution requires that the service endpoints are configurable
> inside the deployments. Since this is the case for both emojivoto and
> Google's microservice demo, this is a reasonable requirement.
Instead of arguing for why such a restriction might be justified, I'd rather just point out the drawback of this solution: it can't support workloads with hard-coded connection targets in a generic way (it breaks with two remotes on the same port, or when a remote port is the same as a local listening port of the app).
While I agree that this is highly unlikely to be a problem in practice, it's unclear why to choose this solution over a more general one. Are there drawbacks missing in Routing Solution 2?
I think we have to go with a mix of both routing solutions because of the egress problem in https://github.com/edgelesssys/nunki/pull/115/files#r1472882179, right?
We'd likely configure e.g. `localhost:1234` as a service address, then configure a mapping inside the proxy from `localhost:1234` to `emoji-svc.namespace`, and also configure the proxy to require mTLS.
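A minimal sketch of that mapping, assuming a hypothetical `Route` type and `routes` table inside the proxy (names and addresses are illustrative only):

```go
package main

import "fmt"

// Route describes where traffic arriving at a configured service address
// should go, and whether the proxy must wrap it in mTLS.
type Route struct {
	Upstream    string // e.g. "emoji-svc.namespace:8080"
	RequireMTLS bool
}

// routes maps the service addresses configured inside the deployment
// (e.g. localhost:1234) to their actual cluster upstreams.
var routes = map[string]Route{
	"localhost:1234": {Upstream: "emoji-svc.namespace:8080", RequireMTLS: true},
}

func main() {
	r := routes["localhost:1234"]
	fmt.Printf("forward to %s (mTLS: %v)\n", r.Upstream, r.RequireMTLS)
}
```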
> For egress traffic, we configure a port and an endpoint. The proxy will listen
> locally on the port and forward all traffic to the specified endpoint.
> We set the endpoint in the application setting to `localhost:port`.
We need to support more than one port/endpoint pair, don't we?
Yes, we'd need many ports: one for each service the workload wants to reach.
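Sketched below under the same assumptions as the mapping table above: one local listener per target service, each forwarding its accepted connections to the upstream over mTLS. The target addresses and the empty TLS config are placeholders.

```go
package main

import (
	"crypto/tls"
	"io"
	"log"
	"net"
)

// serveEgress listens on a local port and forwards every connection to the
// configured upstream, wrapped in mTLS.
func serveEgress(local, upstream string, tlsConf *tls.Config) {
	ln, err := net.Listen("tcp", local)
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			up, err := tls.Dial("tcp", upstream, tlsConf)
			if err != nil {
				log.Printf("dial %s: %v", upstream, err)
				return
			}
			defer up.Close()
			go io.Copy(up, c) // app -> upstream
			io.Copy(c, up)    // upstream -> app
		}(conn)
	}
}

func main() {
	tlsConf := &tls.Config{ /* mesh client certificate and CA would go here */ }
	// One port per service the workload wants to reach (hypothetical targets).
	targets := map[string]string{
		"localhost:1234": "emoji-svc.namespace:8080",
		"localhost:1235": "voting-svc.namespace:8080",
	}
	for local, upstream := range targets {
		go serveEgress(local, upstream, tlsConf)
	}
	select {} // keep the proxy running
}
```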
> We likely re-implement parts of Envoy (see below), but have more
> flexibility regarding additional verification, e.g. should we decide to also
> use custom client certificate extensions.
Additional downside: we are now in the datapath and accountable for performance, connection drops, etc.
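As a concrete illustration of the flexibility mentioned in the quoted paragraph: a custom proxy built on Go's `crypto/tls` could hook extra checks into the handshake, e.g. requiring a custom client certificate extension. The OID and all names below are invented for the sketch:

```go
package mesh

import (
	"crypto/tls"
	"crypto/x509"
	"encoding/asn1"
	"errors"
)

// customExtOID is a placeholder OID for a hypothetical client certificate
// extension; the actual OID would be defined by the project.
var customExtOID = asn1.ObjectIdentifier{1, 3, 9999, 1}

func serverTLSConfig(certs []tls.Certificate, pool *x509.CertPool) *tls.Config {
	return &tls.Config{
		Certificates: certs,
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert,
		// VerifyPeerCertificate runs after the standard chain verification
		// and lets us enforce additional, mesh-specific checks.
		VerifyPeerCertificate: func(rawCerts [][]byte, _ [][]*x509.Certificate) error {
			leaf, err := x509.ParseCertificate(rawCerts[0]) // leaf comes first
			if err != nil {
				return err
			}
			for _, ext := range leaf.Extensions {
				if ext.Id.Equal(customExtOID) {
					return nil // found the custom extension; accept
				}
			}
			return errors.New("client certificate lacks required custom extension")
		},
	}
}
```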
> * Do we allow workloads to talk to the internet by default? Otherwise we can
>   wrap all egress traffic in mTLS.
It's not only the internet, but also non-confidential cluster endpoints, right? It does not sound like we can drop that requirement.
The chances of accidentally leaking sensitive traffic are high enough to demand an explicit opt-out by the user for some endpoints. In the transparent proxy case, we have to work with only an IP:port pair, which is not enough to identify the endpoint the confidential workload intended to call. Hijacking DNS traffic would be an option, etc., but overall it seems easier to just go with Routing Solution 1 for outbound traffic.
Yes, this should be re-worded as "talk to untrusted endpoints".
I think that to some extent we can and have to make restrictions under which this transparent encryption works, as we also do for TTLS in MarbleRun.
But I agree that going with Routing Solution 1 is the cleanest and likely the only secure option (at least for egress traffic).
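A sketch of the "explicit opt-out" idea under discussion, assuming the proxy sees only destination IP:port pairs; the policy function, its names, and the allowlist entry are invented for illustration:

```go
package main

import "fmt"

// plaintextAllowlist holds the only destinations a workload may reach without
// mesh mTLS. Everything else is denied by default, so sensitive traffic cannot
// leak accidentally; entries here require an explicit user opt-out.
var plaintextAllowlist = map[string]bool{
	"203.0.113.10:53": true, // hypothetical example entry
}

// egressPolicy decides what to do with an outbound connection when all we
// know is the IP:port pair (the transparent-proxy case discussed above).
func egressPolicy(dst string, meshTarget bool) (string, error) {
	switch {
	case meshTarget:
		return "wrap in mTLS", nil
	case plaintextAllowlist[dst]:
		return "pass through in plaintext (user opted out)", nil
	default:
		return "", fmt.Errorf("egress to %s denied: not a mesh target and not opted out", dst)
	}
}

func main() {
	action, err := egressPolicy("203.0.113.10:53", false)
	fmt.Println(action, err)
}
```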
> In contrast to Istio, we don't need a way to configure anything dynamically,
> since we don't have the concept of virtual services and our certificates
> are wildcard certificates by default.
We do need at least some static configuration, though, for routing-exempt ingress ports. It's listed in RS1, so should be listed here, too.
Yes, the question is whether the configuration should be a positive list (enforce mTLS on those endpoints) or a negative list (enforce mTLS on every port except those in the list). Another option is to route all ingress through the proxy and let the proxy configuration decide whether to enforce mTLS. Moving this configuration into the proxy might be simpler, since it leaves fewer places to configure.
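A rough sketch of that third option (all ingress through the proxy, per-port decision), with invented names; whether the port list is positive or negative only changes how `enforceMTLS` is computed:

```go
package mesh

import (
	"crypto/tls"
	"net"
)

// listenIngress routes all ingress through the proxy. enforceMTLS decides per
// port whether client certificates are required: with a positive list it is
// "port is in the list", with a negative list it is the complement.
func listenIngress(port string, enforceMTLS bool, base *tls.Config) (net.Listener, error) {
	if enforceMTLS {
		conf := base.Clone()
		conf.ClientAuth = tls.RequireAndVerifyClientCert
		return tls.Listen("tcp", ":"+port, conf)
	}
	// Exempt port: the proxy still sits in the path, but passes traffic
	// through without demanding client authentication.
	return net.Listen("tcp", ":"+port)
}
```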
> [1] <https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/original_dst_filter>
> [2] <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl.html#tls>
> [3] <https://istio.io/latest/docs/concepts/security/#secure-naming>
Secure naming would pull the k8s API server into the TCB.
> * Which traffic do we want to secure? HTTP/S, TCP, UDP, ICMP? Is TLS even the
>   correct layer for this?
I think it would be very interesting to consider the feasibility of VPN tunnels instead of TLS transports, as they would allow for a much wider range of supported protocols. IPsec and OpenVPN can make use of X.509 certs, or in a pinch you can build one out of duct tape and WD-40: QUIC and tuntap.
The argument against L3/L4 encryption is that users are used to service meshes like Istio, which have the same restrictions. But in general I also think that an L3/L4 solution is cleaner network-wise.
I think another problem with VPN tunnels is that they probably need to be implemented in the podvm (but outside of the container sandbox) to work. So we might need to customize the podvm image, which makes the approach less portable to future CoCo providers.
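To make the tunnel idea concrete, here is a heavily simplified sketch of the client side of such a tunnel: an L3 VPN that reads raw IP packets from a TUN device and ships them over a mutually authenticated TLS connection, so any protocol the kernel speaks (TCP, UDP, ICMP, ...) is covered. It assumes the third-party `github.com/songgao/water` package for TUN access; the return path, routing setup, certificates, and error recovery are all omitted.

```go
package main

import (
	"crypto/tls"
	"encoding/binary"
	"log"

	"github.com/songgao/water" // third-party TUN/TAP library (assumption)
)

func main() {
	// Open a TUN device; raw IP packets routed to this interface by the
	// kernel show up in tun.Read.
	tun, err := water.New(water.Config{DeviceType: water.TUN})
	if err != nil {
		log.Fatal(err)
	}

	// Mutually authenticated TLS transport to the peer proxy. The address
	// is a placeholder; mesh certificates would go into the config.
	conn, err := tls.Dial("tcp", "peer-proxy:9000", &tls.Config{
		/* client certificate and root CAs would go here */
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Forward each IP packet as a length-prefixed frame over the tunnel.
	buf := make([]byte, 65535)
	for {
		n, err := tun.Read(buf)
		if err != nil {
			log.Fatal(err)
		}
		var hdr [2]byte
		binary.BigEndian.PutUint16(hdr[:], uint16(n))
		if _, err := conn.Write(append(hdr[:], buf[:n]...)); err != nil {
			log.Fatal(err)
		}
	}
}
```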
Notes from meeting 2024-02-05
TCP (and eventually UDP) for now; document the limitations.
We need to allow egress traffic made by the workload.
This is out of scope for now.
This is a first draft / research on the service mesh topic.
I propose that after everyone has read this RFC and commented on it, we get back together to clarify the remaining questions and agree on the next steps.