rfc: add service mesh #115

3u13r · 2024-01-31T12:22:47Z

This is a first draft / research on the service mesh topic.
I propose that, once everyone has read this RFC and commented on it, we get back together to clarify the remaining questions and agree on the next steps.

burgerdev · 2024-01-31T12:57:35Z

rfc/001-service-mesh.md

+Additionally, the proxy must be configured on which ingress endpoints to enforce
+client authentication.


Either egress is missing, or this assumes that the question at the very end is answered with "no".

Yeah, egress is missing. It should be something like: "The proxy must also be configured on which egress traffic to enforce mTLS."

burgerdev · 2024-01-31T13:39:10Z

rfc/001-service-mesh.md

+Additionally, this solutions requires that the service endpoints are configurable
+inside the deployments. Since this is the case for both emojivoto as well as
+Google's microservice demo, this is a reasonable requirement.


Instead of arguing for why such a restriction might be justified, I'd rather just point out the drawback of this solution: can't support workloads with hard-coded connection targets in a generic way (breaks with two remotes at the same port, or remote port being the same as a local listening port of the app).

While I agree that this is highly unlikely to be a problem in practice, it's unclear why we would choose this solution over a more general one. Are there drawbacks missing in Routing Solution 2?

I think we have to go with a mix of both routing solutions because of the egress problem in https://github.com/edgelesssys/nunki/pull/115/files#r1472882179, right?
We'd likely configure e.g. localhost:1234 as the service address in the application, then configure a mapping inside the proxy from localhost:1234 -> emoji-svc.namespace, and also configure the proxy to require mTLS for that mapping.

burgerdev · 2024-01-31T13:40:59Z

rfc/001-service-mesh.md

+For egress traffic, we configure a port and an endpoint. The proxy will listen
+locally on the port and forward all traffic to the specified endpoint.
+We set the endpoint in the application setting to `localhost:port`.


We need to support more than one port/endpoint pair, don't we?

Yes, we'd need many ports. One for each service the workload wants to reach.
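
To make the one-port-per-service idea concrete, here is a minimal sketch of what such an egress configuration could look like, together with a check for the collision case mentioned above (two remotes mapped to the same local port). Everything in it is an assumption for illustration: the `EgressRule` type, its field names, `ValidateEgress`, and the upstream port `8080` are hypothetical and not part of the RFC.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// EgressRule is a hypothetical config entry: the proxy listens on
// localhost:ListenPort and forwards all traffic to Upstream.
type EgressRule struct {
	ListenPort int    // local port the application is pointed at
	Upstream   string // host:port of the remote service
	MTLS       bool   // whether the proxy wraps the connection in mTLS
}

// LocalAddr returns the address the application would be configured with.
func (r EgressRule) LocalAddr() string {
	return net.JoinHostPort("localhost", strconv.Itoa(r.ListenPort))
}

// ValidateEgress rejects rule sets with an out-of-range listen port,
// a malformed upstream, or two rules claiming the same local port.
func ValidateEgress(rules []EgressRule) error {
	seen := make(map[int]string, len(rules))
	for _, r := range rules {
		if r.ListenPort < 1 || r.ListenPort > 65535 {
			return fmt.Errorf("listen port %d out of range", r.ListenPort)
		}
		if _, _, err := net.SplitHostPort(r.Upstream); err != nil {
			return fmt.Errorf("upstream %q: %w", r.Upstream, err)
		}
		if prev, ok := seen[r.ListenPort]; ok {
			return fmt.Errorf("port %d used for both %q and %q",
				r.ListenPort, prev, r.Upstream)
		}
		seen[r.ListenPort] = r.Upstream
	}
	return nil
}
```

With a rule such as `{ListenPort: 1234, Upstream: "emoji-svc.namespace:8080", MTLS: true}`, the application would be configured with `localhost:1234`, and a second rule reusing port 1234 would be rejected at configuration time rather than failing at bind time.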

burgerdev · 2024-01-31T14:01:48Z

rfc/001-service-mesh.md

+We likely re-implement parts of Envoy (see below), but have more
+flexibility regarding additional verification, e.g. should we decide to also
+use custom client certificate extensions.


Additional downside: we are now in the datapath and accountable for performance, connection drops, etc.

burgerdev · 2024-01-31T14:12:17Z

rfc/001-service-mesh.md

+* Do we allow workloads to talk to the internet by default? Otherwise we can
+wrap all egress traffic in mTLS.


It's not only the internet, but also non-confidential cluster endpoints, right? It does not sound like we can drop that requirement.

The chances of accidentally leaking sensitive traffic are high enough to demand an explicit opt-out by the user for some endpoints. In the transparent proxy case, we need to work with only an IP:PORT pair, which is not enough to identify the endpoint the confidential workload intended to call. Hijacking DNS traffic would be an option, among others, but overall it seems easier to just go with Routing Solution 1 for outbound traffic.

Yes, this should be reworded as "talk to untrusted endpoints".
I think that to some extent we can, and have to, restrict the conditions under which this transparent encryption works, as we also do for TTLS in MarbleRun.

But I agree that going for Routing Solution 1 is the cleanest and likely the only secure option (at least for egress traffic).

burgerdev · 2024-01-31T14:18:57Z

rfc/001-service-mesh.md

+In contrast to Istio, we don't need a way to configure anything dynamically,
+since we don't have the concept of virtual services and also our certificates
+are wildcard certificates per default.


We do need at least some static configuration, though, for routing-exempt ingress ports. It's listed in RS1, so should be listed here, too.

Yes, the question is whether the configuration should be a positive list (enforce mTLS on those endpoints) or a negative list (enforce mTLS on every port except those in the list). Another option is to route all ingress through the proxy and let the proxy configuration decide whether to enforce mTLS or not. I think moving this configuration to the proxy might be simpler, since there are fewer places we need to configure.
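
The difference between the two list semantics can be sketched as follows. This is only an illustration of the decision logic, not a proposed proxy configuration format: `ListMode`, `IngressPolicy`, `NewIngressPolicy` and `RequiresMTLS` are hypothetical names.

```go
package main

// ListMode selects how the port list of an IngressPolicy is interpreted.
type ListMode int

const (
	// EnforceListed is the positive list: mTLS only on listed ports.
	EnforceListed ListMode = iota
	// ExemptListed is the negative list: mTLS on every port except listed ones.
	ExemptListed
)

// IngressPolicy is a hypothetical static ingress configuration.
type IngressPolicy struct {
	Mode  ListMode
	Ports map[uint16]struct{}
}

// NewIngressPolicy builds a policy from a mode and a list of ports.
func NewIngressPolicy(mode ListMode, ports ...uint16) IngressPolicy {
	set := make(map[uint16]struct{}, len(ports))
	for _, p := range ports {
		set[p] = struct{}{}
	}
	return IngressPolicy{Mode: mode, Ports: set}
}

// RequiresMTLS reports whether client authentication is enforced on port.
func (p IngressPolicy) RequiresMTLS(port uint16) bool {
	_, listed := p.Ports[port]
	if p.Mode == EnforceListed {
		return listed
	}
	return !listed
}
```

One consequence worth noting: with the negative list, a port that nobody thought about is protected by default, whereas with the positive list it is left open by default.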

burgerdev · 2024-01-31T14:22:53Z

rfc/001-service-mesh.md

+
+[1] <https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/original_dst_filter>
+[2] <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl.html#tls>
+[3] <https://istio.io/latest/docs/concepts/security/#secure-naming>


Secure naming would include the k8s api server into the TCB.

burgerdev · 2024-01-31T14:53:38Z

rfc/001-service-mesh.md

+* Which traffic do we want to secure? HTTP/S, TCP, UDP, ICMP? Is TLS even the
+correct layer for this?


I think it would be very interesting to consider feasibility of VPN tunnels instead of TLS transports, as it would allow for a much wider range of supported protocols. IPSec and OpenVPN can make use of X.509 certs, or in a pinch you implement one with ~~duck tape and WD40~~ QUIC and tuntap.

The argument against L3/4 encryption is that users are used to service meshes like Istio, which have the same restrictions. But in general I also think that an L3/4 solution is cleaner network-wise.

I think another problem with VPN tunnels is that it probably needs to be implemented in the podvm (but outside of the container sandbox) to work. So we might need to customize the podvm image, which makes the approach less portable to future CoCo providers.

katexochen · 2024-02-05T15:52:53Z

Notes from meeting 2024-02-05

Which traffic do we want to secure? HTTP/S, TCP, UDP, ICMP? Is TLS even the correct layer for this?

TCP (and eventually UDP) for now; document the limitations.

Do we allow workloads to talk to the internet by default? Otherwise we can wrap all egress traffic in mTLS.

We need to allow egress traffic made by the workload.

Do we want to use any custom extensions in the client certificates in the future?

This is out of scope for now.

3u13r requested a review from katexochen as a code owner January 31, 2024 12:22

3u13r requested review from malt3 and burgerdev January 31, 2024 12:22

burgerdev reviewed Jan 31, 2024

3u13r requested a review from burgerdev February 6, 2024 17:09

katexochen approved these changes Feb 7, 2024

malt3 approved these changes Feb 7, 2024

burgerdev approved these changes Feb 7, 2024

3u13r force-pushed the rfc/001-service-mesh branch from 71415af to 48039d6 Compare February 7, 2024 15:26

rfc: add service mesh

879d898

3u13r force-pushed the rfc/001-service-mesh branch from 48039d6 to 879d898 Compare February 7, 2024 15:42

3u13r merged commit f104bda into main Feb 7, 2024
5 checks passed

3u13r deleted the rfc/001-service-mesh branch February 7, 2024 15:42
