rfc: add service mesh #115

Merged · 1 commit · Feb 7, 2024
`rfc/001-service-mesh.md` (178 additions)
# RFC 001: Service Mesh

Applications inside Confidential Containers should be able to talk to each other
confidentially without having to adapt their source code.

## The Problem

Configuring the CA and client certificates inside the applications is tedious,
since it involves developers changing their code in multiple places.
This also breaks the lift-and-shift promise. Therefore, we can only expect the
user to make slight changes to their deployments.

## Solution

We will deploy a sidecar container [1] that consumes the CA and client certificates.
It can establish mTLS connections to other applications enrolled in the mesh
by connecting to their sidecar proxies.

All ingress and egress traffic should be routed over the proxy. The proxy should
route packets to the original destination IP and port.
Additionally, the proxy must be configured with the ingress endpoints on which
to enforce client authentication.
> **Contributor** (on lines +21 to +22): Either egress is missing, or this
> assumes that the question at the very end is answered with "no".
>
> **Member Author:** Yeah, egress is missing. It should be something like:
> "The proxy must also be configured on which egress traffic to enforce mTLS."


[1] <https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/>

The remaining problem is how to route the application's traffic over the proxy.
We propose two routing solutions and two proxy solutions.

### Routing Solution 1: Manually map ingress and egress

This solution shifts the `all ingress and egress traffic should be routed over the proxy`
requirement to the user.

Additionally, this solution requires that the service endpoints are configurable
inside the deployments. Since this is the case for both emojivoto and
Google's microservice demo, this is a reasonable requirement.
> **Contributor** (on lines +34 to +36): Instead of arguing for why such a
> restriction might be justified, I'd rather just point out the drawback of
> this solution: it can't support workloads with hard-coded connection targets
> in a generic way (it breaks with two remotes at the same port, or with a
> remote port that is the same as a local listening port of the app).
>
> While I agree that this is highly unlikely to be a problem in practice, it's
> unclear why to choose this solution over a more general one. Are there
> drawbacks missing in Routing Solution 2?
>
> **Member Author:** I think we have to go with a mix of both routing solutions
> because of the egress problem in
> https://github.com/edgelesssys/nunki/pull/115/files#r1472882179, right?
> We'd likely configure e.g. `localhost:1234` as a service address, then
> configure a mapping inside the proxy from `localhost:1234` ->
> `emoji-svc.namespace`, and also configure it to require mTLS.


For ingress traffic, we define a port mapping from the proxy to the application.
All traffic that targets the proxy on that port is forwarded to the
application's port. We also need to protect the application from being reached
directly via the port it exposes. To achieve that, we block all incoming
traffic to the application's port via iptables.

For egress traffic, we configure a port and an endpoint. The proxy will listen
locally on the port and forward all traffic to the specified endpoint.
In the application's settings, we set the endpoint to `localhost:port`.
> **Contributor** (on lines +44 to +46): We need to support more than one
> port/endpoint pair, don't we?
>
> **Member Author:** Yes, we'd need many ports: one for each service the
> workload wants to reach.
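
A minimal sketch of Routing Solution 1, assuming the app serves on port 8080,
the proxy forwards `localhost:1234` to the hypothetical remote service
`emoji-svc.namespace`, and both containers share the pod's network namespace:

```sh
# Ingress: only the local proxy may reach the app; drop direct connections
# to its exposed port (8080 is an assumed value).
iptables -A INPUT -p tcp --dport 8080 ! -i lo -j DROP

# Egress: the app is configured to talk to localhost:1234; the proxy listens
# there and forwards the mTLS-wrapped traffic to emoji-svc.namespace:8080.
# Expressed as a hypothetical flag of the sidecar's setup program:
#   proxy --egress 127.0.0.1:1234=emoji-svc.namespace:8080
```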


### Routing Solution 2: iptables based re-routing

With this solution, we take care of the correct routing for the user and impose
no requirements on the configuration of endpoints.

One example of iptables-based routing is Istio [1] [2] [3].
In contrast to Istio, we don't need a way to configure anything dynamically,
since we don't have the concept of virtual services and our certificates
are wildcard certificates by default.
> **Contributor** (on lines +54 to +56): We do need at least some static
> configuration, though, for routing-exempt ingress ports. It's listed in RS1,
> so it should be listed here, too.
>
> **Member Author:** Yes, the question is whether the configuration should be
> a positive list (enforce mTLS on those endpoints) or a negative list (enforce
> for every port except those in the list). Another option is to route all
> ingress over the proxy and let the proxy configuration decide whether to
> enforce mTLS or not. I think moving this configuration to the proxy might be
> simpler, since we'd have fewer points to configure.


[1] <https://github.com/istio/istio/wiki/Understanding-IPTables-snapshot>

[2] <https://tetrate.io/blog/traffic-types-and-iptables-rules-in-istio-sidecar-explained/>

[3] <https://jimmysongio.medium.com/sidecar-injection-transparent-traffic-hijacking-and-routing-process-in-istio-explained-in-detail-d53e244e0348>
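
As a rough sketch, Istio-style static redirection boils down to a single NAT
rule (the inbound proxy port 15006 is an assumption borrowed from Istio's
defaults):

```sh
# Divert all inbound TCP to the proxy. With NAT REDIRECT, the proxy can
# recover the original destination from the SO_ORIGINAL_DST socket option.
iptables -t nat -A PREROUTING -p tcp -j REDIRECT --to-ports 15006
```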

### Proxy Solution 1: Custom implemented tproxy

TPROXY [1] is a kernel feature that allows applications to proxy traffic
without changing the actual packets, as happens, e.g., when re-routing them via NAT.

The proxy can implement custom user-space logic to handle traffic and easily
route the traffic to the original destination (see a simple Go example [2]).

We would likely re-implement parts of Envoy (see below), but we'd have more
flexibility regarding additional verification, e.g., should we decide to also
use custom client certificate extensions.
> **Contributor** (on lines +72 to +74): Additional downside: we are now in
> the datapath and accountable for performance, connection drops, etc.


[1] <https://www.kernel.org/doc/Documentation/networking/tproxy.txt>

[2] <https://github.com/KatelynHaworth/go-tproxy/blob/master/example/tproxy_example.go>
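
For reference, the canonical setup from the kernel documentation [1] diverts
TCP to a local transparent listener without rewriting the packets:

```sh
# Mark packets matching a local (transparent) socket and route them to lo.
iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

# Divert new TCP connections (port 80 here, as in the kernel docs) to the
# user-space proxy listening on port 50080, leaving the packets untouched.
iptables -t mangle -A PREROUTING -p tcp --dport 80 -j TPROXY \
  --tproxy-mark 0x1/0x1 --on-port 50080
```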

### Proxy Solution 2: Envoy

Envoy is an L3/L4/L7 proxy used by Istio and Cilium. In combination with either
iptables REDIRECT or TPROXY, it can be used to handle TLS origination and
termination [1].
The routing is done via the original destination filter [2].
For TLS origination, we wrap all outgoing connections in TLS since we
cannot rely on DNS being secure. Istio uses "secure naming" [3] to at least
protect HTTP/HTTPS traffic from DNS spoofing, but e.g. raw TCP or UDP traffic
is not secured.

[1] <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl.html#tls>

[2] <https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/original_dst_filter>

[3] <https://istio.io/latest/docs/concepts/security/#secure-naming>
> **Contributor:** Secure naming would include the k8s API server in the TCB.
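
A minimal sketch of an Envoy bootstrap for the TLS termination path described
above, as the sidecar's setup program might write it (the listener port 15006
and the certificate paths are assumptions):

```sh
cat > /etc/envoy/envoy.yaml <<'EOF'
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 15006 }
    listener_filters:
    # Restore the pre-redirect destination so it can be used for routing.
    - name: envoy.filters.listener.original_dst
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          require_client_certificate: true
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: /tls/cert.pem }
              private_key: { filename: /tls/key.pem }
            validation_context:
              trusted_ca: { filename: /tls/ca.pem }
      filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: ingress
          cluster: original_dst
  clusters:
  # Forward to whatever address the original destination filter restored.
  - name: original_dst
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
EOF
```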


## General questions

* Which traffic do we want to secure? HTTP/S, TCP, UDP, ICMP? Is TLS even the
correct layer for this?
> **Contributor** (on lines +99 to +100): I think it would be very interesting
> to consider the feasibility of VPN tunnels instead of TLS transports, as it
> would allow for a much wider range of supported protocols. IPSec and OpenVPN
> can make use of X.509 certs, or in a pinch you implement one with
> ~~duck tape and WD40~~ QUIC and tuntap.
>
> **Member Author:** The argument against L3/4 encryption is that users are
> used to service meshes like Istio, which have the same restrictions. But in
> general I also think that an L3/4 solution is cleaner network-wise.
>
> **Contributor:** I think another problem with VPN tunnels is that they
> probably need to be implemented in the podvm (but outside of the container
> sandbox) to work. So we might need to customize the podvm image, which makes
> the approach less portable to future CoCo providers.


Since TCP service meshes are ubiquitous, supporting only TCP for now is
fine.

* Do we allow workloads to talk to the internet by default? Otherwise, we can
  wrap all egress traffic in mTLS.
> **Contributor** (on lines +105 to +106): It's not only the internet, but
> also non-confidential cluster endpoints, right? It does not sound like we
> can drop that requirement.
>
> The chances of accidentally leaking sensitive traffic are high enough to
> demand an explicit opt-out by the user for some endpoints. In the
> transparent-proxy case, we only have an IP:PORT pair to work with, which is
> not enough to identify the endpoint the confidential workload intended to
> call. Hijacking DNS traffic would be an option, etc. pp., but overall it
> seems easier to just go with Routing Solution 1 for outbound traffic.
>
> **Member Author:** Yes, this should be re-worded as "talk to untrusted
> endpoints". I think that to some extent we can and have to make restrictions
> under which this transparent encryption works, as we also do for TTLS in
> MarbleRun.
>
> But I agree that going with Routing Solution 1 is the cleanest and likely
> the only secure option (at least for egress traffic).


For egress, a secure-by-default option would be nice but is hard to achieve.
This can be implemented in a later step.

* Do we want to use any custom extensions in the client certificates in the
future?

No, for now we don't use any certificate extensions that bind the certificate
to the workload.

## Way forward

In Kubernetes, the general architecture will be a sidecar container that
includes an Envoy proxy and a small Go or Bash program to configure routes and
set up and configure Envoy.

### Step 1: Egress

The routing works on layer 3. The workload owner configures the workload's
service endpoints to point to a unique local IP out of the 127.0.0.0/8 CIDR.
The workload owner configures the proxy to listen on each of those addresses
and map it to a remote service domain.

If possible, we don't want to touch the packets' ports, so that we can
transparently proxy all ports of a service.

Note that this is not secure by default. If the user doesn't configure the
endpoints in their application, traffic is sent out unencrypted and without
authentication.

<img src="./assets/egress.svg">
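
A sketch of one such mapping (all names, variables, and flags are
illustrative, not decided):

```sh
# The workload owner points the app at a unique loopback IP per service...
kubectl set env deployment/web EMOJI_SVC_ADDR=127.137.0.1:8080
# ...and tells the proxy to map that IP back to the real service, e.g. via a
# hypothetical flag of the sidecar's setup program:
#   proxy --egress 127.137.0.1=emoji-svc.namespace
```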

### Step 2: Ingress

For ingress traffic, we deploy iptables rules which redirect all traffic to
Envoy via TPROXY. After Envoy has terminated the TLS connection,
it sends the traffic on to the workload. The routing is similar to
what Istio does [1].

The user can configure an allowlist of ports which should not be redirected to
Envoy. Traffic originating from the UID the proxy runs under is also not
redirected. Since by default all traffic is routed to Envoy, the workload's
ingress endpoints are secure by default.

<img src="./assets/ingress.svg">
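
A sketch of the resulting rules (Envoy's transparent listener port 15006 and
the allowlisted port 9090 are assumptions); the fwmark routing is set up as
shown under Proxy Solution 1:

```sh
# Never redirect loopback traffic, e.g. Envoy handing packets to the workload.
iptables -t mangle -A PREROUTING -i lo -j RETURN
# Skip ports on the user's allowlist.
iptables -t mangle -A PREROUTING -p tcp --dport 9090 -j RETURN
# Everything else goes to Envoy's transparent listener.
iptables -t mangle -A PREROUTING -p tcp -j TPROXY --tproxy-mark 0x1/0x1 --on-port 15006
```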

### Step 3: Secure by default egress

Ideally, we also want secure-by-default egress, but this comes with
additional tradeoffs. If we assume that the workload does _NOT_ talk to any
endpoints outside of the service mesh, then we can redirect all traffic
through the proxy. Since we cannot assume this to be true for all workloads,
we still need the explicit configuration method described above.

Since we need to allow DNS for Kubernetes service lookups, we can redirect
only TCP traffic via the proxy.
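
A sketch of such rules (the proxy's egress port 15001 and UID 1337 are
assumptions borrowed from Istio's conventions):

```sh
# Keep DNS working for Kubernetes service lookups.
iptables -t nat -A OUTPUT -p udp --dport 53 -j RETURN
# Don't loop the proxy's own outbound traffic back into itself.
iptables -t nat -A OUTPUT -m owner --uid-owner 1337 -j RETURN
# Redirect all remaining outbound TCP to the proxy.
iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-ports 15001
```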

### Optional: Egress capturing via DNS

If we want to allow additional endpoints, we also need to touch the pod's
DNS resolution. An easy way would be to resolve the allowlisted entries
either directly to the correct endpoint or to a special IP of the proxy.
This requires the application to use plain DNS (over UDP) rather than
DNS-over-HTTPS, DNS-over-QUIC, or similar.
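
A sketch of the DNS-based variant, assuming a dnsmasq instance in the pod and
a hypothetical allowlisted endpoint:

```sh
# Resolve the allowlisted name to a dedicated proxy IP, so the proxy can
# apply mTLS (or pass the traffic through) before forwarding.
echo 'address=/external-svc.example.com/127.137.0.99' >> /etc/dnsmasq.conf
```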

### Outlook

Especially for ingress, but also for egress as described in step 3,
we must ensure that the sidecar/init container runs
before the workload receives traffic. Otherwise, the iptables
rules might not be configured yet and traffic would be sent without TLS and
without client verification.