rfc: add service mesh #115

# RFC 001: Service Mesh

Applications inside Confidential Containers should be able to talk to each other
confidentially without the need to adapt the source code of the applications.

## The Problem

Configuring the CA and client certificates inside the applications is tedious,
since it involves developers changing their code in multiple places.
This also breaks the lift-and-shift promise. Therefore, we can only expect the
user to make slight changes to their deployments.

## Solution

We will deploy a sidecar container [1] which consumes the CA and client certificates.
It can establish mTLS connections to other applications enrolled in the mesh
by connecting to their sidecar proxies.

All ingress and egress traffic should be routed over the proxy. The proxy should
route packets to the original destination IP and port.
Additionally, the proxy must be configured with the ingress endpoints on which
to enforce client authentication.

[1] <https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/>

The remaining problem is how to route the application's traffic over the proxy.
We propose two routing solutions and two proxy solutions.

### Routing Solution 1: Manually map ingress and egress

This solution shifts the `all ingress and egress traffic should be routed over the proxy`
requirement to the user.

Additionally, this solution requires that the service endpoints are configurable
inside the deployments. Since this is the case for both emojivoto and
Google's microservice demo, this is a reasonable requirement.

Comment on lines +34 to +36:

> Instead of arguing for why such a restriction might be justified, I'd rather
> just point out the drawback of this solution: it can't support workloads with
> hard-coded connection targets in a generic way (it breaks with two remotes at
> the same port, or with a remote port being the same as a local listening port
> of the app). While I agree that this is highly unlikely to be a problem in
> practice, it's unclear why to choose this solution over a more general one.
> Are there drawbacks missing in Routing Solution 2?

> I think we have to go with a mix of both routing solutions because of the
> egress problem in
> https://github.com/edgelesssys/nunki/pull/115/files#r1472882179, right?

For ingress traffic, we define a port mapping from the proxy to the application.
All traffic that targets the proxy on that port will be forwarded to the
application's port. We also need to protect the application from being talked
to directly via the port it exposes. To achieve that, we block all incoming
traffic to the application's port via iptables.

For egress traffic, we configure a port and an endpoint. The proxy will listen
locally on the port and forward all traffic to the specified endpoint.
We set the endpoint in the application's settings to `localhost:port`.

Comment on lines +44 to +46:

> We need to support more than one port/endpoint pair, don't we?

> Yes, we'd need many ports. One for each service the workload wants to reach.
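The ingress half of this solution can be sketched with a single iptables rule. The port numbers and the dry-run `ipt` wrapper below are illustrative assumptions, not part of the RFC; drop the `echo` indirection to apply the rule for real.

```shell
#!/bin/sh
# Sketch of Routing Solution 1; APP_PORT and PROXY_PORT are made-up examples.
APP_PORT=8080     # port the workload listens on
PROXY_PORT=15006  # port the sidecar proxy accepts ingress mTLS on

# Collect and print the rules instead of applying them (dry run).
RULES=""
ipt() { RULES="$RULES
iptables $*"; echo "iptables $*"; }

# Ingress: drop traffic that reaches the app port from outside the pod, so
# clients can only come in via the proxy on PROXY_PORT.
ipt -A INPUT -p tcp --dport "$APP_PORT" ! -i lo -j DROP

# Egress needs no iptables rule in this solution: the app is configured to
# dial localhost:PROXY_PORT and the proxy forwards to the remote endpoint.
```

As discussed in the comment thread above, a real setup would repeat the egress half once per remote service, with one local proxy port each.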

### Routing Solution 2: iptables-based re-routing

With this solution, we take care of the correct routing for the user and impose
no requirements regarding the configuration of endpoints.

One example of iptables-based routing is Istio [1] [2] [3].
In contrast to Istio, we don't need a way to configure anything dynamically,
since we don't have the concept of virtual services, and our certificates
are wildcard certificates by default.

Comment on lines +54 to +56:

> We do need at least some static configuration, though, for routing-exempt
> ingress ports. It's listed in RS1, so it should be listed here, too.

> Yes, the question is whether the configuration should be a positive list
> (enforce mTLS on those endpoints) or a negative list (enforce for every port
> except those in the list). Another option is to route all ingress over the
> proxy and let the proxy configuration decide whether to enforce mTLS or not.
> I think moving this configuration to the proxy might be simpler, since we'd
> have fewer points to configure.

[1] <https://github.com/istio/istio/wiki/Understanding-IPTables-snapshot>

[2] <https://tetrate.io/blog/traffic-types-and-iptables-rules-in-istio-sidecar-explained/>

[3] <https://jimmysongio.medium.com/sidecar-injection-transparent-traffic-hijacking-and-routing-process-in-istio-explained-in-detail-d53e244e0348>
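A simplified sketch of the Istio-style outbound redirection referenced above. The chain name, proxy port, and uid are illustrative assumptions; rules are printed, not applied.

```shell
#!/bin/sh
# Dry-run sketch of iptables-based egress re-routing (Istio-style).
PROXY_PORT=15001
PROXY_UID=1337

RULES=""
ipt() { RULES="$RULES
iptables $*"; echo "iptables $*"; }

# Dedicated nat chain that hands TCP traffic to the proxy.
ipt -t nat -N MESH_REDIRECT
ipt -t nat -A MESH_REDIRECT -p tcp -j REDIRECT --to-ports "$PROXY_PORT"

# Exempt the proxy's own traffic (matched by uid) to avoid a redirect loop,
# and exempt loopback traffic.
ipt -t nat -A OUTPUT -m owner --uid-owner "$PROXY_UID" -j RETURN
ipt -t nat -A OUTPUT -o lo -j RETURN

# Everything else leaving the pod goes through the proxy.
ipt -t nat -A OUTPUT -p tcp -j MESH_REDIRECT
```

With `REDIRECT`, the proxy recovers the original destination of each connection via the `SO_ORIGINAL_DST` socket option.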

### Proxy Solution 1: Custom implemented tproxy

TPROXY [1] is a kernel feature that allows applications to proxy traffic
without changing the actual packets, e.g., when re-routing them via NAT.

The proxy can implement custom user-space logic to handle traffic and easily
route the traffic to the original destination (see a simple Go example [2]).

We would likely re-implement parts of Envoy (see below), but would have more
flexibility regarding additional verification, e.g., should we decide to also
use custom client certificate extensions.

Comment on lines +72 to +74:

> Additional downside: we are now in the datapath and accountable for
> performance, connection drops, etc.

[1] <https://www.kernel.org/doc/Documentation/networking/tproxy.txt>

[2] <https://github.com/KatelynHaworth/go-tproxy/blob/master/example/tproxy_example.go>
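The kernel-side plumbing TPROXY needs can be sketched as follows, based on the kernel documentation [1]. The port and the `0x1` mark are arbitrary examples; the proxy's listening socket must additionally set the `IP_TRANSPARENT` socket option (as the Go example [2] does).

```shell
#!/bin/sh
# Dry-run sketch of the TPROXY setup: commands are printed, not executed.
PROXY_PORT=15001
MARK=0x1

OUT=""
run() { OUT="$OUT
$*"; echo "$*"; }

# Divert TCP traffic to the local proxy port and mark the packets.
run iptables -t mangle -A PREROUTING -p tcp \
  -j TPROXY --on-port "$PROXY_PORT" --tproxy-mark "$MARK/$MARK"

# Policy routing: deliver marked packets locally so the proxy can accept
# them even though the destination IP is not a local address.
run ip rule add fwmark "$MARK" lookup 100
run ip route add local 0.0.0.0/0 dev lo table 100
```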

### Proxy Solution 2: Envoy

Envoy is an L3/L4/L7 proxy used by Istio and Cilium. In combination with either
iptables REDIRECT or TPROXY, it can be used to handle TLS origination and
termination [1].
The routing is done via the original destination filter [2].
For TLS origination, we wrap all outgoing connections in TLS, since we
cannot rely on DNS being secure. Istio uses "secure naming" [3] to at least
protect HTTP/HTTPS traffic from DNS spoofing, but e.g. raw TCP or UDP traffic
is not secured.

[1] <https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/security/ssl.html#tls>

[2] <https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/original_dst_filter>

[3] <https://istio.io/latest/docs/concepts/security/#secure-naming>

> Secure naming would include the k8s API server in the TCB.
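As a sketch, an Envoy ingress listener combining the original destination filter with mTLS termination could look roughly like this. The listener name, port, and certificate paths are illustrative assumptions, not a definitive configuration:

```yaml
static_resources:
  listeners:
  - name: ingress
    address:
      socket_address: { address: 0.0.0.0, port_value: 15006 }
    listener_filters:
    # Recover the destination the client originally dialed.
    - name: envoy.filters.listener.original_dst
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.original_dst.v3.OriginalDst
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: ingress
          cluster: original_dst_cluster
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          require_client_certificate: true   # enforce client authentication
          common_tls_context:
            tls_certificates:
            - certificate_chain: { filename: /certs/cert.pem }
              private_key: { filename: /certs/key.pem }
            validation_context:
              trusted_ca: { filename: /certs/ca.pem }
  clusters:
  # Forward to whatever address the original_dst filter recovered.
  - name: original_dst_cluster
    type: ORIGINAL_DST
    lb_policy: CLUSTER_PROVIDED
    connect_timeout: 5s
```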

## General questions

* Which traffic do we want to secure? HTTP/S, TCP, UDP, ICMP? Is TLS even the
  correct layer for this?

Comment on lines +99 to +100:

> I think it would be very interesting to consider the feasibility of VPN
> tunnels instead of TLS transports, as it would allow for a much wider range
> of supported protocols. IPSec and OpenVPN can make use of X.509 certs, or in
> a pinch you implement one with [...]

> The argument against L3/4 encryption is that users are used to service meshes
> like Istio, and those have the same restrictions. But in general I also think
> that an L3/4 solution is cleaner network-wise.

> I think another problem with VPN tunnels is that it probably needs to be
> implemented in the podvm (but outside of the container sandbox) to work. So
> we might need to customize the podvm image, which makes the approach less
> portable to future CoCo providers.

Since TCP service meshes are ubiquitously used, supporting only TCP for now is
fine.

* Do we allow workloads to talk to the internet by default? Otherwise, we can
  wrap all egress traffic in mTLS.

Comment on lines +105 to +106:

> It's not only the internet, but also non-confidential cluster endpoints,
> right? It does not sound like we can drop that requirement. The chances of
> accidentally leaking sensitive traffic are high enough to demand an explicit
> opt-out by the user for some endpoints. In the transparent proxy case, we
> have to work with only an IP:PORT pair, which is not enough to identify the
> endpoint the confidential workload intended to call. Hijacking DNS traffic
> would be an option, etc. pp., but overall it seems easier to just go with
> Routing Solution 1 for outbound traffic.

> Yes, this should be re-worded as [...]. But I agree that going with Routing
> Solution 1 is the cleanest and likely the only secure option (at least for
> egress traffic).

For egress, a secure-by-default option would be nice, but it is hard to
achieve. This can be implemented in a later step.

* Do we want to use any custom extensions in the client certificates in the
  future?

No, for now we don't use any certificate extensions which bind the certificate
to the workload.

## Way forward

In Kubernetes, the general architecture will be a sidecar container which
includes an Envoy proxy and a small Go or Bash program to configure routes and
to set up and configure Envoy.

### Step 1: Egress

The routing works on layer 3. The workload owner configures the workload's
service endpoints to point to a unique local IP out of the 127.0.0.0/8 CIDR.
The workload owner configures the proxy to listen on each of those addresses
and to map each of them to a remote service domain.

If possible, we don't want to touch the port of the packets so that we can
transparently proxy all ports of a service.

Note that this is not secure by default. If the user doesn't configure the
endpoints in their application, traffic is sent out unencrypted and without
authentication.

<img src="./assets/egress.svg">
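Assuming Proxy Solution 2, one such per-endpoint mapping could be sketched as the following Envoy snippet. The service name, addresses, ports, and certificate paths are all illustrative assumptions:

```yaml
static_resources:
  listeners:
  # The workload is configured to dial 127.0.0.2 instead of the real service.
  - name: egress-billing
    address:
      socket_address: { address: 127.0.0.2, port_value: 8080 }
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: egress_billing
          cluster: billing
  clusters:
  - name: billing
    type: STRICT_DNS
    connect_timeout: 5s
    load_assignment:
      cluster_name: billing
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: billing.default.svc.cluster.local
                port_value: 8080
    # Wrap the outbound connection in mTLS.
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificates:
          - certificate_chain: { filename: /certs/cert.pem }
            private_key: { filename: /certs/key.pem }
          validation_context:
            trusted_ca: { filename: /certs/ca.pem }
```

Note that a plain `tcp_proxy` listener is bound to one (address, port) pair, so covering all ports of a service transparently would need one listener per port or an original-destination setup.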

### Step 2: Ingress

For ingress traffic, we deploy iptables rules which redirect all traffic to
Envoy via TPROXY. After Envoy has terminated the TLS connection, it sends the
traffic on to the workload. The routing is similar to what Istio does [1].

The user can configure an allowlist of ports which should not be redirected to
Envoy. Also, traffic originating from the uid the proxy is started with is not
redirected. Since by default all traffic is routed to Envoy, the workload's
ingress endpoints are secure by default.

<img src="./assets/ingress.svg">
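The allowlist handling could be sketched as follows. All concrete values are illustrative assumptions, and for simplicity this dry-run sketch approximates the uid exemption with a loopback exemption (Envoy forwards terminated connections to the workload over loopback):

```shell
#!/bin/sh
# Dry-run sketch of ingress redirection with a user-configured allowlist.
ENVOY_PORT=15006
MARK=0x1
EXEMPT_PORTS="9090 9100"   # ports the user excluded from redirection

OUT=""
run() { OUT="$OUT
$*"; echo "$*"; }

# Allowlisted ports bypass the proxy entirely.
for p in $EXEMPT_PORTS; do
  run iptables -t mangle -A PREROUTING -p tcp --dport "$p" -j RETURN
done

# Loopback traffic (e.g. Envoy forwarding a terminated connection to the
# workload) must not be redirected again.
run iptables -t mangle -A PREROUTING -i lo -j RETURN

# Everything else is handed to Envoy, so ingress is secure by default.
run iptables -t mangle -A PREROUTING -p tcp \
  -j TPROXY --on-port "$ENVOY_PORT" --tproxy-mark "$MARK/$MARK"
```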

### Step 3: Secure-by-default egress

Ideally, we also want to have secure-by-default egress, but this comes with
additional tradeoffs. If we assume that the workload does _NOT_ talk to any
endpoints outside of the service mesh, then we can redirect all traffic
through the proxy. Since we cannot assume this to be true for all workloads,
we still need the explicit configuration method described above.

Since we need to allow DNS for Kubernetes service lookups, we can only redirect
all TCP traffic via the proxy.

### Optional: Egress capturing via DNS

If we want to allow additional endpoints, we also need to touch the pod's
DNS resolution. An easy way would be to resolve the allowlisted entries either
directly to the correct endpoint or to a special IP of the proxy.
This requires the application to use basic DNS (over UDP) and not
DNS-over-HTTPS, DNS-over-QUIC, or similar.
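For plain DNS, the simplest form of this would be static host entries that pin allowlisted names to the proxy's local listener addresses. The names and addresses below are made up, and a real implementation would more likely use a small DNS forwarder; in the pod this would target `/etc/hosts`, while the sketch writes a local file:

```shell
#!/bin/sh
# Pin allowlisted egress domains to local proxy IPs via host entries.
HOSTS_FILE="${HOSTS_FILE:-./hosts.mesh}"

cat > "$HOSTS_FILE" <<'EOF'
# allowlisted egress endpoints -> local proxy listeners (illustrative)
127.0.0.2  api.example.com
127.0.0.3  db.internal.example
EOF

echo "wrote entries to $HOSTS_FILE"
```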

### Outlook

Especially for ingress, but also for egress as described in step 3, we must
ensure that the sidecar/init container runs before the workload receives
traffic. Otherwise, it might be that the iptables rules are not configured yet
and the traffic is sent without TLS and without client verification.

> Either egress is missing, or this assumes that the question at the very end
> is answered with "no".

> Yeah, egress is missing. It should be something like: "The proxy must also be
> configured on which egress traffic to enforce mTLS."
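This ordering requirement maps naturally onto Kubernetes native sidecar containers (an init container with `restartPolicy: Always`, available in recent Kubernetes versions, see [1] in the Solution section), which start and become ready before the regular containers run. The pod layout and image names below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: workload
spec:
  initContainers:
  - name: mesh-proxy                  # starts (and keeps running) before the app
    image: example.com/mesh-proxy:v1  # placeholder image
    restartPolicy: Always             # marks this init container as a sidecar
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]            # needed to install the iptables rules
  containers:
  - name: app
    image: example.com/app:v1         # placeholder image
```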