CFP-32759: Cilium mTLS via mTLS Agent #38
Conversation
Some small things for extra discussion, but this is looking pretty good to me overall. Thanks @jaellio!
### MVP Scope

Users can enable Cilium mTLS per workload. All traffic leaving a node from an mTLS-enabled source destined for an mTLS-enabled destination on a different node will be sent over mTLS.
This is definitely the most common use-case, and certainly the one we want to support first, but conceivably, using the policy engine means that we could allow more fine-grained control of the traffic selected for mutual auth.
* Traffic between workloads on the same node will not be proxied (and therefore not encrypted)
* Cross-node traffic within a single cluster is tunneled over mTLS
* Node egress traffic to non-mTLS-enabled destinations is not proxied
I agree with all of these, but I should note that we've definitely had requests from users to be able to do both the first and third items in this list. Again, I don't think that we should consider this in the initial solution, but we will get requests for it at some point in the future, so a little time ensuring we're not completely ruling these use cases out will probably be well spent.
### Enablement API

For an MVP, the mTLS agent functionality will be enabled in Cilium via an environment variable in the Cilium configuration - _mtlsAgent: enable_. The Cilium agent will identify the mTLS agent via a pod label - _cilium.io/redir-type=proxy_. Workloads are enabled for redirection and identified by the Cilium agent with the label _cilium.io/redir-type=workload_. For the MVP, the user will apply the labels per workload. A future goal is to support enabling workloads by namespace, with support for exclusions. The Cilium agent will use these labels to identify which pods should be added to the BPF redirection map and to validate requests for workload configuration.
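
As a concrete illustration of the labeling scheme above (pod names, namespaces, and images are placeholders, not part of the proposal):

```yaml
# Workload pod opted in to redirection via the label described above.
apiVersion: v1
kind: Pod
metadata:
  name: my-app                # placeholder
  labels:
    cilium.io/redir-type: workload
spec:
  containers:
  - name: app
    image: registry.example.com/my-app:latest        # placeholder
---
# The node-local mTLS agent pod, identified by the proxy label.
apiVersion: v1
kind: Pod
metadata:
  name: mtls-agent            # placeholder
  labels:
    cilium.io/redir-type: proxy
spec:
  containers:
  - name: agent
    image: registry.example.com/mtls-agent:latest    # placeholder
```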
As I mentioned on our call today, I think that enablement should behave like enablement does for the current mutual auth solution:
- We have options for the existing mutual auth and encryption settings that select this behavior; `ztunnel` would be an acceptable name for both to me.
- Traffic is selected for mutual auth/mTLS using the existing mechanism, i.e. the existing `authentication.mode: required` config inside the existing CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy objects.
- Because this existing mechanism allows the selection of traffic using labels on the Pods, it can allow opting in on a per-Pod (or per-workload) level.
- However, namespace labels are currently not supported, so we may need to leave that for a later date.
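
For reference, a minimal sketch of what opting a workload in would look like with the existing mutual authentication mechanism (the policy name and selector labels are placeholders):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: require-mutual-auth   # placeholder name
spec:
  endpointSelector:
    matchLabels:
      app: my-app             # placeholder selector
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: my-client        # placeholder selector
    authentication:
      mode: required
```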
The exact function of the transparent redirection provided by the eBPF component depends on the network namespace of the mTLS agent (its own network namespace or the root network namespace). If the mTLS agent is in its own network namespace, the eBPF component will redirect relevant packets from workloads and mTLS agents to the node's mTLS agent via L2 rewrites (MAC address and interface). The kernel programs could utilize a BPF map populated by the Cilium agent to determine which traffic should be redirected and to which interface.

```c
struct local_redirect_key {
	__u64 id; // Pod IP for workloads and special well-known key value for mTLS agent
};

struct local_redirect_info {
	__u16 ifindex; // Interface index of workload or mTLS agent
	__u8 ifmac[6]; // MAC address of workload or mTLS agent
};
```

A map entry would be created for each mTLS-enabled workload in the map on its respective node. A map entry would also be created for the mTLS agent present on that node.
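
As a rough, non-normative sketch of how a tc-attached BPF program might consult such a map to steer traffic to the node-local mTLS agent (the map name, sizing, the well-known agent key, and the program structure are all assumptions, not part of the proposal):

```c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct local_redirect_key {
	__u64 id;          /* Pod IP for workloads, well-known value for the mTLS agent */
};

struct local_redirect_info {
	__u16 ifindex;     /* Interface index of workload or mTLS agent */
	__u8  ifmac[6];    /* MAC address of workload or mTLS agent */
};

/* Hypothetical map definition; name and max_entries are placeholders. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 16384);
	__type(key, struct local_redirect_key);
	__type(value, struct local_redirect_info);
} local_redirect_map SEC(".maps");

/* Hypothetical well-known key reserved for the node-local mTLS agent. */
#define MTLS_AGENT_KEY 0

SEC("tc")
int redirect_to_mtls_agent(struct __sk_buff *skb)
{
	struct local_redirect_key key = { .id = MTLS_AGENT_KEY };
	struct local_redirect_info *agent;

	/* A full implementation would first look up the source pod's IP in the
	 * same map to confirm that the workload is mTLS-enabled. */
	agent = bpf_map_lookup_elem(&local_redirect_map, &key);
	if (!agent)
		return TC_ACT_OK;   /* no agent registered, take the normal path */

	/* Rewrite the destination MAC (first 6 bytes of the Ethernet header)
	 * and redirect the packet to the agent's interface. */
	if (bpf_skb_store_bytes(skb, 0, agent->ifmac, sizeof(agent->ifmac), 0) < 0)
		return TC_ACT_OK;
	return bpf_redirect(agent->ifindex, 0);
}

char _license[] SEC("license") = "GPL";
```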

If the mTLS agent is in the root network namespace, TPROXY rules could be added directly to the root network namespace by the eBPF component. Existing transparent redirection mechanisms within Cilium (Envoy and DNS transparent proxying) could be generalized to support this feature.
I agree that something like this will be required, but in practice, we'll also need to carefully wargame this out to ensure that this map is not susceptible to cache-poisoning attacks (at least insofar as they are relevant in the Cilium Threat Model).
### Installation

The mTLS agent will be installed and managed separately from Cilium. The order of installation of the mTLS agent and Cilium is not significant. Cilium must be installed or configured with the mTLS agent functionality enabled, and the mTLS agent will be deployed separately. Once the mTLS agent and Cilium are configured for Cilium mTLS, all existing workloads with the _cilium.io/redir-type=workload_ label will be enabled for redirection.
If need be, it is possible to have the mTLS agent included as part of the Cilium install, similarly to how we include SPIRE in the current mutual auth solution.
To support this functionality, the Cilium agent only needs to implement a minimal part of the xDS API. Instead of building on top of existing xDS APIs, a custom API could be implemented on top of the xDS protocol. This could follow the pattern that Istio's ztunnel project uses for the custom [Workload Discovery Service](https://github.com/istio/ztunnel/blob/db0a74212c42c66b611a71a9613afb501074e257/proto/workload.proto). Supporting a custom API or utilizing Istio's existing custom WDS would provide Cilium with a simpler and more flexible API than the standard xDS APIs. An example configuration could look like the following:

```markdown
address: 1.2.3.4
identity: spiffe://some/identity
protocol: HTTP_TUNNEL
```

Alternatively, leveraging the standard set of xDS APIs would make this component usable by a larger set of control planes.

The means by which the Cilium agent obtains the workload information is irrelevant from the perspective of the mTLS agent.

![Cilium mTLS agent configuration workflow](images/cilium-mTLS-agent-configuration-workflow.png)

To handle performance concerns related to configuration updates, an implementation similar to [Delta xDS](https://www.youtube.com/embed/LOm1ptEWx_Y?autoplay=1&enablejsapi=1&origin=https://www.bing.com&rel=0&mute=0) could be supported by the Cilium agent. With Delta xDS, the mTLS agent would receive only the updated configuration when requesting configuration, rather than receiving configuration for all workloads. Similarly, the Cilium agent could primarily provide configuration to the mTLS agents via a push-based mechanism in response to enabled pod updates.
I agree that, given that we are doing this, it makes way more sense to implement the ztunnel Workload API, with a separate xDS server running in Cilium.
The current xDS server in Cilium does not implement incremental xDS, but for a greenfield build like this, I agree that we absolutely should aim for it. (Doing incremental xDS requires careful design in the data-model phase of building an xDS control plane.)
* The Kubernetes [CSR](https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/) API

An mTLS agent should support the following identity schema (an example is sketched below):
* `spiffe://DOMAIN/identity/IDENTITY`, following the [Cilium identity scheme](https://docs.cilium.io/en/latest/network/servicemesh/mutual-authentication/mutual-authentication-example/#verify-spiffe-identities)
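
For example, assuming the `spiffe.cilium` trust domain used in the linked Cilium documentation and a hypothetical numeric Cilium identity of 12345, a workload identity would look like:

```
spiffe://spiffe.cilium/identity/12345
```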
As I said on the call today, this identity schema is under review for Cilium's mutual auth solution as well, so there will definitely be further discussion to have here.

```c
struct local_redirect_key {
	__u64 id; // Pod IP for workloads and special well-known key value for mTLS agent
```
I think this map is populated only for pods that enforce mTLS auth, not for all pods. Is that correct?

![Cilium mTLS BPF redirection map](images/cilium-mTLS-bpf-redirection-map.png)

The packet will reach the mTLS agent's pod network namespace via a veth pair. In the mTLS agent's network namespace in this POC, iptables TPROXY rules have been added to the mangle table's PREROUTING chain. A [TPROXY](https://www.kernel.org/doc/html/latest/networking/tproxy.html) rule here will direct the packet to the local listener at 127.0.0.1:15001 and mark the packet. The marked packet will use a special routing table, which contains a rule that routes it to the local loopback device. The packet will thereby reach the process listening on the mTLS agent at port 15001.
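
As a minimal sketch of the redirection rules described above (the mark value and routing-table number are placeholders; port 15001 comes from the POC description):

```sh
# Inside the mTLS agent's network namespace: mark matching TCP packets and
# associate them with the local listener on 127.0.0.1:15001 via TPROXY.
iptables -t mangle -A PREROUTING -p tcp -j TPROXY \
  --on-ip 127.0.0.1 --on-port 15001 --tproxy-mark 0x1/0x1

# TPROXY only works if routing treats the marked packet as locally destined,
# hence the dedicated routing table that sends everything to the loopback device.
ip rule add fwmark 0x1/0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
```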
Why does it require a special routing table? When TPROXY redirects to localhost:15001, won't the packet be delivered to the mTLS agent?

#### Step 2: From mTLS Agent A to mTLS Agent B

mTLS Agent A rewrites the destination port of the packet to the well-known port 15008 on the destination mTLS Agent B. The original destination port will be preserved in the request's header. After establishing a TCP connection with pod A, mTLS Agent A will establish a connection via an [HTTP CONNECT](https://httpwg.org/specs/rfc7540.html#CONNECT) tunnel over mTLS with the mTLS agent on the same node as the destination pod. The received packet is then sent, through the HTTP tunnel, to the host networking stack. From here, the tunneled packet will be routed through node A's eth0 towards node B. At node B's eth0, an ingress-to-node BPF program will be invoked, which eventually tail calls the function that implements ingress network policies. All packets with a destination port of 15008 will be considered for redirection.
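
To make the tunneling step concrete, one possible encoding (following ztunnel's HBONE convention rather than anything mandated by this CFP) carries the original destination in the CONNECT target, i.e. the `:authority` pseudo-header, while the outer connection targets port 15008 on mTLS Agent B:

```
# Outer connection (mTLS): node A -> mTLS Agent B, destination port 15008
# Inner HTTP/2 CONNECT request (placeholder addresses):
:method:    CONNECT
:authority: 10.0.1.23:8080
```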
What's the field used for preserving the original destination port? Can we also put that in the CFP?

### Datapath

We describe an example of the datapath involved in this feature proposal. **The mTLS agent is in its own network namespace and the eBPF component is performing L2 redirection**. For brevity, we will trace a single packet on a typical cross-node communication pathway. Consider the following cluster:
Can we list the pros and cons of the mTLS agent running in the host network namespace versus its own network namespace? I prefer the host network namespace, like the Envoy/DNS proxy, to keep it simple and consistent, unless we have a specific scenario that requires it to run in its own namespace.

![Cilium mTLS agent configuration workflow](images/cilium-mTLS-agent-configuration-workflow.png)

To handle performance concerns related to configuration updates, an implementation similar to [Delta xDS](https://www.youtube.com/embed/LOm1ptEWx_Y?autoplay=1&enablejsapi=1&origin=https://www.bing.com&rel=0&mute=0) could be supported by the Cilium agent. With Delta xDS, the mTLS agent would receive only the updated configuration when requesting configuration, rather than receiving configuration for all workloads. Similarly, the Cilium agent could primarily provide configuration to the mTLS agents via a push-based mechanism in response to enabled pod updates.
Push-based makes sense; otherwise, would the mTLS agent poll periodically for updates?

To support this functionality, the Cilium agent only needs to implement a minimal part of the xDS API. Instead of building on top of existing xDS APIs, a custom API could be implemented on top of the xDS protocol. This could follow the pattern that Istio's ztunnel project uses for the custom [Workload Discovery Service](https://github.com/istio/ztunnel/blob/db0a74212c42c66b611a71a9613afb501074e257/proto/workload.proto). Supporting a custom API or utilizing Istio's existing custom WDS would provide Cilium with a simpler and more flexible API than the standard xDS APIs. An example configuration could look like the following:

```markdown
address: 1.2.3.4
identity: spiffe://some/identity
protocol: HTTP_TUNNEL
```
Can we add more details on the mTLS agent's request to, and response from, the Cilium agent? So the mTLS agent requests the SPIFFE identity for a workload from the Cilium agent based on the pod IP?
```c
	__u64 id; // Pod IP for workloads and special well-known key value for mTLS agent
};

struct local_redirect_info {
```
Why do we need ifindex and MAC for other pods? Based on my understanding, they are only needed for the mTLS agent pods. So instead of keeping this for every pod, should we have two maps: one just storing the pod IPs that enforce mTLS, and a second mapping the mTLS agent IP -> MAC address and ifindex?
We are going to take this CFP over in the near future; the estimated timeline is the new year.
Cool, looking forward to it! I'm going to mark this as draft for now as there are various outstanding comments that should be discussed/agreed upon. Feel free to also bring this up for discussion in the community meeting or Slack channels when you are ready to iterate further. At that point, you can also mark the PR as "Ready for Review" again to bring attention to the PR.
@joestringer that sounds great. I wanted to sync up with you at KubeCon about this. How about I reach out to you via Slack?
Sure, we can coordinate there 👍
Add CFP for cilium/cilium#32759