CFP-32759: Cilium mTLS via mTLS Agent #38

Draft · wants to merge 2 commits into base: main
Conversation

@jaellio jaellio commented Jun 4, 2024

Add CFP for cilium/cilium#32759

@jaellio force-pushed the jaellio/cfp-cilium-mtls-via-mtls-agent branch from 423d387 to b8f1998 on June 4, 2024 18:12
@youngnick (Contributor) left a comment


Some small things for extra discussion, but this is looking pretty good to me overall. Thanks @jaellio!


### MVP Scope

Users can enable Cilium mTLS per workload. All traffic leaving a node from an mTLS-enabled source destined for an mTLS-enabled destination on a different node will be sent over mTLS.

This is definitely the most common use-case, and certainly the one we want to support first, but conceivably, using the policy engine means that we could allow more fine-grained control of the traffic selected for mutual auth.

Comment on lines +143 to +145
* Traffic between workloads on the same node will not be proxied (and therefore not be encrypted)
* Cross-node traffic on a single cluster is tunneled over mTLS
* Node egress traffic to non-mTLS-enabled destinations is not proxied

I agree with all of these, but I should note that we've definitely had requests from users to be able to do both the first and third items in this list. Again, I don't think that we should consider this in the initial solution, but we will get requests for it at some point in the future, so a little time ensuring we're not completely ruling these use cases out will probably be well spent.


### Enablement API

For an MVP, the mTLS agent functionality will be enabled in Cilium via an environment variable in the Cilium configuration - _mtlsAgent: enable_. The Cilium agent will identify the mTLS agent via a pod label - _cilium.io/redir-type=proxy_. Workloads are enabled for redirection and identified by the Cilium agent with the following label - _cilium.io/redir-type=workload_. For the MVP, the user will apply the labels per workload. A future goal is to support enabling workloads by namespace, with support for exclusions. The Cilium agent will use these labels to identify which pods should be added to the BPF redirection map and to validate requests for workload configuration.
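For illustration, a minimal sketch of what this enablement could look like, assuming the configuration entry lives in Cilium's ConfigMap or Helm values (the exact location is an assumption) and the labels are applied directly to pod metadata:

```yaml
# Illustrative only: the configuration value named in the CFP (its exact
# location, e.g. the cilium-config ConfigMap or Helm values, is an assumption)
mtlsAgent: enable
---
# mTLS agent pod, identified by the Cilium agent via this label
metadata:
  labels:
    cilium.io/redir-type: proxy
---
# mTLS-enabled workload pod, opted in per workload for the MVP
metadata:
  labels:
    cilium.io/redir-type: workload
```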
As I mentioned on our call today, I think that enablement should behave like enablement does for the current mutual auth solution:

  • we have options for the existing mutual auth and encryption settings that select this behavior. ztunnel would be an acceptable name for both to me.
  • traffic is selected for mutual auth/mTLS using the existing mechanism: the existing authentication.mode: required config inside the existing CiliumNetworkPolicy and CiliumClusterwideNetworkPolicy objects.
  • Because this existing mechanism allows the selection of traffic using labels on the Pods, it can allow opting-in on a per-Pod (or per workload) level.
  • However, namespace labels are currently not supported, so we may need to leave that for a later date.

Comment on lines +65 to +80
The exact function of the transparent redirection provided by the eBPF component depends on the network namespace of the mTLS agent (its own network namespace or the root network namespace). If the mTLS agent is in its own network namespace, the eBPF component will redirect relevant packets from workloads and mTLS agents to the node's mTLS agent via L2 rewrites (MAC address and interface). The kernel programs could utilize a BPF map populated by the Cilium agent to determine what traffic should be redirected and to which interface.

```c
struct local_redirect_key {
    __u64 id;       // Pod IP for workloads and special well-known key value for mTLS agent
};

struct local_redirect_info {
    __u16 ifindex;  // Interface index of workload or mTLS agent
    __u8  ifmac[6]; // MAC address of workload or mTLS agent
};
```

Map entries for each mTLS-enabled workload would be created in the map on their respective nodes. A map entry would also be created for the mTLS agent present on that node.
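A minimal sketch, for the own-network-namespace case, of how a tc/eBPF program could consult this map on a workload's egress path. The helper calls are standard BPF helpers, but the map name, the well-known agent key value, and the surrounding program structure are assumptions for illustration, not the proposed implementation:

```c
#include <linux/types.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical well-known key for the node's mTLS agent entry. */
#define MTLS_AGENT_KEY 0xffffffffffffffffULL

/* Key/value types are the local_redirect_key/local_redirect_info structs above. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, struct local_redirect_key);
    __type(value, struct local_redirect_info);
} LOCAL_REDIRECT_MAP SEC(".maps");

static __always_inline int maybe_redirect_to_mtls_agent(struct __sk_buff *skb,
                                                        __u64 src_pod_ip)
{
    struct local_redirect_key key = { .id = src_pod_ip };
    struct local_redirect_key agent_key = { .id = MTLS_AGENT_KEY };
    struct local_redirect_info *agent;

    /* Only mTLS-enabled workloads (and the agent itself) have map entries. */
    if (!bpf_map_lookup_elem(&LOCAL_REDIRECT_MAP, &key))
        return TC_ACT_OK; /* not enabled: continue normal forwarding */

    /* Find the node's mTLS agent under its well-known key. */
    agent = bpf_map_lookup_elem(&LOCAL_REDIRECT_MAP, &agent_key);
    if (!agent)
        return TC_ACT_OK;

    /* Rewrite the destination MAC (offset 0 of the Ethernet header) to the
     * agent's MAC and redirect the packet to the agent's interface. */
    if (bpf_skb_store_bytes(skb, 0, agent->ifmac, ETH_ALEN, 0) < 0)
        return TC_ACT_SHOT;

    return bpf_redirect(agent->ifindex, 0);
}
```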

If the mTLS agent is in the root network namespace, TPROXY rules could be added directly to the root network namespace by the eBPF component. Existing transparent redirection mechanisms within Cilium (envoy and DNS transparent proxying) could be generalized to support this feature.

I agree that something like this will be required, but in practice, we'll also need to carefully wargame this out to ensure that this map is not susceptible to cache-poisoning attacks (at least insofar as they are relevant in the Cilium Threat Model).


### Installation

The mTLS agent will be installed and managed separately from Cilium. The order of installation for the mTLS agent and Cilium is not significant. Cilium must be installed or configured with the mTLS agent functionality enabled. The mTLS agent will be deployed separately. Once the mTLS agent and Cilium are configured for Cilium mTLS, all existing workloads with the _cilium.io/redir-type=workload_ label will be enabled for redirection.

If need be, it is possible to have the mTLS agent included as part of the Cilium install, similarly to how we include SPIRE in the current mutual auth solution.

Comment on lines +162 to +176
To support this functionality, the Cilium agent only needs to implement a minimal part of the xDS API. Instead of building on top of existing xDS APIs, a custom API could be implemented on top of the xDS protocol. This could follow the pattern that Istio's ztunnel project uses for the custom [Workload Discovery Service](https://github.com/istio/ztunnel/blob/db0a74212c42c66b611a71a9613afb501074e257/proto/workload.proto). Supporting a custom API, or utilizing Istio's existing custom WDS, would provide Cilium with a simpler and more flexible API compared with the standard xDS APIs. An example configuration could look like the following:

```yaml
address: 1.2.3.4
identity: spiffe://some/identity
protocol: HTTP_TUNNEL
```

Alternatively, leveraging the standard set of xDS APIs would make this component usable by a larger set of control planes.

The means by which the Cilium Agent obtains the workload information is irrelevant from the perspective of the mTLS agent.

![alt text](images/cilium-mTLS-agent-configuration-workflow.png)

To handle performance concerns related to configuration updates, an implementation similar to [Delta xDS](https://www.youtube.com/embed/LOm1ptEWx_Y?autoplay=1&enablejsapi=1&origin=https://www.bing.com&rel=0&mute=0) could be supported by the Cilium Agent. With Delta xDS, the mTLS agent would receive only the updated configuration when requesting configuration, rather than receiving configuration for all workloads. Similarly, the Cilium agent could primarily provide configuration to the mTLS agents via a push-based mechanism in response to enabled pod updates.
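For illustration, a delta-style update on a single workload change might look like the following. The `resources` / `removed_resources` field names follow the delta xDS response shape; the per-workload fields mirror the example above and are not a final schema:

```yaml
# Sketch of an incremental push after one workload changes (assumed shape)
resources:
  - address: 1.2.3.4
    identity: spiffe://some/identity
    protocol: HTTP_TUNNEL        # only the changed workload is included
removed_resources:
  - 5.6.7.8                      # hypothetical deleted workload; the agent drops its entry
```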

I agree that, given that we are doing this, it makes way more sense to implement the ztunnel Workload API, with a separate xDS server running in Cilium.

The current xDS server in Cilium does not implement incremental xDS, but for a greenfield build like this, I agree that we absolutely should aim for it. (Doing incremental xDS requires careful design in the data-model phase of building an xDS control plane.)

* The Kubernetes [CSR](https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/) API

An mTLS agent should support the following identity schema:
* `spiffe://DOMAIN/identity/IDENTITY`, following the [Cilium identity scheme](https://docs.cilium.io/en/latest/network/servicemesh/mutual-authentication/mutual-authentication-example/#verify-spiffe-identities)

As I said on the call today, this identity schema is under review for Cilium's mutual auth solution as well, so there will definitely be further discussion to have here.


Signed-off-by: Jackie Elliott <[email protected]>

```c
struct local_redirect_key {
    __u64 id; // Pod IP for workloads and special well-known key value for mTLS agent
```


I think this map is populated only for pods that enforce mTLS auth, not for all pods. Is that correct?


![alt text](images/cilium-mTLS-bpf-redirection-map.png)

The packet will reach the mTLS agent's pod network namespace via a veth pair. In the mTLS agent's network namespace in this POC, iptables TPROXY rules have been added to the mangle table's PREROUTING chain. A [TPROXY](https://www.kernel.org/doc/html/latest/networking/tproxy.html) rule here will deliver the packet to the destination address 127.0.0.1:15001 and mark the packet. The marked packet will use a special routing table, which contains a rule that routes it to the local loopback device. The packet will thereby reach the process listening on the mTLS agent at port 15001.
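For reference, a minimal sketch of that TPROXY setup, following the upstream kernel TPROXY documentation (the port and mark values are illustrative, not the final configuration):

```sh
# Mark matching TCP packets and hand them to the local listener on 127.0.0.1:15001
iptables -t mangle -A PREROUTING -p tcp \
  -j TPROXY --on-ip 127.0.0.1 --on-port 15001 --tproxy-mark 0x1/0x1

# Marked packets use a dedicated routing table whose single rule delivers them
# locally via the loopback device instead of forwarding them
ip rule add fwmark 0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100
```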
@tamilmani1989 commented on Jun 27, 2024

Why does this require a special routing table? When TPROXY redirects to localhost:15001, won't the packet be delivered to the mTLS agent?


#### Step 2: From mTLS Agent A to mTLS Agent B

mTLS Agent A rewrites the destination port of the packet to the well-known port 15008 on the destination mTLS Agent B. The original destination port will be preserved in the request's header. After establishing a TCP connection with pod A, mTLS Agent A will establish a connection via an [HTTP CONNECT](https://httpwg.org/specs/rfc7540.html#CONNECT) tunnel over mTLS with the mTLS agent on the same node as the destination pod. The packet received is then sent, through the HTTP tunnel, to the host networking stack. From here, the tunneled packet will be routed through node A's eth0 towards node B. At node B's eth0, an ingress-to-node BPF program will be invoked, which eventually tail calls the function that implements ingress network policies. All packets with a destination port of 15008 will be considered for redirection.
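As an illustration only (the CFP leaves the exact field open; this mirrors the HBONE convention used by Istio's ztunnel), the original destination could be carried in the `:authority` pseudo-header of the HTTP/2 CONNECT request, while the outer TCP+mTLS connection targets port 15008 on node B's mTLS agent:

```text
# Outer connection: mTLS Agent A -> mTLS Agent B (node B), port 15008, over mTLS
# Inner HTTP/2 CONNECT request (illustrative values):
:method    = CONNECT
:authority = 10.244.1.23:8080   # original destination pod IP and port
```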
@tamilmani1989 commented on Jun 27, 2024

What field is used to preserve the original destination port? Can we also put that in the CFP?


### Datapath

We describe an example of the datapath involved in this feature proposal. **The mTLS agent is in its own network namespace and the eBPF component is performing L2 redirection**. For brevity, we will trace a single packet on a typical cross-node communication pathway. Consider the following cluster:


Can we list the pros and cons of the mTLS agent running in the host network namespace versus its own network namespace? I prefer the host network namespace, like Envoy/DNS proxy, to keep it simple and consistent, unless we have a specific scenario that requires it to run in its own namespace.


![alt text](images/cilium-mTLS-agent-configuration-workflow.png)

To handle performance concerns related to configuration updates, an implementation similar to [Delta xDS](https://www.youtube.com/embed/LOm1ptEWx_Y?autoplay=1&enablejsapi=1&origin=https://www.bing.com&rel=0&mute=0) could be supported by the Cilium Agent. With Delta xDS, the mTLS agent would receive only the updated configuration when requesting configuration, rather than receiving configuration for all workloads. Similarly, the Cilium agent could primarily provide configuration to the mTLS agents via a push-based mechanism in response to enabled pod updates.


Push-based makes sense. Otherwise, would the mTLS agent poll periodically for updates?


To support this functionality, the Cilium agent only needs to implement a minimal part of the xDS API. Instead of building on top of existing xDS APIs, a custom API could be implemented on top of the xDS protocol. This could follow the pattern that Istio's ztunnel project uses for the custom [Workload Discovery Service](https://github.com/istio/ztunnel/blob/db0a74212c42c66b611a71a9613afb501074e257/proto/workload.proto). Supporting a custom API, or utilizing Istio's existing custom WDS, would provide Cilium with a simpler and more flexible API compared with the standard xDS APIs. An example configuration could look like the following:

@tamilmani1989 commented on Jun 27, 2024

Can we add more details on the mTLS agent's request to, and response from, the Cilium agent? So the mTLS agent requests the SPIFFE identity for a workload from the Cilium agent based on the pod IP?

__u64 id; // Pod IP for workloads and special well-known key value for mTLS agent
};

struct local_redirect_info {


Why do we need ifindex and MAC for other pods? Based on my understanding, they are used only for mTLS pods. So instead of keeping this for every pod, should we have two maps: one just storing the pod IPs that enforce mTLS, and a second for mTLS IP -> MAC address and ifindex?

@MikeZappa87

We are going to take this CFP over in the near future; the estimated timeline is the new year.

@joestringer (Member)

Cool, looking forward to it!

I'm going to mark this as draft for now as there are various outstanding comments that should be discussed/agreed upon. Feel free to also bring this up for discussion in the community meeting or Slack channels when you are ready to iterate further. At that point, you can also mark the PR as "Ready for Review" again to bring attention to the PR.

@joestringer marked this pull request as draft on November 21, 2024 23:54
@MikeZappa87

@joestringer that sounds great. I wanted to sync up with you at KubeCon about this. How about I reach out to you via Slack?

@joestringer (Member)

Sure, we can coordinate there 👍
