CFP-30984 : Add CFP for DNS proxy HA - V2 #54

Draft: wants to merge 5 commits into main
134 changes: 134 additions & 0 deletions cilium/CFP-30984-dns-proxy-ha-v2.md
# CFP-30984: toFQDN DNS proxy HA

**SIG: SIG-POLICY**

**Begin Design Discussion:** 2024-08-19

**Cilium Release:** 1.17

**Authors:** Hemanth Malla <[email protected]>, Vipul Singh <[email protected]>, Tamilmani Manoharan <[email protected]>


Administrative point:

If we think we agree on the basics here, but would like more of the design to be fleshed out:

Suggested change: `Status: Provisional`

(If provisional, please highlight anything that still needs to be expanded with the text "UNRESOLVED: description of unresolved design point").

Or alternatively if we think the design is satisfactorily complete:

Suggested change: `Status: Implementable`

See status in the readme for more details. https://github.com/cilium/design-cfps?tab=readme-ov-file#status


This was the point I made at the end of the discussion with @cilium/sig-policy today. Whatever we agree upon, we should be able to merge in. If there are points left to iterate on, we can highlight those points. If we've resolved the points of difference, we can mark it implementable. The goal is to make iteration easier so we don't have to overthink an entire design that is subject to change during implementation; instead we can incrementally change the design in-tree as we discuss, debate, implement, and learn.


I'll add a status field once we close out the open discussion items.


Also, it looks like we haven't documented Provisional in the readme yet?



Ah right, provisional is just under discussion in #48 but not documented yet.


Let's mark this as implementable.

## Summary

The Cilium agent uses a proxy to intercept all DNS queries and obtain the information necessary to enforce FQDN network policies. However, the lifecycle of this proxy is coupled to the cilium agent. When an endpoint has a toFQDN network policy in place, Cilium installs a redirect to capture all DNS traffic, so when the agent is unavailable all DNS requests time out, even for names whose DNS-name-to-IP-address mappings are already in place. DNS policy unload on shutdown can be enabled on the agent, but it only works when the L7 policy is set to `*` and the agent is shut down gracefully.

This CFP introduces a standalone DNS proxy that can run alongside the Cilium agent, which should eliminate the hard dependency on the agent for names that already have policy map entries in place.

## Motivation

Users rely on toFQDN policies to restrict traffic to destinations outside the cluster, typically blob storage or other services on the internet. Rolling out the Cilium agent should not result in packet drops. Introducing a high availability (HA) mode will allow for increased adoption of toFQDN network policies in critical environments.

## Goals

* Introduce a streaming gRPC API for exchanging FQDN policy related information.
* Introduce a standalone DNS proxy (SDP) that binds to the same port as the built-in proxy using SO_REUSEPORT.
* Enforce L7 DNS policy via SDP.
* When an endpoint's DNS traffic is selected by an L7 policy, DNS requests and responses will be forwarded to their destinations via the SDP even if cilium-agent is not running. Clients re-resolving DNS to establish new connections will therefore no longer be blocked, as long as the IP addresses returned by the new resolution are unchanged. Note that the L3/L4 policy for the resolved names must already have been plumbed while the agent was running.

This is not a goal; it's an implementation detail. Should it be dropped or moved under the proposal section?


## Non-Goals

* Updating the datapath with new DNS <> IP mappings while the agent is down is not in scope for this CFP. Solving for this scenario would likely be more involved and may require a dedicated set of bpf maps, which would likely come with the performance penalty of performing multiple lookups. We will explore solutions for this in a future CFP.

## Proposal

### Overview

There are two parts to enforcing toFQDN network policy: L3/L4 policy enforcement against IP addresses resolved from an FQDN, and policy enforcement on DNS requests (L7 DNS policy). To enforce L3/L4 policy, per-endpoint policy bpf maps need to be updated. We'd like to avoid multiple processes writing entries to policy maps, so the standalone DNS proxy (SDP) needs a mechanism to notify the agent of newly resolved FQDN <> IP address mappings. This CFP proposes exposing a new gRPC streaming API from the cilium agent. Since the connection is bi-directional, the cilium agent can reuse the same connection to notify the SDP of L7 DNS policy changes.

Additionally, the SDP needs to translate IP addresses to Cilium identities to enforce the policy. This CFP proposes retrieving the identity mapping from the cilium_ipcache BPF map. If / when another abstraction is introduced for getting IP <-> identity mappings (for example, in the L7 proxy), this implementation can use that abstraction. This CFP will focus on the contract between the SDP and the Cilium agent to exchange the minimum information needed to implement the high availability mode.
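
To make the ipcache dependency concrete, the sketch below opens the pinned map from userspace using the cilium/ebpf library. The pin path is the agent's usual bpffs location but should be treated as an assumption, and the key/value layouts are intentionally not decoded here because they mirror the agent's internal C structs (see "Handling Upgrades").

```go
package main

import (
	"fmt"
	"log"

	"github.com/cilium/ebpf"
)

func main() {
	// Assumed default pin path for the agent's ipcache map on bpffs.
	const ipcachePath = "/sys/fs/bpf/tc/globals/cilium_ipcache"

	m, err := ebpf.LoadPinnedMap(ipcachePath, nil)
	if err != nil {
		log.Fatalf("open pinned ipcache map: %v", err)
	}
	defer m.Close()

	// The key is an LPM-trie key (prefix length, cluster ID, address family, IP)
	// and the value carries the security identity. Both layouts mirror internal
	// Cilium C structs and can change between releases, which is why this CFP
	// proposes a shared lookup library in "Handling Upgrades".
	fmt.Printf("ipcache: type=%v key=%dB value=%dB\n", m.Type(), m.KeySize(), m.ValueSize())
}
```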

In addition to the existing unix domain socket (UDS) opened by the agent to host its HTTP APIs, we'll need a new UDS for the gRPC streaming service with similar permissions.
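
As a rough illustration of the agent side, the sketch below serves a gRPC server on a dedicated UDS and restricts the socket permissions. The socket path, permission bits, and the service registration are assumptions for illustration; the actual names would come from the generated protobuf stubs.

```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
)

func main() {
	// Hypothetical socket path; permissions mirror the agent's existing API socket.
	const sockPath = "/var/run/cilium/standalone-dns-proxy.sock"

	_ = os.Remove(sockPath) // clean up a stale socket from a previous run
	lis, err := net.Listen("unix", sockPath)
	if err != nil {
		log.Fatalf("listen on %s: %v", sockPath, err)
	}
	if err := os.Chmod(sockPath, 0o660); err != nil {
		log.Fatalf("chmod %s: %v", sockPath, err)
	}

	srv := grpc.NewServer()
	// The generated service implementing UpdatesMappings and
	// SubscribeToDNSPolicies (defined below) would be registered here.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```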

### RPC Methods

Method: UpdatesMappings (invoked from SDP to agent)

_rpc UpdatesMappings(FQDNMapping) returns (UpdatesMappingsResult){}_
Request:
```
message FQDNMapping {
    string FQDN = 1;              // DNS name requested by the client
    repeated bytes IPS = 2;       // Resolved IP addresses
    uint32 TTL = 3;
    uint64 source_identity = 4;   // Identity of the client making the DNS request
    bytes source_ip = 5;          // IP address of the client making the DNS request
    int32 dns_response_code = 6;
}
```
Response:
```
message UpdatesMappingsResult {
    bool success = 1;
}
```

Method: SubscribeToDNSPolicies (invoked from SDP to agent; opens a bi-directional stream over which the agent sends policy updates)

Invoked from agent to SDP

Should probably be "invoked from SDP to agent" right?

Otherwise I think this doesn't match how the RPC is specified, and also would mean that the SDP exposes a gRPC API itself, which seems unnecessary.


That's correct. It's invoked by the SDP to the agent, and that opens up a stream which is used by the Cilium agent to send updates.


_rpc SubscribeToDNSPolicies(stream DNSPoliciesResult) returns (stream DNSPolicies){}_
Request:
```
message DNSServer {
    uint64 dns_server_identity = 1;  // Identity of the destination DNS server
    uint32 dns_server_port = 2;
    uint32 dns_server_proto = 3;
}

message DNSPolicy {
    uint64 source_identity = 1;      // Identity of the workload this L7 DNS policy applies to
    repeated string dns_pattern = 2; // DNS patterns this identity is allowed to resolve
    repeated DNSServer dns_servers = 3;
}

message DNSPolicies {
    repeated DNSPolicy egress_l7_dns_policy = 1;
    string request_id = 2;           // Random UUID-based identifier referenced in ACKs
}
```

*Note: `dns_pattern` follows the same convention used in CNPs. See https://docs.cilium.io/en/stable/security/policy/language/#dns-based for more details*

*Note: `DNSPolicies` is a snapshot of the latest known policy information for all endpoints on the host. Sending a snapshot allows for dealing with deletions automatically*

Do you intend to send a full snapshot each time? It seems possible to match k8s with a ListAndWatch approach to avoid sending a full snapshot.


Yes, we intend to send the full snapshot of the policies applied. We're not familiar with the ListAndWatch approach; just curious whether it would require permissions for the SDP to read the CRD? Would the SDP then have to map the policy rules (labels <> identity) too?


SDP to cilium agent message format:
```
message DNSPoliciesResult {
    bool success = 1;
    string request_id = 2;
}
```

### Load balancing

The SDP and the agent's DNS proxy will run on the same port using SO_REUSEPORT. By default, the kernel uses a round-robin algorithm to distribute load evenly between all sockets in the reuseport group. If the cilium agent's DNS proxy goes down, the kernel will automatically switch all traffic to the SDP, and vice versa. In the future, we can consider using a custom bpf program to make the SDP act only as a hot standby. See the [PoC](https://github.com/hemanthmalla/reuseport_ebpf/blob/main/bpf/reuseport_select.c) / eBPF Summit 2023 talk for more details.
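
For reference, the following Go sketch shows how a UDP socket can join the same reuseport group as the built-in proxy. The proxy port is allocated by the agent at runtime, so the hard-coded port here is only a placeholder.

```go
package main

import (
	"context"
	"log"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

func main() {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var soErr error
			if err := c.Control(func(fd uintptr) {
				// Join the reuseport group shared with the built-in DNS proxy.
				soErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return soErr
		},
	}

	// Placeholder port; the real DNS proxy port is chosen by the agent.
	pc, err := lc.ListenPacket(context.Background(), "udp", "0.0.0.0:10001")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	defer pc.Close()
	log.Printf("SDP UDP socket listening on %s in the reuseport group", pc.LocalAddr())
}
```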


### High Level Information Flow

* Agent starts up the gRPC streaming service (only after resources are synced from k8s and the ipcache bpf map is populated).
* SDP starts up.
* Connects to the gRPC service, retrying periodically until success.
* Agent sends the current L7 DNS policy snapshot to the SDP over the SubscribeToDNSPolicies stream.
* On policy recomputation, the agent sends an updated snapshot over the same stream.
* On a DNS request from a client, the datapath redirects the request to the DNS proxy port.
* The kernel round-robin load balances between the SDP and the built-in proxy.
* Assuming the SDP gets the request, it enforces the L7 DNS policy.
* Look up the client's identity based on its IP address via the ipcache bpf map.
* Check against the policy snapshot whether this identity is allowed to resolve the requested DNS name and to talk to the target DNS server identity (which also needs a lookup); see the sketch after this list.
* Make the upstream DNS request from the SDP.
* On response, the SDP invokes UpdatesMappings() to notify the agent of new mappings.
* Release the DNS response to the client after UpdatesMappings() succeeds or times out.
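
To illustrate the enforcement step, here is a simplified Go sketch of the L7 DNS policy check the SDP could perform against its policy snapshot. The types loosely mirror the proto messages above, and the pattern matching is a simplified stand-in for Cilium's real matchpattern handling.

```go
package sdp

import (
	"regexp"
	"strings"
)

// Simplified mirrors of the proto messages above (illustrative only).
type DNSServer struct {
	Identity uint64
	Port     uint32
	Proto    uint32
}

type DNSPolicy struct {
	SourceIdentity uint64
	DNSPatterns    []string
	DNSServers     []DNSServer
}

// patternToRegex converts a toFQDNs-style pattern such as "*.cilium.io" into a
// regular expression. Cilium's real conversion is more involved; this is a
// simplified stand-in.
func patternToRegex(pattern string) *regexp.Regexp {
	escaped := regexp.QuoteMeta(strings.ToLower(pattern))
	escaped = strings.ReplaceAll(escaped, `\*`, "[-a-z0-9.]*")
	return regexp.MustCompile("^" + escaped + "$")
}

// allowed reports whether srcIdentity may resolve qname via the DNS server
// identified by (serverIdentity, serverPort), given the latest policy snapshot.
func allowed(snapshot []DNSPolicy, srcIdentity, serverIdentity uint64, serverPort uint32, qname string) bool {
	qname = strings.ToLower(strings.TrimSuffix(qname, "."))
	for _, p := range snapshot {
		if p.SourceIdentity != srcIdentity {
			continue
		}
		serverOK := false
		for _, s := range p.DNSServers {
			if s.Identity == serverIdentity && s.Port == serverPort {
				serverOK = true
				break
			}
		}
		if !serverOK {
			continue
		}
		for _, pat := range p.DNSPatterns {
			if patternToRegex(pat).MatchString(qname) {
				return true
			}
		}
	}
	return false
}
```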

### Handling SDP <> Agent re-connections

* When the agent is unavailable, the SDP will periodically attempt to re-connect to the streaming service (see the sketch after this list). Any FQDN <> IP mappings resolved while the agent is down will be cached in the SDP, and `UpdatesMappings` will be retried after the connection is re-established.
* A new ipcache bpf map is populated on agent startup, so the SDP needs to re-open the ipcache bpf map when the connection is re-established. See https://github.com/cilium/cilium/pull/32864 for similar handling in Envoy.
* On a new connection from the SDP, the agent will send the full set of L7 DNS policy rules over the `SubscribeToDNSPolicies` stream.

* The SDP will not listen on the DNS proxy port until a connection is established with the cilium agent and the initial L7 DNS policy rules are received. Meanwhile, the built-in DNS proxy will continue to serve requests. The SDP relies on the cilium agent for its initial bootstrap. In the future, we could make the SDP retrieve initial policy information from other sources, but this is not in scope for this CFP.
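
A minimal sketch of the SDP's reconnection loop is shown below. It dials the agent's gRPC service over the assumed UDS path and retries the subscription with exponential backoff; the actual stream handling would use the generated SubscribeToDNSPolicies client and is stubbed out here.

```go
package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// subscribe would open the SubscribeToDNSPolicies stream using the generated
// client, consume policy snapshots, and ACK them; it is stubbed out here.
func subscribe(ctx context.Context, conn *grpc.ClientConn) error {
	_ = conn
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	// Assumed socket path, matching the agent-side sketch above.
	const sockPath = "/var/run/cilium/standalone-dns-proxy.sock"

	dialer := func(ctx context.Context, addr string) (net.Conn, error) {
		var d net.Dialer
		return d.DialContext(ctx, "unix", addr)
	}
	conn, err := grpc.Dial(sockPath,
		grpc.WithContextDialer(dialer),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("dial agent: %v", err)
	}
	defer conn.Close()

	// Retry the subscription until the agent is reachable again; cached
	// FQDN <> IP mappings would be flushed via UpdatesMappings once it is.
	backoff := time.Second
	for {
		err := subscribe(context.Background(), conn)
		log.Printf("subscription ended, retrying in %v: %v", backoff, err)
		time.Sleep(backoff)
		if backoff < 30*time.Second {
			backoff *= 2
		}
	}
}
```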

### Handling Upgrades
@joestringer (Nov 21, 2024):
I still don't feel that this section adequately explains upgrades. The text is capturing some of the key discussions we have had though, which helps. What I have in mind is something a bit more like a test plan that outlines the bringing up / shutting down of different components and exactly what's expected in terms of forwarding state around.

Overall though I don't feel that continuing this path by debating back and forth in this design doc is really helping drive this upgrades discussion. I think that at this point it makes sense to just accept that some aspects of this CFP are underspecified / not agreed upon, but the overall design is still "implementable" and ultimately subject to review of the actual implementation. As we learn more through implementation we can iron out any remaining differences, and figure out how to best test the functionality.

EDIT 2024-11-23: Better wording, more helpful suggestions.


Other than the streaming API from the agent, this CFP introduces a dependency on the ipcache bpf map, which isn't a stable API exposed to components beyond the agent. An e2e test for the toFQDN HA feature will be added to CI to catch datapath changes impacting the SDP, giving us the opportunity to make the necessary changes.

When a new change is introduced to the ipcache bpf map, the userspace code to access data from ipcache needs to be refactored out into a library. This library would then be used in both the cilium agent and SDP. The library needs to support reading data from both old and new formats (we'll likely need to carry this for a couple of stable releases). This would mean that when upgrading to a new cilium version with updates to ipcache bpf map, SDP should always be upgraded first. This would allow SDP to seamlessly switch over to reading from the new ipcache map when the actual map migration is performed by cilium agent after it is upgraded. The actual heuristics to determine when to switch will depend on the nature of changes to the bpf map. Upgrading the cilium agent first would prevent SDP from reading ipcache data. So, the only supported upgrade order would be SDP first and then cilium agent.