CFP-30984: Add CFP for DNS proxy HA - V2 #54

# CFP-30984: toFQDN DNS proxy HA

**SIG: SIG-POLICY**

**Begin Design Discussion:** 2024-08-19

**Cilium Release:** 1.17

**Authors:** Hemanth Malla <[email protected]>, Vipul Singh <[email protected]>, Tamilmani Manoharan <[email protected]>

**Comment (Member):**

Administrative point: if we think we agree on the basics here, but would like more of the design to be fleshed out:

> Suggested change: `Status: Provisional`

(If provisional, please highlight anything that still needs to be expanded with the text "UNRESOLVED: description of unresolved design point".)

Or alternatively, if we think the design is satisfactorily complete:

> Suggested change: `Status: Implementable`

See the status section in the readme for more details: https://github.com/cilium/design-cfps?tab=readme-ov-file#status

**Comment (Member):**

This was the point I made at the end of the discussion with @cilium/sig-policy today. Whatever we agree upon, we should be able to merge. If there are points left to iterate on, we can highlight them; if we've resolved the points of difference, we can mark it implementable. The goal is to make iteration easier: rather than overthinking an entire design that is subject to change during implementation, we can incrementally change the design in-tree as we discuss, debate, implement, and learn.

**Comment (Member Author):**

I'll add a status field once we close out the open discussion items.

**Comment (Member Author):**

Also, it looks like we haven't documented Provisional in the readme yet?

**Comment (Member):**

Ah right, Provisional is under discussion in #48 but not documented yet.

**Comment (Member):**

Let's mark this as implementable.

## Summary

The Cilium agent uses a proxy to intercept all DNS queries and obtain the information necessary to enforce FQDN network policies. However, the lifecycle of this proxy is coupled with that of the agent. When an endpoint has a toFQDN network policy in place, Cilium installs a redirect to capture all DNS traffic, so when the agent is unavailable all DNS requests time out, even for names whose DNS name to IP address mappings are already in place. DNS policy unload on shutdown can be enabled on the agent, but it works only when the L7 policy is set to `*` and the agent is shut down gracefully.

This CFP introduces a standalone DNS proxy that can run alongside the Cilium agent, eliminating the hard dependency on the agent for names that already have policy map entries in place.

## Motivation

Users rely on toFQDN policies to enforce network policies against traffic to destinations outside the cluster, typically to blob storage / other services on the internet. Rolling out the Cilium agent should not result in packet drops. Introducing a high availability (HA) mode will allow for increased adoption of toFQDN network policies in critical environments.

## Goals

* Introduce a streaming gRPC API for exchanging FQDN policy related information.
* Introduce a standalone DNS proxy (SDP) that binds to the same port as the built-in proxy using SO_REUSEPORT.
* Enforce L7 DNS policy via SDP.

## Non-Goals

* Updating new DNS <> IP mappings when the agent is down.

## Proposal

### Overview

There are two parts to enforcing toFQDN network policy: L3/L4 policy enforcement against IP addresses resolved from an FQDN, and policy enforcement on DNS requests (L7 DNS policy). To enforce L3/L4 policy, per-endpoint policy BPF maps need to be updated. We'd like to avoid multiple processes writing entries to the policy maps, so the standalone DNS proxy (SDP) needs a mechanism to notify the agent of newly resolved FQDN <> IP address mappings. This CFP proposes exposing a new gRPC streaming API from the Cilium agent. Since the connection is bi-directional, the agent can reuse the same connection to notify the SDP of L7 DNS policy changes.

Additionally, SDP needs to translate IP addresses to Cilium identities to enforce policy. This CFP proposes retrieving the identity mapping from the `cilium_ipcache` BPF map. If/when another abstraction is introduced for getting IP <> identity mappings (for example, in the L7 proxy), this implementation can switch to that abstraction. This CFP focuses on the contract between SDP and the Cilium agent for exchanging the minimum information needed to implement the high availability mode.
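As an illustration, the lookup could open the pinned map via the cilium/ebpf library. The pin path, the `ipcacheKey`/`ipcacheValue` layouts, the family constant, and the prefix-length arithmetic below are all simplified assumptions for sketching purposes; the real layouts are Cilium-internal and version-dependent (see "Handling Upgrades"):

```go
package main

import (
	"fmt"
	"net/netip"

	"github.com/cilium/ebpf"
)

// Illustrative layouts only: the real cilium_ipcache key/value formats are
// internal to Cilium and may change between releases.
type ipcacheKey struct {
	PrefixLen uint32 // LPM trie prefix length, covering all fields below
	ClusterID uint16
	Pad       uint8
	Family    uint8 // assumed: 1 = IPv4
	Addr      [16]byte
}

type ipcacheValue struct {
	Identity uint32  // numeric Cilium security identity
	Rest     [8]byte // tunnel endpoint, flags, ... (ignored here)
}

// lookupIdentity resolves a client IPv4 address to its security identity.
func lookupIdentity(pinPath string, ip netip.Addr) (uint32, error) {
	m, err := ebpf.LoadPinnedMap(pinPath, nil)
	if err != nil {
		return 0, fmt.Errorf("open pinned ipcache: %w", err)
	}
	defer m.Close()

	// Exact match for an IPv4 /32: 32 bits of metadata + 32 address bits.
	key := ipcacheKey{PrefixLen: 64, Family: 1}
	copy(key.Addr[:], ip.AsSlice())

	var val ipcacheValue
	if err := m.Lookup(&key, &val); err != nil {
		return 0, fmt.Errorf("ipcache lookup for %s: %w", ip, err)
	}
	return val.Identity, nil
}
```

A caller would pass the pinned map path (commonly somewhere under `/sys/fs/bpf/`) together with the client address extracted from the DNS request.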

In addition to the existing Unix domain socket (UDS) opened by the agent to host its HTTP APIs, we'll need a new UDS for the gRPC streaming service, with similar permissions.
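A minimal sketch of what serving gRPC over such a socket could look like on the agent side; the socket path, the `0o660` permissions, and the commented-out generated-stub registration are assumptions, not settled names:

```go
package main

import (
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
)

func main() {
	// Hypothetical socket path; the real path and permissions would mirror
	// the agent's existing HTTP API sockets.
	const sockPath = "/var/run/cilium/standalone-dns-proxy.sock"
	_ = os.Remove(sockPath) // drop a stale socket left by a previous run

	lis, err := net.Listen("unix", sockPath)
	if err != nil {
		log.Fatalf("listen on %s: %v", sockPath, err)
	}
	if err := os.Chmod(sockPath, 0o660); err != nil {
		log.Fatalf("chmod %s: %v", sockPath, err)
	}

	srv := grpc.NewServer()
	// fqdnpb.RegisterFQDNProxyServer(srv, newServer()) // generated stub; name is assumed
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```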

### RPC Methods

Method: UpdateMappings (Invoked from SDP to agent)

_rpc UpdateMappings(stream FQDNMapping) returns (Result) {}_
Request:
```
message FQDNMapping {
  string FQDN = 1;             // DNS name of the request made by the client
  repeated bytes IPS = 2;      // Resolved IP addresses
  uint32 TTL = 3;
  uint64 source_identity = 4;  // Identity of the client making the DNS request
  bytes source_ip = 5;         // IP address of the client making the DNS request
  uint32 dns_response_code = 6;
}
```
Response:
```
message Result {
  bool success = 1;
}
```

Method: UpdatesDNSRules (Invoked from agent to SDP via bi-directional stream)

_rpc UpdatesDNSRules(stream DNSPolicies) returns (Result) {}_
Request:
```
message DNSServer {
  uint64 dns_server_identity = 1; // Identity of the destination DNS server
  uint32 dns_server_port = 2;
  uint32 dns_server_proto = 3;
}

message DNSPolicy {
  uint64 source_identity = 1;      // Identity of the workload this L7 DNS policy applies to
  repeated string dns_pattern = 2; // DNS patterns this identity is allowed to resolve
  repeated DNSServer dns_servers = 3;
}

message DNSPolicies {
  repeated DNSPolicy egress_l7_dns_policy = 1;
}
```

*Note: `dns_pattern` follows the same convention used in CNPs. See https://docs.cilium.io/en/stable/security/policy/language/#dns-based for more details*
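For intuition, a rough Go approximation of that matching is sketched below. This assumes a simplified reading where `*` alone matches any name and an embedded `*` matches a run of DNS label characters; the authoritative semantics are those of the Cilium agent's matchpattern logic, not this sketch:

```go
package main

import (
	"regexp"
	"strings"
)

// patternToRegexp converts a dns_pattern entry into a regular expression.
// Simplified assumption: "*" alone matches anything, and an embedded "*"
// matches a run of DNS label characters.
func patternToRegexp(pattern string) *regexp.Regexp {
	p := strings.ToLower(strings.TrimSpace(pattern))
	if p == "*" {
		return regexp.MustCompile(`^.*$`)
	}
	quoted := regexp.QuoteMeta(p) // escape ".", then re-introduce the wildcard
	return regexp.MustCompile("^" + strings.ReplaceAll(quoted, `\*`, `[-a-z0-9_]*`) + "$")
}

// allowed reports whether the queried name matches any allowed pattern.
func allowed(patterns []string, query string) bool {
	q := strings.ToLower(strings.TrimSuffix(query, "."))
	for _, p := range patterns {
		if patternToRegexp(p).MatchString(q) {
			return true
		}
	}
	return false
}
```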

Response:
```
message Result {
  bool success = 1;
}
```

### Load balancing

SDP and the agent's DNS proxy will run on the same port using SO_REUSEPORT. By default, the kernel distributes load evenly between all sockets in the reuseport group. If the agent's DNS proxy goes down, the kernel automatically shifts all traffic to SDP, and vice versa. In the future, we can consider using a custom BPF program to make SDP act only as a hot standby. See the [PoC](https://github.com/hemanthmalla/reuseport_ebpf/blob/main/bpf/reuseport_select.c) / eBPF Summit 2023 talk for more details.
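On the SDP side, joining the reuseport group from Go could look like the following sketch; the address is a placeholder for the configured DNS proxy port:

```go
package main

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusePort opens a UDP socket with SO_REUSEPORT set, so SDP and the
// agent's built-in proxy can both bind the DNS proxy port and share load.
func listenReusePort(ctx context.Context, addr string) (net.PacketConn, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.ListenPacket(ctx, "udp", addr) // e.g. addr = ":10001" (placeholder port)
}
```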


### High Level Information Flow

* Agent starts up with the gRPC streaming service (only after resources are synced from k8s and the ipcache BPF map is populated).
* SDP starts up and connects to the gRPC service, retrying periodically until it succeeds.
* Agent sends the current snapshot for L7 DNS policy enforcement to SDP via UpdatesDNSRules.
* On policy recomputation, agent invokes UpdatesDNSRules.
* On a DNS request from a client, the request is redirected to the DNS proxy port.
* Kernel load balances between SDP and the built-in proxy.
* Assuming SDP gets the request, SDP enforces L7 DNS policy (see the sketch after this list):
  * Look up the client's identity by IP address via the ipcache BPF map.
  * Check against the policy snapshot whether this identity is allowed to resolve the requested DNS name and is allowed to talk to the target DNS server identity (which also needs a lookup).
* Make the upstream DNS request from SDP.
* On response, SDP invokes UpdateMappings() to notify the agent of new mappings.
* Release the DNS response after success from UpdateMappings() or a timeout.
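Tying the steps together, the per-request flow might look like the sketch below. The hook functions (`identityOf`, `policyFor`, `forward`, `notifyAgent`), the one-second ack timeout, and the elided answer parsing are illustrative assumptions; `allowed` is the pattern helper sketched earlier, and the DNS-server-identity check is omitted for brevity:

```go
package main

import (
	"errors"
	"net/netip"
	"time"
)

// proxy wires together the per-request steps; the function fields stand in
// for the ipcache lookup, the policy snapshot, the upstream resolver, and
// the UpdateMappings RPC.
type proxy struct {
	identityOf  func(netip.Addr) (uint32, error)
	policyFor   func(identity uint32) []string // dns_pattern snapshot from UpdatesDNSRules
	forward     func(query []byte) ([]byte, error)
	notifyAgent func(fqdn string, ips []netip.Addr, ttl uint32) error
}

// parseAnswers is elided; real code would parse the DNS answer section.
func parseAnswers(resp []byte) ([]netip.Addr, uint32) { return nil, 0 }

func (p *proxy) handle(client netip.Addr, fqdn string, rawQuery []byte) ([]byte, error) {
	id, err := p.identityOf(client) // identity lookup via ipcache
	if err != nil {
		return nil, err
	}
	if !allowed(p.policyFor(id), fqdn) { // L7 DNS policy check
		return nil, errors.New("denied by L7 DNS policy")
	}
	resp, err := p.forward(rawQuery) // upstream DNS request
	if err != nil {
		return nil, err
	}
	// Hold the response until the agent has acked the new mappings, or a
	// timeout expires, so L3/L4 policy entries exist before traffic flows.
	ips, ttl := parseAnswers(resp)
	done := make(chan error, 1)
	go func() { done <- p.notifyAgent(fqdn, ips, ttl) }()
	select {
	case err := <-done:
		if err != nil {
			return nil, err
		}
	case <-time.After(time.Second): // assumed timeout; release anyway
	}
	return resp, nil
}
```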

### Handling SDP <> Agent re-connections

* When the agent is unavailable, SDP will periodically attempt to re-connect to the streaming service. Any FQDN<>IP mappings resolved while the agent is down will be cached in SDP, and `UpdateMappings` will be retried after the connection is re-established (see the retry sketch after this list).
* A new BPF map for ipcache is created on agent startup, so SDP needs to re-open the ipcache BPF map when the connection is re-established. See https://github.com/cilium/cilium/pull/32864 for similar handling in Envoy.
* On a new connection from SDP, the agent will invoke `UpdatesDNSRules` to notify SDP of all L7 DNS policy rules.
* SDP will not listen on the DNS proxy port until a connection is established with the Cilium agent and the initial L7 DNS policy rules are received. Meanwhile, the built-in DNS proxy will continue to serve requests. SDP relies on the Cilium agent for its initial bootstrap; in the future, SDP could retrieve initial policy information from other sources, but that is out of scope for this CFP.
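A sketch of the reconnect loop, reusing the hypothetical socket path from earlier; `replay` is an assumed callback that re-opens the streams, pushes cached mappings, and blocks until the connection breaks:

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// maintainAgentConnection keeps re-dialing the agent's UDS with exponential
// backoff. replay re-establishes the streams, replays any mappings cached
// while the agent was down, and returns when the connection drops.
func maintainAgentConnection(ctx context.Context, replay func(*grpc.ClientConn) error) {
	backoff := time.Second
	for ctx.Err() == nil {
		conn, err := grpc.NewClient("unix:///var/run/cilium/standalone-dns-proxy.sock",
			grpc.WithTransportCredentials(insecure.NewCredentials()))
		if err == nil {
			if err = replay(conn); err == nil {
				backoff = time.Second // healthy session: reset backoff
			}
			conn.Close()
		}
		if err != nil {
			log.Printf("agent connection lost: %v; retrying in %s", err, backoff)
		}
		select {
		case <-ctx.Done():
			return
		case <-time.After(backoff):
		}
		if backoff < 30*time.Second {
			backoff *= 2
		}
	}
}
```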

### Handling Upgrades
**Comment (@joestringer, Nov 21, 2024):**

I still don't feel that this section adequately explains upgrades. The text is capturing some of the key discussions we have had, though, which helps. What I have in mind is something more like a test plan that outlines bringing up / shutting down the different components and exactly what's expected in terms of forwarding state.

Overall, though, I don't feel that continuing to debate back and forth in this design doc is really helping drive the upgrades discussion. I think at this point it makes sense to accept that some aspects of this CFP are underspecified / not agreed upon, but the overall design is still "implementable" and ultimately subject to review of the actual implementation. As we learn more through implementation we can iron out any remaining differences and figure out how to best test the functionality.

EDIT 2024-11-23: Better wording, more helpful suggestions.


Other than the streaming API from the agent, this CFP introduces a dependency on the ipcache BPF map, which isn't a stable API exposed to components beyond the agent. Sufficient tests will be added to catch datapath changes that would impact SDP. To support a safe upgrade path, SDP would need to support reading from both the current and future formats of the map (possibly including reading from an entirely new map).
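One hedged way to catch a format change early, building on the illustrative layouts from the ipcache sketch above, is to compare the pinned map's key/value sizes against the sizes this SDP build was compiled for:

```go
package main

import (
	"fmt"

	"github.com/cilium/ebpf"
)

// checkIpcacheLayout refuses to use a pinned ipcache whose key/value sizes
// differ from the (illustrative) layouts this build understands, which
// would indicate an agent upgrade changed the map format.
func checkIpcacheLayout(m *ebpf.Map) error {
	const wantKey, wantVal = 24, 12 // sizes of the sketched ipcacheKey/ipcacheValue
	if m.KeySize() != wantKey || m.ValueSize() != wantVal {
		return fmt.Errorf("unexpected ipcache layout: key=%d value=%d bytes",
			m.KeySize(), m.ValueSize())
	}
	return nil
}
```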