
Commit 223792f

Merge pull request #1864 from thomasferrandiz/nftables-adr
add nftables adr
2 parents 40b7dd9 + 6a28fec commit 223792f


1 file changed (+91, -0 lines)

# Add nftables implementation to flannel

Date: 2024-02-01

## Status

Writing

## Context

At the moment, flannel uses iptables to masquerade packets and to accept forwarded traffic.
Our implementation is based on the go-iptables library from CoreOS (https://github.com/coreos/go-iptables).

There are several issues with using iptables in flannel:

* performance: iptables evaluates rules sequentially, so matching a packet is O(n) in the number of rules. This isn't very important for flannel because it only uses a few iptables rules anyway.
* stability:
  * rules must be purged and then re-added every time flannel needs to change a rule, in order to keep them in the correct order
  * there can be interference with other k8s components that also use iptables (kube-proxy, kube-router, ...)
* deprecation: nftables is being pushed as the replacement for iptables, both in the kernel and in upcoming distributions, including future RHEL releases.

References:

- https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/3866-nftables-proxy/README.md#motivation

## Current state

In the flannel code, all references to iptables are wrapped in the `iptables` package.

The package provides the type `IPTableRule` to represent an individual rule. This type is almost entirely internal to the package, so it would be easy to refactor the code to hide it in favor of a more abstract type that works for both iptables and nftables rules.

Unfortunately, the package doesn't provide an interface, so it needs to be refactored in order to offer both an iptables-based and an nftables-based implementation.

The package does include several Go interfaces (`IPTables`, `IPTablesError`) that are used for testing.
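
One possible shape for that refactoring, sketched under the assumption of a hypothetical `RuleManager` interface (the interface, its methods and `NewRuleManager` are illustrative, not existing flannel API). Both implementations are compiled in, but the constructor returns only one of them, so the modes stay mutually exclusive at runtime. The stubs only record what they were asked to do; real implementations would call go-iptables or nft.

```go
package main

import "fmt"

// RuleManager is a hypothetical interface that both backends would satisfy;
// the rest of flannel would only depend on this interface.
type RuleManager interface {
	Name() string
	SetupMasqRules(clusterCIDR, podCIDR string) error
	SetupForwardRules(flannelNetwork string) error
}

// recorder stores a description of each requested change (stub behaviour).
type recorder struct{ applied []string }

func (r *recorder) record(format string, args ...any) {
	r.applied = append(r.applied, fmt.Sprintf(format, args...))
}

// iptablesManager would wrap the existing go-iptables based code.
type iptablesManager struct{ recorder }

func (m *iptablesManager) Name() string { return "iptables" }

func (m *iptablesManager) SetupMasqRules(clusterCIDR, podCIDR string) error {
	m.record("iptables masq pod=%s cluster=%s", podCIDR, clusterCIDR)
	return nil
}

func (m *iptablesManager) SetupForwardRules(flannelNetwork string) error {
	m.record("iptables forward %s", flannelNetwork)
	return nil
}

// nftablesManager would manage flannel's dedicated nftables table.
type nftablesManager struct{ recorder }

func (m *nftablesManager) Name() string { return "nftables" }

func (m *nftablesManager) SetupMasqRules(clusterCIDR, podCIDR string) error {
	m.record("nftables masq pod=%s cluster=%s", podCIDR, clusterCIDR)
	return nil
}

func (m *nftablesManager) SetupForwardRules(flannelNetwork string) error {
	m.record("nftables forward %s", flannelNetwork)
	return nil
}

// NewRuleManager returns exactly one implementation; iptables unless asked otherwise.
func NewRuleManager(useNftables bool) RuleManager {
	if useNftables {
		return &nftablesManager{}
	}
	return &iptablesManager{}
}
```

A recording stub like this could also play the role that the existing `IPTables` interface plays in the current unit tests.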

## Requirements

Ideally, flannel will include both the iptables and the nftables implementations. These need to coexist in the code but will be mutually exclusive at runtime.

The choice of which implementation to use will be made through an optional CLI flag.
iptables will remain the default for the time being.
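
A minimal sketch of such an optional flag, using Go's standard `flag` package. The flag name `--use-nftables` and the `Backend` type are hypothetical placeholders rather than flannel's actual option; the point is only that leaving the flag unset keeps iptables as the default.

```go
package main

import (
	"flag"
	"io"
)

// Backend identifies which packet-filtering implementation would be used.
type Backend string

const (
	BackendIPTables Backend = "iptables"
	BackendNFTables Backend = "nftables"
)

// parseBackend parses a hypothetical optional --use-nftables flag.
// When the flag is absent, iptables remains the default.
func parseBackend(args []string) (Backend, error) {
	fs := flag.NewFlagSet("flanneld", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage/errors quiet in this sketch
	useNFT := fs.Bool("use-nftables", false, "use nftables instead of iptables (hypothetical)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *useNFT {
		return BackendNFTables, nil
	}
	return BackendIPTables, nil
}
```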

Using nftables is an opportunity to optimise the rules deployed by flannel, but we need to be careful about backward compatibility with the current backend.

Starting flannel in either mode should clean up the rules created by the other mode as far as possible, so that users don't need to reboot when they switch modes.
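
A minimal sketch of that cleanup, expressed as a pure function that only returns the commands to run. It assumes the chain names from the current iptables code (`FLANNEL-POSTRTG` in `nat`, `FLANNEL-FWD` in `filter`) and an nftables mode that owns a single `ip` table named `flannel`; `cleanupCommands` is a hypothetical name and IPv6 (ip6tables, an `ip6` table) is left out.

```go
package main

// cleanupCommands returns the commands a hypothetical startup routine could
// run to neutralize the state left behind by the mode that was NOT selected.
func cleanupCommands(useNftables bool) [][]string {
	if useNftables {
		// Starting in nftables mode: empty flannel's iptables chains. The jumps
		// from POSTROUTING and FORWARD then fall through harmlessly; deleting the
		// chains (-X) would first require deleting those jump rules.
		return [][]string{
			{"iptables", "-t", "nat", "-F", "FLANNEL-POSTRTG"},
			{"iptables", "-t", "filter", "-F", "FLANNEL-FWD"},
		}
	}
	// Starting in iptables mode: deleting the dedicated table removes all of its
	// chains and rules at once. nft reports an error if the table does not exist,
	// which a caller would treat as "nothing to clean up".
	return [][]string{
		{"nft", "delete", "table", "ip", "flannel"},
	}
}
```

Flushing rather than deleting the iptables chains keeps the cleanup safe even when the exact jump rules are not known.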

## Architecture

Currently, flannel uses two dedicated chains for its own rules: `FLANNEL-POSTRTG` (in the `nat` table) and `FLANNEL-FWD` (in the `filter` table).
* flannel adds rules to the built-in `POSTROUTING` and `FORWARD` chains to direct traffic to its own chains.
* rules in `FLANNEL-POSTRTG` are used to manage masquerading of the traffic to/from the pods.
* rules in `FLANNEL-FWD` are used to ensure that traffic to and from the flannel network can be forwarded.

With nftables, flannel would have its own dedicated table (`flannel`) with arbitrary chains and rules as needed.

See https://wiki.nftables.org/wiki-nftables/index.php/Performing_Network_Address_Translation_(NAT)
```
# !! untested example
# example values only: flannel would fill in the node's pod subnet and the cluster network
define pod_cidr = 10.244.1.0/24
define cluster_cidr = 10.244.0.0/16
define flannel_network = 10.244.0.0/16

table ip flannel {
  chain flannel-postrtg {
    type nat hook postrouting priority srcnat;
    # kube-proxy: skip traffic marked by kube-proxy to avoid double NAT
    meta mark & 0x4000 == 0x4000 return
    # don't NAT traffic within overlay network
    ip saddr $pod_cidr ip daddr $cluster_cidr return
    ip saddr $cluster_cidr ip daddr $pod_cidr return
    # Prevent performing Masquerade on external traffic which arrives from a Node that owns the container/pod IP address
    ip saddr != $cluster_cidr ip daddr $pod_cidr return
    # NAT if it's not multicast traffic
    ip saddr $cluster_cidr ip daddr != 224.0.0.0/4 masquerade
    # Masquerade anything headed towards flannel from the host
    ip saddr != $cluster_cidr ip daddr $cluster_cidr masquerade
  }

  chain flannel-fwd {
    # forward hook: the input hook only sees traffic addressed to the host itself
    # policy accept: a drop policy here would also drop unrelated forwarded traffic
    # (unlike iptables, an accept here does not override a drop in another table)
    type filter hook forward priority filter; policy accept;
    # allow traffic to be forwarded if it is to or from the flannel network range
    ip saddr $flannel_network accept
    ip daddr $flannel_network accept
  }
}
```

## nftables library

We can either:
* call the `nft` executable directly
* use https://github.com/kubernetes-sigs/knftables, which was developed for kube-proxy and should cover our use case

## Implementation steps

* refactor the current iptables code to better encapsulate iptables calls in the dedicated package
* implement an nftables mode that is the exact equivalent of the current iptables code
* add similar unit test and e2e test coverage
* try to optimize the rules using nftables-specific features
* integrate the new flag in k3s

## Decision
