dev-docs: add option to deploy a full L3 vpn
burgerdev committed Nov 21, 2023
1 parent da024ed commit b0521f8
Showing 13 changed files with 318 additions and 29 deletions.
61 changes: 61 additions & 0 deletions dev-docs/howto/vpn/README.experimental.md
@@ -0,0 +1,61 @@
# Experimental Constellation VPN

This variant of the Helm chart establishes full L3 connectivity between on-prem
workloads and Constellation pods.

> **WARNING**: The experimental version of this Helm chart is, well,
> experimental. It messes with the node configuration and has the
> potential to break all networking. It's only tested on GCP, and only with
> pre-release versions of Constellation, and even there it caused problems.
> Use at your own risk!

## Installation

1. Choose one of the Constellation worker nodes and label it as the VPN node.

   ```sh
   node=$(kubectl get nodes -l node-role.kubernetes.io/control-plane!="" -o jsonpath='{.items[0].metadata.name}')
   kubectl label nodes "$node" constellation.edgeless.systems/node-role=vpn
   ```

2. Create and populate the configuration. Make sure to switch on `experimental.l3.enable` (see the example excerpt after this list)!

   ```sh
   helm inspect values . >config.yaml
   ```

3. Install the Helm chart.

   ```sh
   helm install vpn . -f config.yaml
   ```

4. Follow the post-installation instructions displayed by the CLI.
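
For orientation, a minimal `config.yaml` excerpt that enables the experimental L3 mode might look like the sketch below. The keys come from this chart's values (`peerCIDRs`, `experimental.l3.enable`); the CIDR itself is a placeholder for your on-prem ranges.

```yaml
# Hypothetical excerpt of config.yaml -- replace the CIDR with your on-prem range.
peerCIDRs:
  - 10.99.0.0/16
experimental:
  l3:
    enable: true
```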

## Architecture

In addition to the NAT-based resources, the frontend contains an init container
that sets up a networking bypass around Cilium. This is necessary to circumvent
the restrictions that Cilium applies to pod traffic (source IP enforcement, for
example). VPN traffic is routed directly to the host network, which in turn is
modified to forward VPN traffic correctly to other pods.
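
To see the bypass on the labeled VPN node, you can inspect the host network namespace (for example from a shell with host networking). The interface, table, and chain names below (`vpn_br`, table 41, `VPN_PRE`) are the ones created by this chart's setup scripts; the exact output depends on your cluster.

```sh
# Run these in the VPN node's host network namespace.
ip addr show vpn_br             # bridge towards the frontend pod (169.254.42.1)
ip rule list                    # fwmark rules steering traffic to tables 41/44
ip route show table 41          # default via 169.254.42.2 dev vpn_br
iptables -t mangle -L VPN_PRE   # marks traffic destined for the on-prem CIDRs
```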

An artificial `CiliumEndpoint` is created to make Cilium aware of the on-prem
IP ranges and route traffic from other nodes to the VPN node. There's also a
`DaemonSet` that configures appropriate routes in the host network namespace of
all cluster nodes.
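
A quick way to check both mechanisms is sketched below, assuming the release is named `vpn` and the rendered resource names therefore resolve to `vpn-routes`; adjust the names to your release.

```sh
# CiliumEndpoint carrying the on-prem CIDRs (named <fullname>-routes):
kubectl get cep vpn-routes -o yaml

# DaemonSet that programs the per-node routes:
kubectl get daemonset vpn-routes

# On any node, the on-prem CIDRs should be routed via Cilium's WireGuard device:
ip route show dev cilium_wg0
```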

## Cleanup

There's no built-in lifecycle management of the host network resources created
by this chart. To remove the VPN configuration, first uninstall the Helm chart
and then reboot all the nodes to start with a clean network. There's a button
for this in the *Instance Group* view of GCP.
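
If you prefer the command line over the console, a rolling restart of the managed instance group should achieve the same; the group name and zone below are placeholders, and the exact invocation is outside the scope of this chart.

```sh
# Hypothetical example -- substitute your worker instance group and zone.
gcloud compute instance-groups managed rolling-action restart \
  constell-worker-igm --zone europe-west3-b
```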

## Limitations

* Service IPs need to be proxied by the VPN frontend pod. This is a single
point of failure, and it may become a bottleneck.
* The VPN is bound to a single node, which is another single point of failure.
* Interaction between VPN and NetworkPolicy is not fully explored yet, and may
have surprising consequences.
6 changes: 6 additions & 0 deletions dev-docs/howto/vpn/README.md
@@ -34,3 +34,9 @@ The service IP range is handed to a transparent proxy running in the VPN frontend
* IPs are NATed, so the Constellation pods won't see the real on-prem IPs.
* NetworkPolicy can't be applied selectively to the on-prem ranges.
* No connectivity from Constellation to on-prem workloads.

## Alternatives

If NAT and proxy are not acceptable for your environment, there's an
[experimental alternative mode](README.experimental.md) that establishes full
L3 connectivity between pods and on-prem services.
15 changes: 15 additions & 0 deletions dev-docs/howto/vpn/files/routing/experimental/all-nodes.sh
@@ -0,0 +1,15 @@
#!/bin/sh

# Keep the on-prem peer CIDRs routed via Cilium's WireGuard interface and
# remove the routes again when the pod is terminated.

cleanup() {
  for cidr in ${VPN_PEER_CIDRS}; do
    ip route delete "${cidr}" || true
  done
}
trap cleanup INT TERM

while true; do
  for cidr in ${VPN_PEER_CIDRS}; do
    ip route replace "${cidr}" dev cilium_wg0
  done
  sleep 10
done
71 changes: 71 additions & 0 deletions dev-docs/howto/vpn/files/routing/experimental/frontend-host.sh
@@ -0,0 +1,71 @@
#!/bin/sh

set -eu

# TODO: The bridge needs to be cleaned up if the pod migrates to another host!

# Bypass the lxc* interface so that VPN packets are not subject to Cilium's
# pod restrictions.

ip link delete vpn_br || true
ip link add vpn_br type bridge
ip link set dev vpn_lower master vpn_br
ip address add 169.254.42.1 dev vpn_br scope link
ip link set dev vpn_br up
ip link set dev vpn_lower up
ip route replace 169.254.42.2 dev vpn_br

# Traffic from local pods or other nodes to the VPN is fwmarked and sent to
# the VPN frontend pod.

ip route replace default via 169.254.42.2 dev vpn_br table 41
ip rule add fwmark 0x2/0x2 table 41 priority 41 || true

iptables -t mangle -N VPN_PRE || iptables -t mangle -F VPN_PRE
for cidr in ${VPN_PEER_CIDRS}; do
  iptables -t mangle -A VPN_PRE -i lxc+ -d "${cidr}" -j MARK --set-mark 2
  iptables -t mangle -A VPN_PRE -i cilium_wg0 -d "${cidr}" -j MARK --set-mark 2
done

iptables -t mangle -C PREROUTING -j VPN_PRE || iptables -t mangle -I PREROUTING -j VPN_PRE

# Cilium does NAT if the destination IP appears to be outside the cluster,
# which would affect VPN traffic, too. We register a rule that skips Cilium
# NAT for traffic to the VPN.

iptables -t nat -N VPN_POST || iptables -t nat -F VPN_POST
for cidr in ${VPN_PEER_CIDRS}; do
  iptables -t nat -I VPN_POST -d "${cidr}" -j ACCEPT
done
iptables -t nat -C POSTROUTING -j VPN_POST || iptables -t nat -I POSTROUTING -j VPN_POST

# Now we tell the host how to deal with traffic from the VPN to the pod
# network. The rule that we're crafting below is:
# Send everything from the VPN to the pod network over Cilium's Wireguard
# tunnel, unless the traffic is for a local pod.
# This turns out to be a bit tricky to model, because we need to split the
# traffic between lxc+ and cilium_wg0. Cilium configures the host-local routes
# in the host network namespace, and we would like to reuse these, so we can't
# create a separate routing table for fw-marked VPN packets.
# For some reason, policy-based routing on the reroute-check after FORWARD
# does not work here. But we can turn the logic around and try to mark packets
# that *don't* come from the VPN.

# Tell the node to send all packets for pod IP ranges over cilium_wg0.
# Local pods match the more specific routes.
ip route replace "${VPN_POD_CIDR}" dev cilium_wg0

# Create a routing table that shows marked traffic the default route (i.e. the
# physical interface).
# Word splitting is intended here.
# shellcheck disable=SC2046
ip route replace $(ip route show default) table 44
ip rule add fwmark 0x4/0x4 table 44 priority 44 || true

# Traffic that should go to the physical interface: locally created packets
# with source IP outside the pod CIDR and non-local destination.

iptables -t mangle -N VPN_OUTPUT || iptables -t mangle -F VPN_OUTPUT
iptables -t mangle -A VPN_OUTPUT -o lxc+ -j RETURN
iptables -t mangle -A VPN_OUTPUT ! -s "${VPN_POD_CIDR}" -d "${VPN_POD_CIDR}" -j MARK --set-mark 0x4/0x4
iptables -t mangle -C OUTPUT -j VPN_OUTPUT || iptables -t mangle -I OUTPUT -j VPN_OUTPUT
28 changes: 28 additions & 0 deletions dev-docs/howto/vpn/files/routing/experimental/frontend-pod.sh
@@ -0,0 +1,28 @@
#!/bin/sh

set -eu

if [ "$$" -eq "1" ]; then
echo 'This script must run in the root PID namespace, but $$ == 1!' 2> /dev/null
fi

ip netns attach root 1

ip link delete vpn_upper || true
ip link delete vpn_lower || true
ip netns exec root ip link delete vpn_lower || true

ip link add vpn_upper type veth peer name vpn_lower
ip link set dev vpn_lower netns root
ip address add 169.254.42.2 dev vpn_upper scope link
ip link set dev vpn_upper up

echo "Meanwhile, in the root network namespace ..." >&2

ip netns exec root sh -x frontend-host.sh

echo "Back in the pod network namespace ..." >&2

ip route replace 169.254.42.1 dev vpn_upper
ip route replace default via 169.254.42.1 dev vpn_upper table 41
ip rule add to "${VPN_POD_CIDR}" iif vpn_wg0 table 41 priority 41
11 changes: 11 additions & 0 deletions dev-docs/howto/vpn/files/routing/pod-nat-setup.sh
@@ -0,0 +1,11 @@
#!/bin/sh

set -eu

iptables -t nat -N VPN_POST || iptables -t nat -F VPN_POST

for cidr in ${VPN_PEER_CIDRS}; do
  iptables -t nat -A VPN_POST -s "${cidr}" -d "${VPN_POD_CIDR}" -j MASQUERADE
done

iptables -t nat -C POSTROUTING -j VPN_POST || iptables -t nat -A POSTROUTING -j VPN_POST
dev-docs/howto/vpn/files/tproxy-setup.sh
@@ -2,18 +2,6 @@

set -eu

### Pod IPs ###

# Pod IPs are just NATed.

iptables -t nat -N VPN_POST || iptables -t nat -F VPN_POST

for cidr in ${VPN_PEER_CIDRS}; do
iptables -t nat -A VPN_POST -s "${cidr}" -d "${VPN_POD_CIDR}" -j MASQUERADE
done

iptables -t nat -C POSTROUTING -j VPN_POST || iptables -t nat -A POSTROUTING -j VPN_POST

### Service IPs ###

# Service IPs need to be connected to locally to trigger the cgroup connect hook, thus we send them to the transparent proxy.
13 changes: 9 additions & 4 deletions dev-docs/howto/vpn/templates/NOTES.txt
@@ -1,9 +1,14 @@
{{- if .Values.ipsec.enabled }}
Required postinstallation steps (also see README.md):

# Configure the LoadBalancer
# Patch the CiliumEndpoint

kubectl patch cep {{ include "..fullname" . }}-routes --type='json' \
-p='[{"op": "replace", "path": "/status/networking/node", "value":"'$(kubectl get pods {{ include "..fullname" . }}-frontend-0 -o jsonpath={.status.hostIP})'"}]'

{{- if .Values.ipsec.enabled }}
# Patch the LoadBalancer

1. Find the node hosting the VPN server:
kubectl get pods {{ include "..fullname" . }}-frontend-0 -o jsonpath={.spec.nodeName}
2. Edit the load balancer resource in GCP and remove all other endpoints.
{{- end }}
2. Edit the load balancer in the cloud and remove all other endpoints.
{{- end }}
27 changes: 27 additions & 0 deletions dev-docs/howto/vpn/templates/ciliumendpoint.yaml
@@ -0,0 +1,27 @@
{{ if .Values.experimental.l3.enable }}
apiVersion: cilium.io/v2
kind: CiliumEndpoint
metadata:
  name: {{ include "..fullname" . }}-routes
status:
  encryption: {}
  id: 0
  identity:
    id: 0
  networking:
    addressing:
      {{- range .Values.peerCIDRs }}
      - ipv4: {{ . }}
      {{- end }}
    node: ""
  policy:
    egress:
      enforcing: false
      state: disabled
    ingress:
      enforcing: false
      state: disabled
  state: ready
  visibility-policy-status: disabled
{{- end }}

11 changes: 7 additions & 4 deletions dev-docs/howto/vpn/templates/configmaps.yaml
@@ -1,12 +1,15 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "..fullname" . }}-tproxy
name: {{ include "..fullname" . }}-routes
labels: {{- include "..labels" . | nindent 4 }}
data:
{{ (.Files.Glob "files/tproxy-setup.sh").AsConfig | indent 2 }}
---
{{ (.Files.Glob "files/routing/*.sh").AsConfig | indent 2 }}
{{- if .Values.experimental.l3.enable }}
{{ (.Files.Glob "files/routing/experimental/*.sh").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.wireguard.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
@@ -15,8 +18,8 @@ metadata:
data:
{{ (.Files.Glob "files/wireguard-setup.sh").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.ipsec.enabled }}
---
{{ if .Values.ipsec.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
40 changes: 40 additions & 0 deletions dev-docs/howto/vpn/templates/routing-daemonset.yaml
@@ -0,0 +1,40 @@
{{ if .Values.experimental.l3.enable }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{ include "..fullname" . }}-routes
  labels: {{- include "..labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "..selectorLabels" . | nindent 6 }}
      component: routes
  template:
    metadata:
      labels:
        {{- include "..selectorLabels" . | nindent 8 }}
        component: routes
    spec:
      hostNetwork: true
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: route
        image: "nixery.dev/shell/iproute2"
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
        command: ["/bin/sh", "/entrypoint.sh"]
        env: {{- include "..commonEnv" . | nindent 10 }}
        volumeMounts:
        - name: routes
          mountPath: "/entrypoint.sh"
          subPath: "all-nodes.sh"
          readOnly: true
      volumes:
      - name: routes
        configMap:
          name: {{ include "..fullname" . }}-routes
{{- end }}