dev-docs: add option to deploy a full L3 vpn

burgerdev committed Dec 7, 2023
1 parent c0d8508 commit 0583749
Showing 16 changed files with 382 additions and 36 deletions.
50 changes: 50 additions & 0 deletions dev-docs/howto/vpn/helm/README.experimental.md
@@ -0,0 +1,50 @@
# Experimental Constellation VPN

This variant of the Helm chart establishes full L3 connectivity between on-prem
workloads and Constellation pods.

> **WARNING**: The experimental version of this Helm chart is, well,
> experimental. It messes with the node configuration and has the
> potential to break all networking. It's only tested on GCP, and only with
> pre-release versions of Constellation.
> Use at your own risk!

## Installation

1. Create and populate the configuration. Make sure to switch on `experimental.l3.enabled`! A minimal example follows the steps below.

   ```sh
   helm inspect values . >config.yaml
   ```

2. Install the Helm chart.

   ```sh
   helm install vpn . -f config.yaml
   ```
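
For reference, a `config.yaml` that enables the experimental mode could look
roughly like the sketch below. The CIDR values are placeholders, and settings
not shown here (e.g. WireGuard peer configuration) are omitted; the output of
`helm inspect values` is authoritative.

```yaml
# Placeholder values - adjust to your environment.
peerCIDRs:
  - "10.0.0.0/24"           # on-prem ranges reachable through the VPN
podCIDR: "10.244.0.0/16"    # Constellation pod CIDR
serviceCIDR: "10.96.0.0/12" # Constellation service CIDR
wireguard:
  enabled: true
ipsec:
  enabled: false
experimental:
  l3:
    enabled: true           # switch on the full L3 mode
```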

## Architecture

In addition to the NAT-based resources, the frontend contains an init container
that sets up a networking bypass around Cilium. This is necessary to circumvent
the restrictions that Cilium applies to pod traffic (source IP enforcement, for
example). VPN traffic is routed directly to the host network, which in turn is
modified to forward VPN traffic correctly to other pods.
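
Concretely, the routing scripts in `files/routing/experimental/` establish
roughly the following state; `<peer-cidr>` and `<pod-cidr>` stand for the
configured CIDRs.

```sh
# Host network namespace of the frontend node: link-local bypass to the pod.
ip route replace 169.254.42.2 dev cilium_c11n_vpn scope link
ip route replace <peer-cidr> via 169.254.42.2 dev cilium_c11n_vpn
ip route replace <pod-cidr> dev cilium_wg0

# Frontend pod: send VPN traffic destined for the pod CIDR through the bypass.
ip route replace default via 169.254.42.1 dev br0 table 41
ip rule add to <pod-cidr> fwmark 0x8/0x8 table 41 priority 10 # IPsec
ip rule add to <pod-cidr> iif vpn_wg0 table 41 priority 11    # WireGuard
```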

A VPN operator deployment is added that configures the `CiliumEndpoint` with
on-prem IP ranges.
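
In effect, the operator periodically applies a patch of the following shape
(the addresses are placeholders; see `files/routing/experimental/operator.sh`
for the exact construction):

```sh
kubectl patch ciliumendpoint vpn-frontend-0 --type json --patch \
  '[{"op": "replace", "path": "/status/networking/addressing",
     "value": [{"ipv4": "10.244.1.23"}, {"ipv4": "10.0.0.0/24"}]}]'
```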

## Cleanup

If this chart causes problems with cluster networking that uninstalling it does
not resolve, rebooting the worker nodes to start with a fresh network setup
should help. There's a button for this in the *Instance Group* view of GCP.
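
If you prefer the CLI over the console, resetting a worker instance achieves
the same; the instance name and zone below are placeholders.

```sh
gcloud compute instances reset constell-worker-0 --zone europe-west3-b
```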

## Limitations

* Service IPs need to be proxied by the VPN frontend pod. This is a single
point of failure, and it may become a bottleneck.
* The VPN is bound to a single node, which is another single point of failure.
* Interaction between VPN and NetworkPolicy is not fully explored yet, and may
have surprising consequences.
6 changes: 6 additions & 0 deletions dev-docs/howto/vpn/helm/README.md
@@ -34,3 +34,9 @@ The service IP range is handed to a transparent proxy running in the VPN frontend
* IPs are NATed, so the Constellation pods won't see the real on-prem IPs.
* NetworkPolicy can't be applied selectively to the on-prem ranges.
* No connectivity from Constellation to on-prem workloads.

## Alternatives

If NAT and proxy are not acceptable for your environment, there's an
[experimental alternative mode](README.experimental.md) that establishes full
L3 connectivity between pods and on-prem services.
65 changes: 65 additions & 0 deletions dev-docs/howto/vpn/helm/files/routing/experimental/all-nodes.sh
@@ -0,0 +1,65 @@
#!/bin/sh

# This script runs in two modes:
# - on the VPN frontend node
# - on other nodes
# We detect which mode we're in by looking for the magic network interface that
# the VPN frontend creates.

# TODO: Check if we're accidentally going through the cloud router (can we prevent this with tc?)

set -eu

reconcile_vpn() {
  # Bypass the lxc* interface so that VPN packets are not subject to Cilium's
  # pod restrictions.

  ip address replace 169.254.42.1 dev "${MAGIC_INTERFACE}" scope link
  ip link set dev "${MAGIC_INTERFACE}" up
  ip route replace 169.254.42.2 dev "${MAGIC_INTERFACE}" scope link
  for cidr in ${VPN_PEER_CIDRS}; do
    ip route replace "${cidr}" via 169.254.42.2 dev "${MAGIC_INTERFACE}"
  done

  # Tell the node to send all packets for pod IP ranges over cilium_wg0.
  # Local pods match the more specific routes.
  ip route replace "${VPN_POD_CIDR}" dev cilium_wg0
}

cleanup_vpn() {
  # We expect to clean up only if the magic interface vanishes. In that case,
  # all the routes pointing to it are automatically deleted.
  ip route delete "${VPN_POD_CIDR}" dev cilium_wg0 || true
}

reconcile_other() {
  cleanup_vpn 2> /dev/null
  for cidr in ${VPN_PEER_CIDRS}; do
    ip route replace "${cidr}" dev cilium_wg0
  done
}

cleanup_other() {
  for cidr in ${VPN_PEER_CIDRS}; do
    ip route delete "${cidr}" || true
  done
}

cleanup_exit() {
  cleanup_other
  cleanup_vpn
  exit 143 # 128 + SIGTERM
}
trap cleanup_exit INT TERM

# In the spirit of a poor man's reconciliation loop, the script keeps enforcing
# the routing rules periodically.

while true; do
  if ip link show "${MAGIC_INTERFACE}" 2> /dev/null >&2; then
    reconcile_vpn
  else
    reconcile_other
  fi
  sleep 10
done
33 changes: 33 additions & 0 deletions dev-docs/howto/vpn/helm/files/routing/experimental/frontend-pod.sh
@@ -0,0 +1,33 @@
#!/bin/sh

set -eu

if [ "$$" -eq "1" ]; then
echo 'This script must run in the root PID namespace, but $$ == 1!' >&2
exit 1
fi

# Set up a parallel veth connection to the host network namespace so that we
# are not subject to Cilium restrictions (e.g. source IPs).

ip netns attach root 1

ip link add vpn_upper type veth peer name "${MAGIC_INTERFACE}"
ip link set dev "${MAGIC_INTERFACE}" netns root

ip link add br0 type bridge
ip link set dev vpn_upper master br0
ip address add 169.254.42.2 dev br0 scope link

ip link set dev br0 up
ip link set dev vpn_upper up

# Route traffic from the VPN through the bypass.

table=41
ip route replace 169.254.42.1 dev br0
ip route replace default via 169.254.42.1 dev br0 table "${table}"
# IPSec
ip rule add to "${VPN_POD_CIDR}" fwmark 0x8/0x8 table "${table}" priority 10
# Wireguard
ip rule add to "${VPN_POD_CIDR}" iif vpn_wg0 table "${table}" priority 11
17 changes: 17 additions & 0 deletions dev-docs/howto/vpn/helm/files/routing/experimental/operator.sh
@@ -0,0 +1,17 @@
#!/bin/sh

# Collect the frontend pod's IPs plus the configured on-prem CIDRs.
all_ips() {
  kubectl get pods vpn-frontend-0 -o go-template --template '{{ range .status.podIPs }}{{ printf "%s " .ip }}{{ end }}'
  echo "${VPN_PEER_CIDRS}"
}

# Build a JSON patch that replaces the CiliumEndpoint's addressing list with
# the output of all_ips.
cep_patch() {
  printf '[{"op": "replace", "path": "/status/networking/addressing", "value": '
  for ip in $(all_ips); do printf '{"ipv4": "%s"}' "${ip}"; done | jq -s -c -j
  echo '}]'
}

# Poor man's reconciliation loop: keep the endpoint's addressing up to date.
while true; do
  kubectl patch ciliumendpoint vpn-frontend-0 --type json --patch "$(cep_patch)" > /dev/null
  sleep 10
done
13 changes: 13 additions & 0 deletions dev-docs/howto/vpn/helm/files/routing/pod-nat-setup.sh
@@ -0,0 +1,13 @@
#!/bin/sh

set -eu

# Set up iptables rules to NAT traffic targeting the pod network.

iptables -t nat -N VPN_POST || iptables -t nat -F VPN_POST

for cidr in ${VPN_PEER_CIDRS}; do
  iptables -t nat -A VPN_POST -s "${cidr}" -d "${VPN_POD_CIDR}" -j MASQUERADE
done

iptables -t nat -C POSTROUTING -j VPN_POST || iptables -t nat -A POSTROUTING -j VPN_POST
@@ -2,20 +2,6 @@

set -eu

### Pod IPs ###

# Pod IPs are just NATed.

iptables -t nat -N VPN_POST || iptables -t nat -F VPN_POST

for cidr in ${VPN_PEER_CIDRS}; do
  iptables -t nat -A VPN_POST -s "${cidr}" -d "${VPN_POD_CIDR}" -j MASQUERADE
done

iptables -t nat -C POSTROUTING -j VPN_POST || iptables -t nat -A POSTROUTING -j VPN_POST

### Service IPs ###

# Connections to service IPs must be initiated locally to trigger the cgroup connect hook, so we send them to the transparent proxy.

# Packets with mark 1 are for tproxy and need to be delivered locally.
2 changes: 2 additions & 0 deletions dev-docs/howto/vpn/helm/templates/_helpers.tpl
@@ -37,4 +37,6 @@ app.kubernetes.io/instance: {{ .Release.Name }}
  value: {{ .Values.podCIDR | quote }}
- name: VPN_SERVICE_CIDR
  value: {{ .Values.serviceCIDR | quote }}
- name: MAGIC_INTERFACE
  value: cilium_c11n_vpn
{{- end }}
11 changes: 7 additions & 4 deletions dev-docs/howto/vpn/helm/templates/configmaps.yaml
@@ -1,12 +1,15 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "..fullname" . }}-tproxy
name: {{ include "..fullname" . }}-routes
labels: {{- include "..labels" . | nindent 4 }}
data:
{{ (.Files.Glob "files/tproxy-setup.sh").AsConfig | indent 2 }}
---
{{ (.Files.Glob "files/routing/*.sh").AsConfig | indent 2 }}
{{- if .Values.experimental.l3.enabled }}
{{ (.Files.Glob "files/routing/experimental/*.sh").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.wireguard.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
@@ -15,8 +18,8 @@ metadata:
data:
{{ (.Files.Glob "files/wireguard-setup.sh").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.ipsec.enabled }}
---
{{ if .Values.ipsec.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
34 changes: 34 additions & 0 deletions dev-docs/howto/vpn/helm/templates/operator-deployment.yaml
@@ -0,0 +1,34 @@
{{ if .Values.experimental.l3.enabled -}}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "..fullname" . }}-operator
  labels: {{- include "..labels" . | nindent 4 }}
spec:
  replicas: 1
  selector:
    matchLabels:
      {{- include "..selectorLabels" . | nindent 6 }}
      component: operator
  template:
    metadata:
      labels:
        {{- include "..selectorLabels" . | nindent 8 }}
        component: operator
    spec:
      serviceAccountName: {{ include "..fullname" . }}
      automountServiceAccountToken: true
      containers:
      - name: operator
        image: nixery.dev/shell/jq/kubernetes
        command: ["/bin/sh", "/scripts/operator.sh"]
        env: {{- include "..commonEnv" . | nindent 10 }}
        volumeMounts:
        - name: scripts
          mountPath: "/scripts"
          readOnly: true
      volumes:
      - name: scripts
        configMap:
          name: {{ include "..fullname" . }}-routes
{{- end }}
31 changes: 31 additions & 0 deletions dev-docs/howto/vpn/helm/templates/rbac.yaml
@@ -0,0 +1,31 @@
{{ if .Values.experimental.l3.enabled -}}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "..fullname" . }}
automountServiceAccountToken: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "..fullname" . }}
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"]
- apiGroups: ["cilium.io"]
  resources: ["ciliumendpoints"]
  verbs: ["get", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "..fullname" . }}
subjects:
- kind: ServiceAccount
  name: {{ include "..fullname" . }}
roleRef:
  kind: Role
  name: {{ include "..fullname" . }}
  apiGroup: rbac.authorization.k8s.io
{{- end }}
39 changes: 39 additions & 0 deletions dev-docs/howto/vpn/helm/templates/routing-daemonset.yaml
@@ -0,0 +1,39 @@
{{ if .Values.experimental.l3.enabled }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{ include "..fullname" . }}-routes
  labels: {{- include "..labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "..selectorLabels" . | nindent 6 }}
      component: routes
  template:
    metadata:
      labels:
        {{- include "..selectorLabels" . | nindent 8 }}
        component: routes
    spec:
      hostNetwork: true
      tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: route
        image: "nixery.dev/shell/iproute2/iptables"
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]
        command: ["/bin/sh", "/script/all-nodes.sh"]
        env: {{- include "..commonEnv" . | nindent 10 }}
        volumeMounts:
        - name: routes
          mountPath: "/script"
          readOnly: true
      volumes:
      - name: routes
        configMap:
          name: {{ include "..fullname" . }}-routes
{{- end }}
1 change: 1 addition & 0 deletions dev-docs/howto/vpn/helm/templates/strongswan-secret.tpl
@@ -13,6 +13,7 @@ connections {
        local_ts = {{ .Values.podCIDR }},{{ .Values.serviceCIDR }}
        remote_ts = {{ join "," .Values.peerCIDRs }}
        start_action = trap
        set_mark_in = "0x8/0x8"
      }
    }
  }