Tests and experiments with the SCTP protocol (RFC 4960).
SCTP is supported by the Linux kernel (lksctp), and there are user-space implementations, for instance usrsctp.
All examples and tests in this ovl use multihoming because that's where the problems are. The dual-path network topology is often used.
Virtual IPs (VIP) and load-balancing with multihoming are a challenge. If simple load-balancing like ECMP, which hashes on addresses (and ports), is used, the primary and secondary paths may be load-balanced to different targets. SCTP load-balancing in the nfqueue-loadbalancer overcomes this problem by hashing on ports only. The nfqueue-loadbalancer is used to build the sctpt test program, so it must be downloaded, but load-balancing is not used in this example.
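A minimal sketch of the port-only hashing idea (not the nfqueue-loadbalancer source; the function name and target count are made up for illustration);

    #include <stdint.h>
    #include <arpa/inet.h>

    /* SCTP common header (RFC 4960 section 3.1) */
    struct sctp_common_hdr {
        uint16_t src_port;
        uint16_t dst_port;
        uint32_t vtag;
        uint32_t checksum;
    };

    /* Select a load-balancing target by hashing on ports only. Packets on
     * the primary and secondary path have different addresses but the same
     * ports, so they select the same target. */
    static unsigned select_target(const void* sctp_hdr, unsigned ntargets)
    {
        const struct sctp_common_hdr* h = sctp_hdr;
        uint32_t hash = ((uint32_t)ntohs(h->src_port) << 16) | ntohs(h->dst_port);
        hash = hash * 2654435761u;      /* Knuth multiplicative hash */
        return hash % ntargets;
    }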
Manual test;
./sctp.sh nfqlb_download
./sctp.sh test start > $log
# On vm-001
sctpt server --log 6 --addr 192.168.1.1,192.168.4.1
# On vm-221
sctpt client --log 6 --addr 192.168.1.1,192.168.4.1 --laddr 192.168.2.221,192.168.6.221
# (typed text will be echoed by the server)
# Type ^D to quit
The sctpt test program is used to set up a multihomed SCTP "association" (trace). The multihoming addresses are passed in the INIT and INIT_ACK messages.
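A minimal sketch (not the actual sctpt source) of how a multihomed association can be set up with lksctp. The addresses given to sctp_bindx() are the ones announced in the INIT chunk. The addresses match the example above; the port 6000 is just an assumption;

    /* Build with: gcc mh-client.c -lsctp */
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/sctp.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
        if (fd < 0) {
            perror("socket");
            return 1;
        }
        struct sockaddr_in laddr[2], raddr[2];
        memset(laddr, 0, sizeof(laddr));
        memset(raddr, 0, sizeof(raddr));

        /* Local multihoming addresses (as with --laddr) */
        laddr[0].sin_family = AF_INET;
        inet_pton(AF_INET, "192.168.2.221", &laddr[0].sin_addr);
        laddr[1].sin_family = AF_INET;
        inet_pton(AF_INET, "192.168.6.221", &laddr[1].sin_addr);
        if (sctp_bindx(fd, (struct sockaddr*)laddr, 2, SCTP_BINDX_ADD_ADDR) != 0) {
            perror("sctp_bindx");
            return 1;
        }

        /* Remote multihoming addresses (as with --addr) */
        raddr[0].sin_family = AF_INET;
        raddr[0].sin_port = htons(6000);     /* assumed port */
        inet_pton(AF_INET, "192.168.1.1", &raddr[0].sin_addr);
        raddr[1].sin_family = AF_INET;
        raddr[1].sin_port = htons(6000);
        inet_pton(AF_INET, "192.168.4.1", &raddr[1].sin_addr);
        if (sctp_connectx(fd, (struct sockaddr*)raddr, 2, NULL) != 0) {
            perror("sctp_connectx");
            return 1;
        }
        printf("Multihomed association established\n");
        return 0;
    }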
You can trace with tcpdump on any VM and try to disable the primary path to watch the failover to the secondary path;
# On vm-201
iptables -A FORWARD -p sctp -j DROP
# (send something from the client)
iptables -D FORWARD 1
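How fast the failover happens depends on the SCTP path parameters (heartbeat interval and path.max.retrans). A minimal sketch of tuning them per socket with lksctp; the values are only examples, and sctpt may or may not expose options for this;

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/sctp.h>

    /* Shorten the heartbeat interval and the path retransmit limit so a
     * broken path is detected, and failover happens, faster. Apply on the
     * socket before connecting; a zeroed address/assoc_id makes it the
     * default for all peer addresses. */
    static int quick_failover(int fd)
    {
        struct sctp_paddrparams pp;
        memset(&pp, 0, sizeof(pp));
        pp.spp_flags = SPP_HB_ENABLE;
        pp.spp_hbinterval = 1000;   /* heartbeat every 1000 ms */
        pp.spp_pathmaxrxt = 2;      /* declare the path down after 2 failed retransmits */
        return setsockopt(
            fd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS, &pp, sizeof(pp));
    }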
The nfqueue-loadbalancer can be used for SCTP load-balancing, optionally combined with UDP encapsulation.
UDP encapsulation with load-balancing;
./sctp.sh nfqlb_download
xcluster_UDP_ENCAP=9899 xcluster_NETNS=yes ./sctp.sh test --no-stop nfqlb > $log
# On vm-221
sysctl -w net.sctp.encap_port=9899
sysctl -w net.sctp.udp_port=9899
sctpt client --log 6 --addr 10.0.0.1,1000::81 --laddr 192.168.2.221,1000::1:192.168.6.221
# On vm-201
iptables -A FORWARD -p udp --sport 9899 -j DROP
iptables -D FORWARD 1
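The sysctls above configure SCTP-over-UDP in the kernel (the local listening port and the default remote encapsulation port). The remote encapsulation port can also be set per socket. A minimal sketch (not the sctpt source), assuming kernel headers with SCTP-over-UDP support (Linux 5.11 or later);

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <netinet/sctp.h>

    /* Set the remote UDP encapsulation port for all peer addresses on the
     * socket (per-socket equivalent of the net.sctp.encap_port sysctl).
     * A zeroed sue_address means "all peer addresses". */
    static int set_remote_encap_port(int fd, uint16_t port)
    {
        struct sctp_udpencaps enc;
        memset(&enc, 0, sizeof(enc));
        enc.sue_port = htons(port);   /* e.g. 9899 as in the example above */
        return setsockopt(fd, IPPROTO_SCTP, SCTP_REMOTE_UDP_ENCAPS_PORT,
            &enc, sizeof(enc));
    }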
The sctpt test program uses Linux kernel SCTP and is written in C. The Go language does not support SCTP in its standard packages. There are 3rd-party implementations that use lksctp, as well as user-space implementations.
Build;
./sctp.sh nfqlb_download
make -C src
alias sctpt=/tmp/uablrek/sctpt/sctpt/sctpt
sctpt # (brief help printout)
Start server;
sctpt server -h # Help printout
sctpt server --log=7 --addr=127.0.0.1,127.0.0.2
Start client;
sctpt client -h # Help printout
sctpt client --log=7 --addr=127.0.0.1,127.0.0.2 --laddr=127.0.1.1,127.0.1.2
For multihoming, the local addresses on the client must always be specified with the --laddr option.
The sctpt ctraffic mode is intended for testing failovers and mimics the ctraffic program. The same server as for sctpt client is used.
Statistics are continuously written to shared memory and can be viewed in real time.
sctpt stats -h # Help printout
sctpt ctraffic -h # Help printout
sctpt stats --interval=50 init
sctpt ctraffic --addr=127.0.0.1,127.0.0.2 --laddr=127.0.1.1,127.0.1.2 --rate=10
sctpt stats show
watch /tmp/uablrek/sctpt/sctpt/sctpt stats show
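The sctpt internals are not shown here, but the general pattern is a writer that continuously updates a struct in POSIX shared memory and a viewer that maps the same segment. A sketch with made-up names (sctpt_stats, /sctpt-stats); link with -lrt on older glibc;

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct sctpt_stats {
        unsigned long sent;
        unsigned long received;
        unsigned long failed;
    };

    int main(void)
    {
        /* Create (or open) the shared segment and map it */
        int fd = shm_open("/sctpt-stats", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, sizeof(struct sctpt_stats)) != 0)
            return 1;
        struct sctpt_stats* s = mmap(
            NULL, sizeof(*s), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (s == MAP_FAILED)
            return 1;
        s->sent++;   /* the traffic loop would update these continuously */
        printf("sent=%lu received=%lu failed=%lu\n",
            s->sent, s->received, s->failed);
        return 0;
    }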
To test with a "vanilla" K8s setup with only one traffic interface per node, the default xcluster network is set up with an additional tester network.
K8s supports services with "protocol: SCTP". The normal K8s load-balancing, kube-proxy (both proxy modes), uses NAT, which basically makes multihoming impossible, or at least very complicated.
Make sure to build the sctp-test image and upload it to the local registry first;
./sctp.sh mkimage
images lreg_upload registry.nordix.org/cloud-native/sctp-test:latest
Multihoming to a single-homed K8s POD (fails);
./sctp.sh test start_k8s > $log
# On vm-002 (incoming traffic arrives here) or on vm-201 (router)
tcpdump -ni eth1 sctp
# On vm-221
sctpt client --addr=10.0.0.72 --port=7002 --laddr=192.168.2.221,192.168.3.221
# Or;
sctpt client --addr=1000::72 --port=7002 --laddr=1000::1:192.168.2.221,1000::1:192.168.3.221
Assuming externalTrafficPolicy: Local, this happens;
- INIT from the client arrives on eth1 with the load-balancer IP (VIP) 10.0.0.72. The INIT chunk contains the multihoming addresses of the client (192.168.2.221, 192.168.3.221).
- The VIP address is translated to the podIP (e.g. 11.0.2.2) by iptables or ipvs (configured by kube-proxy).
- The server (in the POD) sends an INIT_ACK (without addresses) and the association is successful.
- The server tries to send HB to the client's multihoming addresses, but the source is NAT'ed to the node address.
- The client responds with ABORT since the source is not valid.
- The ABORT messages arrive at the node with the secondary address (192.168.3.221) as source, but are not forwarded to the POD, probably because there is no connection in the Linux conntrack, which is used for NAT.
What we would like is a load-balancer that selects the same target for both the primary and secondary path, wherever the packets arrive.
The K8s kube-proxy will not be extended to handle this, but an option may be to write a custom proxier using kpng.