Haskell can be fast, too

Reading time: ~20 minutes.

I really enjoyed reading how the phoenix-framework people managed to get to two million active websocket connections.

I've heard some very smart people say that Haskell has an amazing runtime with very cheap threads. I have no reason to disbelieve them but we thought it'd be fun to see how it fares with websockets.


Unlike the Phoenix people, we didn't have Rackspace sponsorship, so we had to resort to the common man's cheap machines: EC2 spot instances. We bid $0.10 on two m4.xlarge machines with 16G of RAM and 4 cores, which usually go for 4-5 cents in eu-west.

We're using Nix to deploy tsung and a very simple Haskell chat program that just broadcasts messages to everyone.

The core handler of our chat program looks like this (full source here):

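-- Imports inferred from the types used below (assumption: unagi-chan
-- for the channels, websockets for the connection handling).
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan.Unagi (InChan, dupChan, readChan, writeChan)
import Control.Monad (forever)
import Data.ByteString.Lazy (ByteString)
import Network.WebSockets (DataMessage (Text), PendingConnection, acceptRequest, receiveDataMessage, sendTextData)

handleWS :: InChan ByteString -> PendingConnection -> IO ()
handleWS bcast pending = do
    -- every connection gets its own read end of the broadcast channel
    localChan <- dupChan bcast
    connection <- acceptRequest pending

    -- writer thread: forward every broadcast message to this client
    _ <- forkIO $ forever $ do
        message <- readChan localChan
        sendTextData connection message

    -- reader loop: publish everything this client sends
    -- (a non-text frame fails the pattern match and ends the handler)
    let loop = do
            Text message <- receiveDataMessage connection
            writeChan bcast message
            loop
    loop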
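
The fan-out relies on dupChan: every connection gets its own read end and therefore sees every message written to the shared channel. Here is a minimal sketch of that behaviour. It uses Control.Concurrent.Chan from base instead of the channel type in the handler above so that it runs without extra packages, and broadcastDemo is a made-up name for illustration:

```haskell
import Control.Concurrent.Chan (dupChan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

-- | Write every message to one shared channel and return what each
-- subscriber read from its own duplicate of that channel.
broadcastDemo :: Int -> [String] -> IO [[String]]
broadcastDemo subscribers messages = do
    bcast <- newChan
    -- Each subscriber duplicates the channel before anything is written,
    -- like each connection does in handleWS.
    locals <- replicateM subscribers (dupChan bcast)
    forM_ messages (writeChan bcast)
    -- Reading from one duplicate does not consume from the others.
    mapM (\local -> replicateM (length messages) (readChan local)) locals
```

In GHCi, broadcastDemo 3 ["hello", "world"] returns three copies of ["hello","world"], one per subscriber.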

To run the EC2 machines we're using nixops, which also does the spot-price bidding for us:

nixops create '<nix/test-setup.nix>'
nixops deploy

(See here for the full configuration including kernel tuning).

Unfortunately I could not get distributed tsung going: the distributed testing uses an Erlang function called slave:start which connects through SSH and spawns Erlang on the remote host. This failed for reasons I didn't have time to debug.

But without the distributed loader there's a problem: a single client IP address can only open ~65000 connections to the same server address and port, because port numbers are limited to 16 bits.

Luckily tsung supports using multiple virtual IP addresses for a single network interface out of the box, and so we went to Amazon and clicked "Assign new IP" to assign more private IPs to our tsung box.

Now we associate the new IPs with our network interface:

ip addr add 172.31.23.115/20 dev eth0
ip addr add 172.31.23.113/20 dev eth0
ip addr add 172.31.23.114/20 dev eth0
ip addr add 172.31.23.112/20 dev eth0
ip addr add 172.31.18.80/20 dev eth0
ip addr add 172.31.18.81/20 dev eth0
ip addr add 172.31.18.82/20 dev eth0
ip addr add 172.31.18.83/20 dev eth0
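
Why do extra addresses help? A TCP connection is identified by source address, source port, destination address and destination port. With the server side fixed, each source address contributes at most one connection per local port. A back-of-the-envelope sketch (the function names are made up here, and 64000 usable ports per address is the round figure used later in this post):

```haskell
-- | Number of distinct values of a 16-bit port field.
totalPorts :: Int
totalPorts = 2 ^ (16 :: Int)

-- | Upper bound on simultaneous connections from one client machine to a
-- single server address and port.
maxClientConnections :: Int -> Int -> Int
maxClientConnections sourceAddresses portsPerAddress =
    sourceAddresses * portsPerAddress
```

maxClientConnections 1 totalPorts is 65536, the hard ceiling for a single address. maxClientConnections 9 64000 (the primary address plus the eight added above, assuming tsung uses all nine) is 576000.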

We have a slightly different tsung config from the Phoenix people which we copy to our tsung box:

$ nixops scp --to tsung-1 code/src/tsung-conf.xml tsung-conf.xml
code/src/tsung-conf.xml -> root@52.31.104.126:tsung-conf.xml
tsung-conf.xml                                                  100% 1494     1.5KB/s   00:00

Running tsung

We used Nix to tune the TCP stack and increase kernel limits, but we still need to raise the ulimit to make sure we're not hitting the default limit of 1024 open files:

$ nixops ssh tsung-1
$ ulimit -n 2000000
$ tsung -f tsung-conf.xml start
Starting Tsung
Log directory is: /root/.tsung/log/20151104-1622
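
Every open socket counts against the open-files limit, so the same check applies to any process that wants to hold this many connections. A sketch of how a Haskell program could inspect its own limit, assuming the unix package that ships with GHC is available (allowsConnections and checkOpenFiles are names made up for this example, not part of our server):

```haskell
import System.Posix.Resource
    ( Resource (ResourceOpenFiles)
    , ResourceLimit (..)
    , ResourceLimits (softLimit)
    , getResourceLimit
    )

-- | Pure check: does a limit leave room for the wanted number of sockets?
allowsConnections :: Integer -> ResourceLimit -> Bool
allowsConnections wanted limit = case limit of
    ResourceLimit n       -> n >= wanted
    ResourceLimitInfinity -> True
    ResourceLimitUnknown  -> False

-- | Compare the current soft limit on open files with a target.
checkOpenFiles :: Integer -> IO Bool
checkOpenFiles wanted = do
    limits <- getResourceLimit ResourceOpenFiles
    pure (allowsConnections wanted (softLimit limits))
```

With the default soft limit of 1024, checkOpenFiles 2000000 would return False. After ulimit -n 2000000 in the same shell it should return True.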

tsung exports some data via a web interface on port 8091. We use an extra SSH tunnel so we can access this data on http://127.0.0.1:8091:

$ ssh root@tsung-1 -L 8091:127.0.0.1:8091

Problem 1: The firewall

All our Nix boxes are configured with a firewall enabled. This is because I start from a template configuration instead of starting from scratch.

The firewall uses connection tracking to make decisions, and connection tracking requires memory. When that memory is full the dmesg logs look like this:

[ 2960.570157] nf_conntrack: table full, dropping packet
[ 2960.575060] nf_conntrack: table full, dropping packet
[ 2960.629764] nf_conntrack: table full, dropping packet
[ 2960.678016] nf_conntrack: table full, dropping packet
[ 2992.936177] TCP: request_sock_TCP: Possible SYN flooding on port 8080. Sending cookies.  Check SNMP counters.
[ 2998.005969] net_ratelimit: 364 callbacks suppressed

That log also shows that we triggered the kernel's DoS protection against SYN flooding. We fixed that by increasing net.ipv4.tcp_max_syn_backlog and net.core.somaxconn.
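
To get a feel for how often the table overflowed, the dmesg output can be scanned for the conntrack message. A small sketch (countConntrackDrops is a hypothetical helper, not something we used during the test). Because the kernel rate-limits these messages, as the net_ratelimit line shows, the count is only a lower bound:

```haskell
import Data.List (isInfixOf)

-- | Count the log lines that report a packet dropped because the
-- connection-tracking table was full.
countConntrackDrops :: String -> Int
countConntrackDrops =
    length . filter ("nf_conntrack: table full" `isInfixOf`) . lines
```

It can be fed the saved output of dmesg, for example with readFile.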

Now when running tsung we got up to about 100k connections on the Haskell websocket box:

[root@websock-server:~]# netstat -ntp  | grep -v TIME_WAIT | wc
 119748  838238 12094489

Problem 2: The Erlang process limit

But then tsung's web UI would suddenly throw 500 errors and drop all connections. Initially we could not figure out what was going on because tsung is really slow at writing logs. Waiting for 5 minutes and then checking the logs reveals the message:

=ERROR REPORT==== 4-Nov-2015::18:03:45 ===
Too many processes

We noticed that tsung supports changing the maximum number of internal Erlang processes and we tried this:

tsung -p 1250000 -f tsung-conf.xml start

But no luck: the same problem occurred. It turns out that the -p switch doesn't actually work (we filed a bug).

We patched tsung ourselves for now.

Some performance numbers

So far we spent most of our time fighting tsung and the slightly bizarre Erlang ecosystem. Here's what 100k users look like for CPU and memory for the Haskell server:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1944 root      20   0 7210960 2.656g  22524 S 177.7 16.9   2:58.50 haskell-websock

2.6G, not bad! With all problems fixed we ran another test with 256k users:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2252 root      20   0 11.237g 4.714g  22532 S 128.3 30.1   6:58.25 haskell-websock
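
These two readings can be turned into a rough per-connection cost. The sketch below simply divides the RES column by the number of users. The function name is made up, top's "g" is read as GiB, 256k is taken to mean 256000, and the result includes the runtime's fixed overhead and any garbage-collector slack, so treat it as an estimate:

```haskell
-- | Resident memory per connection in KiB, given top's RES column in GiB.
kibPerConnection :: Double -> Int -> Double
kibPerConnection residentGiB connections =
    residentGiB * 1024 * 1024 / fromIntegral connections
```

kibPerConnection 2.656 100000 comes to roughly 28 KiB, and kibPerConnection 4.714 256000 to roughly 19 KiB per connection.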

More addresses

In order to go higher we needed more IP addresses for tsung. This is where we learnt that EC2 limits the number of additional private IPs based on the instance type. You'll see a message like this:

eni-5af8fa3d: Number of private addresses will exceed limit.

For m4.xlarge the limit is 15 addresses, so we got another 6:

ip addr add 172.31.26.100/20  dev eth0
ip addr add 172.31.26.99/20  dev eth0
ip addr add 172.31.18.106/20  dev eth0
ip addr add 172.31.30.220/20  dev eth0
ip addr add 172.31.18.240/20  dev eth0
ip addr add 172.31.30.188/20  dev eth0

With 15 addresses in total we should get close to one million connections:

>>> 15 * 64000
960000

But tsung needs much more memory than our Haskell server and died at ~500k connections:

/run/current-system/sw/bin/tsung: line 60: 29721 Killed                  [...]

The Haskell server was still running quite comfortably below 10G:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2320 root      20   0 16.879g 9.395g  22300 S   0.0 59.9  14:38.75 haskell-websock

That was certainly a fun afternoon! Time to clean up:

$ nixops destroy
warning: are you sure you want to destroy EC2 machine ‘tsung-1’? (y/N) y
warning: are you sure you want to destroy EC2 machine ‘websock-server’? (y/N) y

The whole experiment took ~2.5 hours and cost us a grand total of $0.25.

Graphs

Our graphs show very nicely that we add a bit more than 1000 connections a second, and that the connection count follows the user count closely, i.e. the Haskell server adds no noticeable delay.

Some unscientific testing also showed that propagating a message to all 256k clients takes 10-50 milliseconds, so the 2 seconds quoted by the Phoenix team for 2 million users sound about right.
