This repository has been archived by the owner on May 29, 2020. It is now read-only.

Coalesce network writes #15

Closed
philhofer opened this issue Jan 14, 2015 · 5 comments
philhofer (Member) commented Jan 14, 2015

Right now, every response packet is written to the wire with net.Conn.Write. In applications where GOMAXPROCS is greater than about 2, contention (and the resulting scheduler overhead) eats up a lot of performance.

I'd like to see monotonically improving performance with greater GOMAXPROCS up to the number of physical cores on the machine. Right now, on my laptop, GOMAXPROCS=2 is significantly faster than GOMAXPROCS=4 for the TCP case, and marginally faster for the other cases.

The issue here is that net.Conn.Write is the only way to guarantee to the caller that the message was actually sent. We can write into a buffer, which would probably improve performance quite a bit, but then callers wouldn't necessarily have a guarantee that requests/responses were written to the wire. Of course, the network itself doesn't provide much in the way of guarantees as it is, so maybe that's not a problem.

ttacon (Collaborator) commented Jan 14, 2015

It would at least be interesting to see what kind of performance increases we could get and also what "guarantee" issues crop up (if any).

philhofer (Member, Author) commented:

I have a very rough WIP at https://github.com/tinylib/synapse/tree/coalesce

I implemented it on the server side first, since that doesn't change the semantics of ResponseWriter.Send().

Results, with GOMAXPROCS=2:

| Test             | Old Throughput | New Throughput |
|------------------|----------------|----------------|
| ClientPool (TCP) | 106,000 /sec   | 143,000 /sec   |
| TCPEcho          | 94,000 /sec    | 146,000 /sec   |
| UnixNoop         | 299,000 /sec   | 407,000 /sec   |
| PipeNoop         | 376,000 /sec   | 344,000 /sec   |

So, basically, it adds about 300ns of extra overhead in user space, but it makes far fewer system calls, so it ends up being faster in every case except the user-space pipe.

I think implementing the same thing on the client side would yield additional improvements. However, buffering on the client side breaks UDP support, because each call to Write() is delivered in a separate packet, so packet divisions are non-deterministic.

I think it's reasonable to drop UDP support, given that this + #16 will provide basically the same kind of non-blocking network send.

@philhofer
Copy link
Member Author

And with the addition of client side write-coalescing:

| Test       | Old Throughput | New Throughput |
|------------|----------------|----------------|
| ClientPool | 106,000 /sec   | 281,000 /sec   |
| TCPEcho    | 94,000 /sec    | 442,500 /sec   |
| UnixNoop   | 299,000 /sec   | 800,000 /sec   |
| PipeNoop   | 376,000 /sec   | 893,000 /sec   |

ttacon (Collaborator) commented Jan 14, 2015

76TGHU78I?! that's pretty fast...

philhofer (Member, Author) commented:

Yeah. I can dig it.
