This repository has been archived by the owner on May 29, 2020. It is now read-only.

Coalesce network writes #15

Closed
philhofer opened this issue Jan 14, 2015 · 5 comments
philhofer (Member) commented Jan 14, 2015

Right now, every response packet is written to the wire with net.Conn.Write. In applications where GOMAXPROCS is greater than about 2, contention (and the resulting scheduler overhead) eats up a lot of performance.

I'd like to see monotonically improving performance with greater GOMAXPROCS up to the number of physical cores on the machine. Right now, on my laptop, GOMAXPROCS=2 is significantly faster than GOMAXPROCS=4 for the TCP case, and marginally faster for the other cases.

The issue here is that net.Conn.Write is the only way to guarantee to the caller that the message was actually sent. We can write into a buffer, which would probably improve performance quite a bit, but then callers wouldn't necessarily have a guarantee that requests/responses were written to the wire. Of course, the network itself doesn't provide much in the way of guarantees as it is, so maybe that's not a problem.

ttacon (Collaborator) commented Jan 14, 2015

It would at least be interesting to see what kind of performance increases we could get and also what "guarantee" issues crop up (if any).

philhofer (Member, Author) commented:

I have a very rough WIP at https://github.com/tinylib/synapse/tree/coalesce

I implemented it on the server side first, since that doesn't change the semantics of ResponseWriter.Send().

Results, with GOMAXPROCS=2:

| Test             | Old Throughput | New Throughput |
|------------------|----------------|----------------|
| ClientPool (TCP) | 106,000 /sec   | 143,000 /sec   |
| TCPEcho          | 94,000 /sec    | 146,000 /sec   |
| UnixNoop         | 299,000 /sec   | 407,000 /sec   |
| PipeNoop         | 376,000 /sec   | 344,000 /sec   |

So, basically, it adds about 300ns of extra overhead in user space, but it makes far fewer system calls, so it ends up being faster in every case except the user-space pipe.

I think implementing the same thing on the client side would yield additional improvements. However, buffering on the client side breaks UDP support, because each call to Write() is delivered in a separate packet, so packet divisions are non-deterministic.

I think it's reasonable to drop UDP support, given that this + #16 will provide basically the same kind of non-blocking network send.

@philhofer
Copy link
Member Author

And with the addition of client side write-coalescing:

| Test       | Old Throughput | New Throughput |
|------------|----------------|----------------|
| ClientPool | 106,000 /sec   | 281,000 /sec   |
| TCPEcho    | 94,000 /sec    | 442,500 /sec   |
| UnixNoop   | 299,000 /sec   | 800,000 /sec   |
| PipeNoop   | 376,000 /sec   | 893,000 /sec   |

ttacon (Collaborator) commented Jan 14, 2015

76TGHU78I?! that's pretty fast...

philhofer (Member, Author) commented:

Yeah. I can dig it.
