safekeeper: batch AppendRequest writes #9744
base: main
Conversation
Early benchmarks on a MacBook show that 1 KB writes are 700% faster with fsync and 1900% faster without fsync. However, there are regressions at larger write sizes (in particular the 50% regression with
@arssher As discussed over in #9694, here's a prototype of WAL acceptor batching on the Safekeeper side. You mentioned that we also do some batching on the compute side. It doesn't really matter which side we do the batching on, as long as we do it. Let's find some appropriate workloads to run some end-to-end benchmarks with. For now, I'll try out a
You were right, these do get batched into 128 KB messages. I'm seeing throughput cap out at about 300 MB/s without fsync, though, while the Safekeeper itself can do about 2 GB/s. I'll investigate this further in #9642, since it's not batching that's holding it back.
5391 tests run: 5171 passed, 0 failed, 220 skipped (full report). Test coverage report is not available. The comment gets automatically updated with the latest test results.
4308ffe at 2024-11-13T15:40:42.508Z :recycle:
So looking at the last comments in #9642, we can pause this for now, right?
Yes, let's pause this for now. |
Problem
Safekeeper WAL ingest performance is very poor with many small appends (e.g. 1 KB). This is mainly because Tokio file IO is slow: every write incurs a Tokio task spawn and thread context switch.
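For illustration only (this is not code from the PR): with `tokio::fs`, each awaited write is handed off to a blocking thread, so a stream of small appends pays per-write scheduling overhead that coalescing avoids. A minimal sketch of the contrast:

```rust
// Sketch of the per-write overhead problem: many small awaited writes vs. one
// coalesced write. Function names and the `records` shape are hypothetical.
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

async fn write_unbatched(file: &mut File, records: &[Vec<u8>]) -> std::io::Result<()> {
    for rec in records {
        // One blocking-pool hop per record: dominates the cost for ~1 KB records.
        file.write_all(rec).await?;
    }
    Ok(())
}

async fn write_batched(file: &mut File, records: &[Vec<u8>]) -> std::io::Result<()> {
    // Coalesce into a single buffer so the per-write overhead is amortized.
    let mut buf = Vec::new();
    for rec in records {
        buf.extend_from_slice(rec);
    }
    file.write_all(&buf).await
}
```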
Resolves #9689.
Summary of changes
Buffer AppendRequests and submit them as 1 MB writes as long as there are queued messages (a rough sketch of this loop is shown below).
The queue size is also increased from 256 to 4096. This was necessary to improve throughput: otherwise the queue is quickly drained and written out before the sender gets scheduled and can repopulate it. This needs further tuning with end-to-end benchmarks and appropriate workloads.
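As a hedged sketch of the batching loop described above: the 1 MB threshold and the 4096 queue size mirror this PR, but `AppendRequest` here is a stub and the surrounding types are hypothetical stand-ins, not the actual safekeeper code.

```rust
// Drain queued messages into a buffer and flush once ~1 MB is accumulated
// (or the queue runs dry). The receiver would come from a bounded channel,
// e.g. tokio::sync::mpsc::channel(4096).
use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use tokio::sync::mpsc::Receiver;

const BATCH_BYTES: usize = 1024 * 1024; // flush threshold: 1 MB

// Hypothetical stand-in for the real AppendRequest message.
struct AppendRequest {
    wal_data: Vec<u8>,
}

async fn ingest_loop(mut rx: Receiver<AppendRequest>, mut wal_file: File) -> std::io::Result<()> {
    let mut buf: Vec<u8> = Vec::with_capacity(BATCH_BYTES);
    while let Some(first) = rx.recv().await {
        buf.extend_from_slice(&first.wal_data);
        // Keep draining already-queued messages without awaiting,
        // up to the batch size.
        while buf.len() < BATCH_BYTES {
            match rx.try_recv() {
                Ok(req) => buf.extend_from_slice(&req.wal_data),
                Err(_) => break, // queue empty (or closed): flush what we have
            }
        }
        wal_file.write_all(&buf).await?;
        buf.clear();
    }
    Ok(())
}
```

When and how often fsync is issued per batch is a separate policy question and is not shown in this sketch.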