Optimize RocksDB DB open options #3316
Merged
Conversation
This was referenced Feb 13, 2025
afck approved these changes on Feb 13, 2025
@@ -47,6 +47,9 @@ const MAX_VALUE_SIZE: usize = 3221225072;
// 8388608 and so for offset reason we decrease by 400
const MAX_KEY_SIZE: usize = 8388208;

const DB_CACHE_SIZE: usize = 8 * 1024 * 1024; // 8 mb

Suggested change:
- const DB_CACHE_SIZE: usize = 8 * 1024 * 1024; // 8 mb
+ const DB_CACHE_SIZE: usize = 8 * 1024 * 1024; // 8 MiB
@@ -288,6 +291,12 @@ impl RocksDbStoreInternal {
}
let mut options = rocksdb::Options::default();
options.create_if_missing(true);
options.create_missing_column_families(true);
// Flush in memory buffer to disk more often

Suggested change:
- // Flush in memory buffer to disk more often
+ // Flush in-memory buffer to disk more often.
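Taken together, the options discussed in this review can be sketched with the rust-rocksdb API roughly as follows. This is a sketch based on the PR description, not the PR's exact code; the constant names and the helper function are illustrative.

```rust
use rocksdb::{DBCompressionType, Options};

// Values from the PR description; names are illustrative, not the PR's.
const WRITE_BUFFER_SIZE: usize = 8 * 1024 * 1024; // 8 MiB per memtable
const MAX_WRITE_BUFFER_NUMBER: i32 = 16; // up to 16 memtables in memory

fn db_options() -> Options {
    let mut options = Options::default();
    options.create_if_missing(true);
    options.create_missing_column_families(true);
    // Smaller but more numerous write buffers: same total memory budget
    // (16 * 8 MiB = 128 MiB) as the default (2 * 64 MiB), but flushes
    // to disk happen more frequently.
    options.set_write_buffer_size(WRITE_BUFFER_SIZE);
    options.set_max_write_buffer_number(MAX_WRITE_BUFFER_NUMBER);
    // Lz4 instead of the default Snappy.
    options.set_compression_type(DBCompressionType::Lz4);
    options
}
```

Since this is a configuration fragment against an external crate, treat it as a reference for which knobs are being turned rather than drop-in code.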
Motivation
RocksDB is a log-structured merge-tree (LSM) store: writes first go to an in-memory buffer (the memtable), which is flushed to disk as an SST file when certain criteria are met.
What was happening was that we weren't flushing data to disk frequently enough, and the implementation we use to get an iterator in find_keys_by_prefix flushes some tombstones to disk if necessary. Since we weren't flushing frequently enough, it often was necessary. This was adding a few tens of milliseconds to those reads, which makes a big difference when trying to make the client achieve high TPS.
Proposal
Reduce the size of the write buffers so that we flush to disk more often. Overall, this seems to improve performance.
Based on the RocksDB tuning guide: https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#flushing-options
Previously we had at most 2 write buffers of 64 MiB each, which is the default setting. Now we have at most 16 write buffers of 8 MiB each. Memory usage should stay the same, but disk flushes will be more frequent.
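The memory-budget claim above can be checked with a quick calculation (values taken from the description; the constant name is illustrative):

```rust
const MIB: usize = 1024 * 1024;

fn main() {
    // Default: at most 2 write buffers of 64 MiB each.
    let before = 2 * 64 * MIB;
    // New: at most 16 write buffers of 8 MiB each.
    let after = 16 * 8 * MIB;
    // Same peak memory budget; only the flush granularity changes.
    assert_eq!(before, after);
    println!("{} MiB", before / MIB); // prints "128 MiB"
}
```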
I also changed the compression type to Lz4, which should be slightly faster than the default, Snappy.
Test Plan
Ran the benchmark locally; the time it takes to reach a certain BPS no longer increases over time. Ran it for almost 4 hours, and it stays within the same 400-700 ms range.
Release Plan