-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Java: the misses in rocksdbjni are very expensive #13023
Comments
Hello @trazfr, thank you for amazing report. I agree that using exceptions for misses is probably not the best. I will check with my colleagues and see how we can optimize this. Probably leave exception on for error cases. Radek cc: @adamretter |
Thanks @rhubner, it looks like this try/catch approach in the C++ code was originally pioneered by @alanpaxton (see 5a063ec). @alanpaxton Would you be able to take a look at this please? I think a good first step would be to construct some JMH Benchmarks outside of RocksDB in our JNI Benchmarks project to look at the different approaches we could use here and their performance, before we decide how to move forward here. What do you think? |
@trazfr from your report and attached images I can't seem to ascertain the numbers for the performance difference. Would you be able to tell me a bit more in detail about the performance difference please? |
👋 Yes, no problem. Basically the first thing we can see is the difference of time spent between the 1M cache misses and 1M cache hits. $ git clone https://github.com/trazfr/rocksdbjni-miss-perfs.git
$ cd rocksdbjni-miss-perfs
# The datadog agent will fail to POST to localhost, but this does not change the run.
$ ./run.sh --api-key xxx --site localhost Running on AWS $ docker logs rocksdbjni-miss-perfs-test-rocksdb-vanilla-1 --tail 11
Read the database with 1000000 keys that are in the database
Time: 3312.024621ms
Query the database with 1000000 keys that are NOT in the database
Time: 5206.450207ms
Read the database with 1000000 keys that are in the database
Time: 3301.095231ms
Query the database with 1000000 keys that are NOT in the database
Time: 5174.598296ms
Read the database with 1000000 keys that are in the database
Time: 3309.393501ms This is not architecture-specific, as running on Mac M1 with $ docker logs rocksdbjni-miss-perfs-test-rocksdb-vanilla-1 --tail 11
Read the database with 1000000 keys that are in the database
Time: 2531.553673ms
Query the database with 1000000 keys that are NOT in the database
Time: 4084.447483ms
Read the database with 1000000 keys that are in the database
Time: 2523.187492ms
Query the database with 1000000 keys that are NOT in the database
Time: 4069.455081ms
Read the database with 1000000 keys that are in the database
Time: 2513.968017ms Note: the keys to perform hit and miss are interleaved to try to compare them:
|
Please find attached one of the jfr which were used to generate the screenshots above. For instance under JDK Mission Control: Event Browser -> Datadog -> Profiling -> Method CPU Profiling Sample, we get the same kind of flame graph on 1 minute: |
Performance of NotFound case has been bad - see facebook#13023 Some jmh tests to measure this, prior to fixing it.
👋 Thanks a lot @alanpaxton, I have just tested the PR #13049 I have updated my small repo to test it: https://github.com/trazfr/rocksdbjni-miss-perfs This is fixing this issue. The performances of misses are about the same as hits (and even a bit faster). Datadog profile under an AWS Example of JFR: main.jfr.zip |
Hi @trazfr great news, thanks - that more than confirms the results I was seeing from some JMH tests I added in the PR - hopefully once a colleague has reviewed it, the RocksDB core team can help us to get this merged for you soon. |
Performance of NotFound case has been bad - see facebook#13023 Some jmh tests to measure this, prior to fixing it.
Performance of NotFound case has been bad - see facebook#13023 Some jmh tests to measure this, prior to fixing it.
Performance of NotFound case has been bad - see facebook#13023 Some jmh tests to measure this, prior to fixing it.
Performance of NotFound case has been bad - see facebook#13023 Some jmh tests to measure this, prior to fixing it.
Expected behavior
This is only for rocksdbjni.
In rocksdbjni, I would expect the misses to be less expensive than the hits, or at least as expensive.
I have tested
v9.5.2
, but the lastv9.6.1
has no change injava/rocksjni/rocksjni.cc
so the issue remains.Actual behavior
The misses are far more expensive than the hits due to a C++ exception which is thrown then caught a few lines after.
For instance in
java/rocksjni/rocksjni.cc
in Java_org_rocksdb_RocksDB_get__J_3BII():Just changing the above code to the following speeds up a lot the misses:
Steps to reproduce the behavior
I have created a small reproducer in trazfr/rocksdbjni-miss-perfs which runs the two following rocksdbjni jar and send the results to the Datadog profiler:
9.5.2
In this reproducer, we create a DB with 1M entries, then we perform:
App.readDatabaseHits()
)App.readDatabaseMisses()
)I don't compare vanilla vs patched as the compilation parameters are different, but only the CPU time spent on hits vs misses in each case. The following tests are done on an AWS
m5d.4xlarge
machine (linux/x84_64), but I have also been able to reproduce an a M1 Mac (MacOS/aarch64 and qemu+linux/aarch64).On the vanilla version, in case of miss, we can see that a lot of CPU time is spent in
rocksdb::KVException::ThrowOnError()
. This makes the misses slower than the hits by far:With the small patch above, as we don't throw exceptions, the misses are a bit less expensive than the hits:
The text was updated successfully, but these errors were encountered: