
compaction not running or running very slowly when entire db is deleted? #13120

Closed
zaidoon1 opened this issue Nov 6, 2024 · 6 comments

@zaidoon1
Contributor

zaidoon1 commented Nov 6, 2024

So for my service, there are times when something happens that causes pretty much the entire DB to be deleted using individual delete requests. I'm aware that issuing individual deletes for the entire DB (instead of, say, a range delete) will fill the DB with tombstones and severely degrade RocksDB's performance, as we can see below. What I don't understand is why it takes days for compaction to run and finish compacting all of the tombstones away. As we can see, there is not much activity and compaction is not running. I've seen this happen in the past, and it took a few days for RocksDB to recover and go back to normal after an event like this (deleting pretty much the entire DB).

(Screenshots attached: metrics captured 2024-11-06.)

RocksDB config:

OPTIONS-000007.txt

I'm running the latest RocksDB version with pretty much default settings. However, I do have the DB TTL set to a few hours, which makes this more confusing, since the docs say:

Leveled: Non-bottom-level files with all keys older than TTL will go through the compaction process. This usually happens in a cascading way so that those entries will be compacted to bottommost level/file. The feature is used to remove stale entries that have been deleted or updated from the file system.

My understanding is that after the entire DB is deleted, once the TTL is up, compaction would run and compact the entire DB, getting RocksDB back to a normal state.
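For reference, a minimal sketch of how a TTL like that is typically set through the C++ options (the exact value here is an assumption; the issue only says "a few hours"):

```cpp
#include <rocksdb/options.h>

// Sketch of the TTL setup described above. With leveled compaction,
// non-bottom-level files whose keys are all older than `ttl` become
// eligible for TTL compaction.
rocksdb::Options MakeOptionsWithTtl() {
  rocksdb::Options options;
  options.ttl = 4 * 60 * 60;  // seconds; illustrative "few hours" value
  return options;
}
```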

@cbi42
Member

cbi42 commented Nov 7, 2024

there is not much activity and compaction is not running

After the entire DB is deleted, are there any more writes to the DB? RocksDB doesn't have a timer-based trigger to check for eligible compactions; it usually tries to schedule a compaction after a flush or compaction finishes. I suspect there's nothing triggering a compaction to be scheduled. Can you do a manual compaction after you issue all the deletions?
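For illustration, a minimal sketch of that manual compaction over the full key range via the C++ API (the bottommost-level option is one possible choice, not a prescription):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

// Sketch: once the bulk deletions have been issued, compact the whole key
// range so the tombstones can be dropped. `db` is an already-open DB.
void CompactAwayTombstones(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions opts;
  // Also rewrite bottommost files so tombstones that already reached the
  // last level are removed (a long-lived snapshot would still pin them).
  opts.bottommost_level_compaction =
      rocksdb::BottommostLevelCompaction::kForceOptimized;
  // nullptr begin/end means the entire key range.
  rocksdb::Status s =
      db->CompactRange(opts, /*begin=*/nullptr, /*end=*/nullptr);
  if (!s.ok()) {
    // handle/log the error in real code
  }
}
```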

@zaidoon1
Contributor Author

zaidoon1 commented Nov 8, 2024

After the entire DB is deleted, are there any more writes to the DB?

There are writes, but my workload is extremely read-heavy. The other problem is that the service started using 10 CPUs, which is the maximum number of CPUs allocated to the service via cgroups, so it started being throttled and that made things worse.

A few questions:

  1. would CompactOnDeletionCollector help here?
  2. we can see the number of SST files go down; this means compaction did happen, right? Note that this metric comes from me querying the number of "live files"
  3. I feel like running manual compaction wouldn't work here, given that the moment the DB size went down after all the deletes, RocksDB started using all CPUs, got throttled, and pretty much deadlocked?

Another data point: the last time this happened, a simple DB restart fixed the issue and everything went back to normal.

@cbi42
Member

cbi42 commented Nov 8, 2024

a simple DB restart fixed the issue and everything went back to normal

Do you hold a snapshot for a very long time? A restart will clear the snapshots. Tombstones can remain in the last-level files if there's a snapshot preventing them from being dropped.

would CompactOnDeletionCollector help here?

If the problem is that files with many tombstones are not being compacted down to the last level, then yes. It won't help if there's a snapshot keeping tombstones alive.
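For illustration, a minimal sketch of registering the collector through the C++ options (the window size, trigger, and ratio are placeholders, not tuned recommendations):

```cpp
#include <rocksdb/options.h>
#include <rocksdb/utilities/table_properties_collectors.h>

// Sketch: mark SST files that accumulate many deletions for compaction as
// they are written. A file is flagged when it sees `deletion_trigger`
// deletes within a sliding window of `sliding_window_size` entries, or when
// its overall deletion ratio exceeds `deletion_ratio`.
void AddDeletionTriggeredCompaction(rocksdb::Options* options) {
  options->table_properties_collector_factories.push_back(
      rocksdb::NewCompactOnDeletionCollectorFactory(
          /*sliding_window_size=*/128 * 1024,
          /*deletion_trigger=*/64 * 1024,
          /*deletion_ratio=*/0.5));
}
```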

I feel like running manual compaction wouldn't work here, given that the moment the DB size went down after all the deletes, RocksDB started using all CPUs, got throttled, and pretty much deadlocked?

The hope is to compact away the tombstones with manual compaction so that iterators won't use this much CPU.

@zaidoon1
Contributor Author

zaidoon1 commented Nov 8, 2024

Do you hold a snapshot for a very long time?

My delete workflow is as follows:

  1. create a checkpoint
  2. have some separate service/process open the checkpoint (read-only mode)
  3. iterate over all the kvs; if a kv is stale, send a delete request to the service that has RocksDB open in read/write mode, then move to the next kv.
  4. delete the checkpoint directory

Are checkpoint and snapshot the same thing here?

Given that, does RocksDB consider the checkpoint to be "held" the entire time?

Should I update my cleanup service to do something like the following (rough sketch after the list):

  1. get an estimate of the number of kvs in the db
  2. iterate over all the kvs; if a kv is stale, send a delete request to the service that has RocksDB open in read/write mode, then move to the next kv.
  3. if the ratio of deleted keys to the estimated number of kvs reaches X, stop trying to clean up.
  4. delete the checkpoint directory
  5. trigger manual compaction
  6. sleep for x minutes
  7. create the checkpoint again, and start processing again
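A rough sketch of one pass of that loop, with the service-specific scanning stubbed out as a hypothetical helper (IterateAndDeleteStale, the ratio threshold, and the sleep interval are all placeholders, not part of any RocksDB API):

```cpp
#include <chrono>
#include <string>
#include <thread>

#include <rocksdb/db.h>

// Hypothetical helper standing in for the separate cleanup service/process:
// it would open the checkpoint read-only, scan the kvs, send delete requests
// for stale ones to the read/write service, and stop once the deleted-to-
// estimated ratio reaches the chosen threshold. Stubbed out here.
void IterateAndDeleteStale(const std::string& /*checkpoint_dir*/,
                           uint64_t /*estimated_keys*/) {}

// Rough sketch of one pass of the proposed cleanup loop (`db` is the
// read/write instance; the sleep interval is illustrative).
void CleanupPass(rocksdb::DB* db, const std::string& checkpoint_dir) {
  // 1. get an estimate of the number of kvs in the db
  uint64_t estimated_keys = 0;
  db->GetIntProperty("rocksdb.estimate-num-keys", &estimated_keys);

  // 2-3. iterate the checkpoint, deleting stale keys until the ratio is hit
  IterateAndDeleteStale(checkpoint_dir, estimated_keys);

  // 4. delete the checkpoint directory (omitted here)

  // 5. trigger manual compaction over the whole key range
  rocksdb::Status s =
      db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
  (void)s;  // handle/log in real code

  // 6. back off before the next pass
  std::this_thread::sleep_for(std::chrono::minutes(10));

  // 7. the caller then creates a fresh checkpoint and starts the next pass
}
```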

@cbi42
Member

cbi42 commented Nov 21, 2024

A checkpoint is different from a snapshot; snapshots are created by GetSnapshot(). Manual compaction should help.

To figure out why TTL compaction was not run, you can dump the SST files with ./sst_dump --command=raw --show_properties --file=/... and check this table property:

// Oldest ancester time. 0 means unknown.

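For clarity, a minimal sketch of the two APIs being contrasted (the checkpoint directory path is illustrative):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/utilities/checkpoint.h>

// Sketch of the distinction above. A checkpoint is an on-disk copy (mostly
// hard links) that is opened as a separate DB and does not pin tombstones in
// the live DB. A snapshot comes from GetSnapshot() and does keep tombstones
// alive until it is released.
void CheckpointVersusSnapshot(rocksdb::DB* db) {
  // Checkpoint: what the cleanup workflow above creates and later deletes.
  rocksdb::Checkpoint* checkpoint = nullptr;
  rocksdb::Status s = rocksdb::Checkpoint::Create(db, &checkpoint);
  if (s.ok()) {
    s = checkpoint->CreateCheckpoint("/tmp/cleanup-checkpoint");
  }
  delete checkpoint;

  // Snapshot: this is what would prevent tombstones from being dropped.
  const rocksdb::Snapshot* snap = db->GetSnapshot();
  // ... reads pinned to this sequence number ...
  db->ReleaseSnapshot(snap);
}
```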
@zaidoon1
Contributor Author

Manual compaction should help.

Got it, thank you!
