How to completely remove overshadowed segments under var/druid/segments/? #17486

Open
nibinqtl opened this issue Nov 18, 2024 · 3 comments


@nibinqtl

Affected Version

I'm running 31.0.0, currently the latest release.

Description

  • Cluster size: single-server setup
  • Configuration in use: small

I repeatedly use the "local input source" to ingest records in json format at about 10,000 rows every 20 seconds.
Each ingestion ends up as its own segment.
Then I use auto-compaction to combine them into larger segments of about 3M rows. This works really well, as shown in the console: the segments list shows only the large compacted segments and the recent small segments that have not been compacted yet.
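For a sense of scale, the ingestion pattern above produces a large number of small segments. The back-of-the-envelope calculation below assumes ingestion runs around the clock and that every ingest yields exactly one segment (as described); it is only arithmetic on the numbers in this issue, not a measurement.

```python
# Rough daily segment counts for the ingestion pattern described in this issue.
# Assumptions: ingestion runs 24/7 and each ingest produces exactly one segment.
SECONDS_PER_DAY = 24 * 60 * 60
INGEST_INTERVAL_S = 20          # one ingest every 20 seconds
ROWS_PER_INGEST = 10_000        # about 10,000 rows per ingest
ROWS_PER_COMPACTED = 3_000_000  # approximate size of a compacted segment

small_segments_per_day = SECONDS_PER_DAY // INGEST_INTERVAL_S
rows_per_day = small_segments_per_day * ROWS_PER_INGEST
compacted_segments_per_day = rows_per_day / ROWS_PER_COMPACTED

print(small_segments_per_day)      # 4320
print(rows_per_day)                # 43200000
print(compacted_segments_per_day)  # 14.4
```

So each day roughly 4,320 small segments are replaced by about 14 compacted ones, and every overshadowed small segment keeps its files on disk until something permanently deletes it.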

However, I noticed that the old (overshadowed) small segments are still in deep storage, taking up disk space. They are under:
var/druid/segments/
None of them are ever actually removed.
The directory var/druid/segments-cache/ seems to contain only the active segments, and its size matches the size of the datasource shown in the console.

How can I configure Druid to actually remove those old, unused, overshadowed segments from disk?

@nibinqtl
Author

I found that this API endpoint does exactly what I want:
https://druid.apache.org/docs/31.0.0/api-reference/data-management-api#permanently-delete-segments
It reduced var/druid/segments/ to about 5% of its previous size, while var/druid/segments-cache/ was untouched.
According to the documentation, this API should only delete "unused segments". However, I noticed that most of the compacted segments under var/druid/segments/ also got deleted. Maybe that is because another copy exists under var/druid/segments-cache/?
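For reference, the linked "permanently delete segments" call is, as far as I can tell from that docs page, a DELETE on the Coordinator's datasource-interval endpoint, with the interval embedded in the URL path (its `/` separator written as `_`). The sketch below only builds that request with Python's standard library. The Router address `localhost:8888` (single-server quickstart default), the datasource name `my_datasource`, and the interval are assumptions/placeholders; per the documentation, only unused segments in that interval are affected.

```python
import urllib.parse
import urllib.request

# Assumption: single-server quickstart, where the Router on localhost:8888
# proxies Coordinator API calls. Adjust for your deployment.
ROUTER = "http://localhost:8888"

def build_delete_unused_request(datasource: str, start: str, end: str) -> urllib.request.Request:
    """Build a DELETE request asking Druid to permanently delete the unused
    segments of `datasource` in the interval start/end.

    The interval is part of the URL path, so its '/' separator is written as '_'.
    """
    interval = f"{start}_{end}"
    name = urllib.parse.quote(datasource, safe="")
    path = f"/druid/coordinator/v1/datasources/{name}/intervals/{interval}"
    return urllib.request.Request(ROUTER + path, method="DELETE")

# Placeholder datasource name and interval.
req = build_delete_unused_request(
    "my_datasource", "2024-11-01T00:00:00.000Z", "2024-11-18T00:00:00.000Z"
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually send the request
```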

@kfaraz
Contributor

kfaraz commented Nov 20, 2024

@nibinqtl, typically you shouldn't need to call the API explicitly.
You can enable automatic killing of unused segments by setting druid.coordinator.kill.on=true in the runtime.properties of the Coordinator service.
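With that setting enabled, the Coordinator periodically submits kill tasks for eligible unused segments on its own. To see what such a task looks like, the sketch below builds a manual kill-task submission to the Overlord's task endpoint (a kill task removes unused segments in an interval from both the metadata store and deep storage). The Router address and the datasource/interval values are assumptions/placeholders, and the spec is kept to the minimal `type`, `dataSource`, and `interval` fields.

```python
import json
import urllib.request

# Assumption: single-server quickstart, where the Router on localhost:8888
# proxies Overlord API calls. Adjust for your deployment.
ROUTER = "http://localhost:8888"

def build_kill_task_request(datasource: str, interval: str) -> urllib.request.Request:
    """Build a POST that submits a minimal kill task to the Overlord.

    The task targets only *unused* segments of `datasource` within `interval`.
    """
    spec = {"type": "kill", "dataSource": datasource, "interval": interval}
    return urllib.request.Request(
        f"{ROUTER}/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder datasource name and ISO 8601 interval.
req = build_kill_task_request("my_datasource", "2024-11-01/2024-11-18")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually submit the task
```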

@nibinqtl
Author

nibinqtl commented Nov 21, 2024

@kfaraz Thanks, but I already have

druid.coordinator.kill.on=true
druid.coordinator.kill.period=PT12H
druid.coordinator.kill.durationToRetain=P2D
druid.coordinator.kill.bufferPeriod=P1D
druid.coordinator.kill.maxSegments=10000

in the configuration, but it does not seem to do anything.
I waited a few days and nothing happened to var/druid/segments/.
Any suggestions?
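Two of the settings above are time filters on which unused segments a kill run may touch. Based on their documented meaning, `durationToRetain` skips segments whose interval end is later than now minus that period, and `bufferPeriod` skips segments that have been unused for less than that period. The sketch below is a simplified, hypothetical model of just those two filters (the class and function names are illustrative, not Druid code); it ignores `kill.period` scheduling and the `kill.maxSegments` per-task limit, and exact boundary handling may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class UnusedSegment:
    interval_end: datetime      # end of the time interval the segment covers
    marked_unused_at: datetime  # when the segment was marked unused (e.g. once overshadowed)

def is_kill_eligible(seg: UnusedSegment, now: datetime,
                     duration_to_retain: timedelta, buffer_period: timedelta) -> bool:
    """Simplified model: eligible only if the segment's interval ended at least
    durationToRetain ago AND it has been unused for at least bufferPeriod."""
    ended_long_enough_ago = seg.interval_end <= now - duration_to_retain
    unused_long_enough = seg.marked_unused_at <= now - buffer_period
    return ended_long_enough_ago and unused_long_enough

UTC = timezone.utc
now = datetime(2024, 11, 21, tzinfo=UTC)
retain, buffer = timedelta(days=2), timedelta(days=1)  # P2D and P1D from the config above

recent = UnusedSegment(datetime(2024, 11, 20, tzinfo=UTC), datetime(2024, 11, 20, 6, tzinfo=UTC))
just_compacted = UnusedSegment(datetime(2024, 11, 15, tzinfo=UTC), datetime(2024, 11, 20, 12, tzinfo=UTC))
old = UnusedSegment(datetime(2024, 11, 15, tzinfo=UTC), datetime(2024, 11, 19, tzinfo=UTC))

print(is_kill_eligible(recent, now, retain, buffer))          # False: interval ended < 2 days ago
print(is_kill_eligible(just_compacted, now, retain, buffer))  # False: unused for < 1 day
print(is_kill_eligible(old, now, retain, buffer))             # True
```

Under this model, data ingested in near real time is never killed until its interval end is at least two days old, regardless of when compaction overshadowed it.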
