Thanos compact leaves partial blocks on S3 when doing deletes #7840
Comments
Hey there, @grimz-ly! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!
I found a similar closed issue that might be relevant to your situation: compact: Thanos Compactor doesn't delete blocks which are marked for deletion. The issue was resolved by setting a lower value for the
Thanos determines which blocks are considered 'partial' and should be deleted based on the absence of a fully loadable
Additionally, Thanos provides logging and metrics related to the deletion of partial blocks. The
You might want to check your configuration for the
@dosu This is not the case. As I mentioned, the block files are getting deleted once --delete-delay time is reached. It's just leaving the block folders behind. Here are the command options being used in our docker compose service config
The issue you're experiencing, where block folders are left behind after the block files are deleted, might be related to how Thanos handles the deletion process. Thanos uses the
If the block files are being deleted but the folders remain, it could be due to a specific condition or error in the deletion process that prevents the
Additionally, ensure that there are no permissions or configuration issues with your S3 bucket that might prevent the deletion of empty directories. If everything seems correct and the issue persists, it might be worth investigating further into the specific implementation of the
Some additional information: I copied a block marked for deletion to test with. A recursive rm does delete all of the objects under /<block_id> as well as under /<block_id>/chunks, but it leaves the 'folders' behind. Going back with a plain rm command cleans these up, as I noted above. Compact sees these leftovers as partial blocks, and it's rather annoying to see the output in the docker logs at every 5m cleanup interval once the count gets large.
Thanos, Prometheus and Golang version used:
Thanos: v0.36.1
Prometheus: v2.54.1 (using sidecar)
Golang: go1.22.6
Object Storage Provider: S3
We've been leveraging Thanos for our long-term metrics for about 13 months now. One thing I've noticed, but never addressed, is the partial blocks count constantly increasing as time goes on. I didn't notice any major upload issues from our sidecar instances, and eventually figured out that the partial blocks were deleted blocks. Every deleted block has its 'folder' (e.g. 01JA5SKZ82CJNVMHAK6ZR9RE6D) and a 'subfolder' chunks left behind.
No obvious issues with compaction, downsampling, or retention. Here is a current log line showing the partial count:
compactor | ts=2024-10-17T17:40:47.800815487Z caller=fetcher.go:623 level=info component=block.BaseFetcher msg="successfully synchronized block metadata" duration=7.532362539s duration_ms=7532 cached=758 returned=758 partial=24872
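To track how the partial count grows over time, the sync line above can be parsed as key=value pairs. The following is a minimal Python sketch, assuming the log lines keep the format shown here; the helper names `parse_logfmt` and `partial_count` are hypothetical and not part of Thanos.

```python
import re

# Sample line copied from the compactor log above (without the docker prefix).
LINE = (
    'ts=2024-10-17T17:40:47.800815487Z caller=fetcher.go:623 level=info '
    'component=block.BaseFetcher msg="successfully synchronized block metadata" '
    'duration=7.532362539s duration_ms=7532 cached=758 returned=758 partial=24872'
)

# A key=value pair whose value is either a quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Return the key=value pairs of one log line as a dict of strings."""
    fields = {}
    for key, value in PAIR.findall(line):
        if value.startswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        fields[key] = value
    return fields


def partial_count(line):
    """Return the partial block count from a metadata sync line, else None."""
    fields = parse_logfmt(line)
    if fields.get("msg") != "successfully synchronized block metadata":
        return None
    if "partial" not in fields:
        return None
    return int(fields["partial"])


if __name__ == "__main__":
    print(partial_count(LINE))
```

Tokens without an `=` (such as the `compactor |` prefix added by docker compose) are skipped by the regular expression, so lines can be fed in as they appear in the docker logs.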
What you expected to happen:
I'd expect Thanos to also remove these directory objects so that it doesn't output thousands of info lines like these in the docker logs every 5m:
compactor | ts=2024-10-17T17:45:47.116627349Z caller=clean.go:59 level=info msg="deleted aborted partial upload" block=01J9T5X3SP0ANNJH80RK7J1DQ2 thresholdAge=48h0m0s
compactor | ts=2024-10-17T17:45:47.116681196Z caller=clean.go:49 level=info msg="found partially uploaded block; marking for deletion" block=01J9KQYSSANGK37W371JWDZKTR
compactor | ts=2024-10-17T17:45:47.121624247Z caller=clean.go:59 level=info msg="deleted aborted partial upload" block=01J9KQYSSANGK37W371JWDZKTR thresholdAge=48h0m0s
compactor | ts=2024-10-17T17:45:47.121679123Z caller=clean.go:49 level=info msg="found partially uploaded block; marking for deletion" block=01J9W3BY9STJXSX54TTX273NXM
compactor | ts=2024-10-17T17:45:47.126779616Z caller=clean.go:59 level=info msg="deleted aborted partial upload" block=01J9W3BY9STJXSX54TTX273NXM thresholdAge=48h0m0s
compactor | ts=2024-10-17T17:45:47.126802449Z caller=clean.go:49 level=info msg="found partially uploaded block; marking for deletion" block=01J9YGRC91YCZES4JV4758NAMF
Additional Notes:
We send metrics to two primary S3 buckets. I've cleaned one of the two using the MinIO client with:
for i in $(cat /tmp/bucket-ls); do docker-compose exec mc mc rm thanos/store-metrics/$i; done
This effectively removes the empty block 'folders' and leaves those that still have any metadata in them, because a basic (non-recursive) rm cannot delete anything with metadata in it.
Is this normal behavior? If so, could an extra step be added to the delete code to perform another rm on the empty block dir object?
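To make the request concrete, here is a minimal Python sketch of how the leftover 'folders' could be told apart from blocks that still hold data, given a flat listing of object keys. It assumes the store represents each empty folder as a marker object whose key ends in "/" (for example "<block_id>/" and "<block_id>/chunks/"); that is an assumption about the S3-compatible backend, not confirmed Thanos behavior, and the function name `leftover_block_markers` is hypothetical.

```python
from collections import defaultdict


def leftover_block_markers(keys):
    """Return block IDs whose only remaining keys are directory markers.

    `keys` is an iterable of object keys relative to the bucket root,
    e.g. "01JA5SKZ82CJNVMHAK6ZR9RE6D/chunks/".
    """
    by_block = defaultdict(list)
    for key in keys:
        # Split off the top-level prefix, which is the block ID for block data.
        block_id, _, rest = key.partition("/")
        by_block[block_id].append(rest)

    leftovers = []
    for block_id, rests in by_block.items():
        # Assumption: a marker is either "<id>/" (rest == "") or a nested
        # key that itself ends in "/", such as "<id>/chunks/".
        if all(rest == "" or rest.endswith("/") for rest in rests):
            leftovers.append(block_id)
    return sorted(leftovers)


if __name__ == "__main__":
    listing = [
        "01JA5SKZ82CJNVMHAK6ZR9RE6D/",
        "01JA5SKZ82CJNVMHAK6ZR9RE6D/chunks/",
        "01J9T5X3SP0ANNJH80RK7J1DQ2/meta.json",
        "01J9T5X3SP0ANNJH80RK7J1DQ2/chunks/000001",
    ]
    print(leftover_block_markers(listing))
```

Only the IDs returned by this check would be candidates for the extra rm; any block that still has a real object such as meta.json or a chunk file is left untouched.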