feat: improved objective function in sampling compressor #1000
This reverts commit 8765d45.
Can you separate the last two as separate changes?
Note: as of commit 45560d4, we get 10-25% improved compression throughput with roughly equal file sizes, where the develop baseline is commit 4aa30c0. This is still using the MinSize strategy, albeit now estimating metadata size rather than just data buffer size.
I think there are a couple of weird artifacts
This reverts commit e110b26.
We should still get rid of the Compressor::only function. Once we have a use case, I'm happy to reconsider.
Fixes issue seen in #1723.

In #1000, changes were made to remove ALP-RD as a top-level compressor and instead to only use it for patches. However, it seems that it was not getting selected anymore, whether that was due to the patching cost overhead or something else. This was noticed by a user, and confirmed by me in a Python shell.

<img width="946" alt="image" src="https://github.com/user-attachments/assets/c42caed5-12e3-448b-aea6-3f33a7c97bfc" />

After this change, ALP-RD indeed does get selected again.

<img width="907" alt="image" src="https://github.com/user-attachments/assets/bbb996fc-5223-43ed-9b4c-4b0262a417dc" />
This PR does several things:
But the headliner is that we now add a (configurable) overhead of 64 bytes per descendant array, which penalizes additional levels of cascading compression that don't do much.
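To make the effect of the per-descendant overhead concrete, here is a minimal sketch of such an objective. It is not the code from this PR: `Candidate`, `objective`, `pick_best`, and `example_candidates` are hypothetical names, and the byte counts are made up for illustration. Only the 64-byte default comes from the description above.

```rust
/// Default overhead charged per descendant array, in bytes.
/// The value 64 is taken from the PR description; the name is hypothetical.
const DEFAULT_OVERHEAD_PER_ARRAY: u64 = 64;

/// Illustrative stand-in for a compressed array tree (not the project's real type).
#[derive(Debug, Clone)]
struct Candidate {
    name: &'static str,
    /// Bytes held by this node's own buffers, excluding children.
    nbytes: u64,
    children: Vec<Candidate>,
}

impl Candidate {
    /// Total buffer bytes of this node and all of its descendants.
    fn total_nbytes(&self) -> u64 {
        self.nbytes + self.children.iter().map(Candidate::total_nbytes).sum::<u64>()
    }

    /// Number of arrays strictly below this node in the tree.
    fn descendant_count(&self) -> u64 {
        self.children.iter().map(|c| 1 + c.descendant_count()).sum()
    }
}

/// Objective to minimize: raw size plus a fixed penalty per descendant array.
fn objective(c: &Candidate, overhead: u64) -> u64 {
    c.total_nbytes() + overhead * c.descendant_count()
}

/// Pick the candidate with the smallest objective (first one wins on ties).
fn pick_best(cands: &[Candidate], overhead: u64) -> Option<&Candidate> {
    cands.iter().min_by_key(|c| objective(c, overhead))
}

/// Two made-up candidates: a flat encoding and a two-level cascade
/// that saves only 30 bytes of buffer space.
fn example_candidates() -> Vec<Candidate> {
    let flat = Candidate { name: "flat", nbytes: 1000, children: vec![] };
    let inner = Candidate { name: "inner", nbytes: 490, children: vec![] };
    let middle = Candidate { name: "middle", nbytes: 480, children: vec![inner] };
    let cascaded = Candidate { name: "cascaded", nbytes: 0, children: vec![middle] };
    vec![flat, cascaded]
}
```

With these made-up numbers, the cascade is 30 bytes smaller in raw buffers (970 vs. 1000), so a pure minimum-size objective (`overhead = 0`) would pick it. With `DEFAULT_OVERHEAD_PER_ARRAY`, its two descendants add 128 bytes of penalty and the flat encoding wins, which is the "additional levels that don't do much" case the overhead is meant to discourage.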
The `compress_noci` benchmark shows ~10-30% higher compression throughput, and 10-70% higher decompression throughput, with file sizes +/- 3% compared to `develop`.