Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: improved objective function in sampling compressor #1000

Merged
merged 42 commits into from
Oct 10, 2024

Conversation

lwwmanning
Copy link
Member

@lwwmanning lwwmanning commented Oct 8, 2024

This PR does several things:

  • adds a ton of benchmarks for decompression throughput
  • don't allow the patches from bitpacking to be bitpacked with patches
  • ALPRD is only called by ALP

But the headliner is that we now add a (configurable) overhead of 64 bytes per descendant array, which penalizes additional levels of cascading compression that don't do much.

The compress_noci benchmark shows ~10-30% higher compression throughput, and 10-70% higher decompression throughput, with file sizes +/- 3% compared to develop.

@robert3005
Copy link
Member

Can you separate the last two as separate changes?

@lwwmanning
Copy link
Member Author

lwwmanning commented Oct 9, 2024

note: as of commit 45560d4, we get 10-25% improved compression throughput with roughly equal file sizes

BENCH_VORTEX_RATIOS='.*' cargo bench --bench compress_noci -- --baseline develop --test

where baseline develop is against commit 4aa30c0

this is still using the MinSize strategy, albeit now estimating metadata size rather than just data buffer size.

@lwwmanning lwwmanning changed the title [wip] feat: ScanPerf objective function in sampling compressor feat: improved objective function in sampling compressor Oct 9, 2024
@lwwmanning lwwmanning marked this pull request as ready for review October 9, 2024 22:15
@lwwmanning lwwmanning enabled auto-merge (squash) October 9, 2024 22:21
Copy link
Member

@robert3005 robert3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's couple weird artifacts

pyvortex/src/array.rs Show resolved Hide resolved
vortex-sampling-compressor/src/lib.rs Outdated Show resolved Hide resolved
vortex-sampling-compressor/src/constants.rs Outdated Show resolved Hide resolved
bench-vortex/benches/compress_noci.rs Outdated Show resolved Hide resolved
bench-vortex/benches/compress_noci.rs Outdated Show resolved Hide resolved
encodings/dict/benches/dict_compress.rs Outdated Show resolved Hide resolved
encodings/dict/benches/dict_compress.rs Outdated Show resolved Hide resolved
bench-vortex/benches/compressor_throughput.rs Outdated Show resolved Hide resolved
@lwwmanning lwwmanning requested a review from robert3005 October 10, 2024 15:35
@robert3005
Copy link
Member

We should still get rid of the Compressor::only function. Once we have a use case happy to reconsider

@lwwmanning lwwmanning merged commit 2e227c3 into develop Oct 10, 2024
5 checks passed
@lwwmanning lwwmanning deleted the wm/objective branch October 10, 2024 16:58
a10y added a commit that referenced this pull request Dec 19, 2024
Fixes issue seen in #1723

In #1000, changes were made to remove ALP-RD as a top-level compressor
and instead to only use it for patches. However, it seems that it was
not getting selected anymore, whether that was due to the patching cost
overhead or something else. This was noticed by a user, and confirmed by
me in a Python shell.

<img width="946" alt="image"
src="https://github.com/user-attachments/assets/c42caed5-12e3-448b-aea6-3f33a7c97bfc"
/>

After this change, ALP-RD indeed does get selected again.

<img width="907" alt="image"
src="https://github.com/user-attachments/assets/bbb996fc-5223-43ed-9b4c-4b0262a417dc"
/>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants