feat: implement ALP-RD compression (#947)
Fixes #10: Add ALP-RD compression.

Currently our only floating-point compression algorithm is standard ALP. It targets floats/doubles that were originally decimal, and thus have some natural integer they round to when you undo the exponent. Science/math datasets contain many "real doubles", i.e. floating-point numbers that use most or all of their available precision, and these do not compress with standard ALP. The ALP paper authors proposed a solution for this called "ALP for 'Real' Doubles" (ALP-RD), which this PR implements.

## Basics

The key insight of ALP-RD is that even dense floating-point numbers within a column often share their front bits (the exponent plus the first few bits of the mantissa). We try to find the best cut-point within the leftmost 16 bits. There are generally few unique values for those leftmost bits, so we can build a fixed-size dictionary (here we use 8 entries, matching the C++ implementation), whose codes naturally bit-pack down to 3 bits.

If you compress perfectly without exceptions, you can store 53 bits/value, a ~17% reduction from 64 bits. In practice the amount varies. In the comments below you can see a test with the POI dataset referenced in the ALP paper, where we replicate their results of 55 and 56 bits/value respectively.

## List of changes

* Reorganized the `vortex-alp` crate: I created two top-level modules, `alp` and `alp_rd`, and moved the previous implementation into the `alp` module.
* Added a new `ALPRDArray` in the `alp_rd` module. It supports both f32 and f64, and all major compute functions are implemented except `MaybeCompareFn` and the Accessors. I will file an issue to implement these in a follow-up (FLUP) if that's alright, since this PR is already quite large.
* Added a corresponding `ALPRDCompressor` and wired up its `CompressorRef` everywhere I could find `ALPCompressor`.
* Added a new benchmark for RD compression to the existing ALP benchmark suite.
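The cut-point and dictionary scheme described under Basics can be sketched in Rust roughly as follows. This is a simplified illustration, not the `ALPRDArray`/`ALPRDCompressor` code from this PR: the names `split`, `build_dict`, `encode_with_cut`, `encode`, `decode` and `Encoded` are hypothetical, codes are kept unpacked as `u8` instead of being bit-packed, and the 32-bit per-exception cost used to rank cut-points is an assumption made for the example.

```rust
use std::collections::HashMap;

// Illustrative parameters (assumptions, not taken from vortex-alp):
const MAX_DICT_SIZE: usize = 8; // dictionary size, as in the commit message
const CODE_BITS: usize = 3; // log2(MAX_DICT_SIZE)
const EXCEPTION_BITS: usize = 32; // assumed: 16-bit position + 16-bit left value

/// Hypothetical encoded form of an f64 column.
pub struct Encoded {
    pub right_bits: u32,
    pub dict: Vec<u16>,
    pub codes: Vec<u8>,
    pub rights: Vec<u64>,
    pub exceptions: Vec<(usize, u16)>, // (position, original left part)
}

/// Split the bit pattern of `v` into its leftmost `64 - right_bits` bits
/// and its low `right_bits` bits.
fn split(v: f64, right_bits: u32) -> (u16, u64) {
    let bits = v.to_bits();
    ((bits >> right_bits) as u16, bits & ((1u64 << right_bits) - 1))
}

/// Keep the (up to) 8 most frequent left parts and count how many values
/// would miss the dictionary and become exceptions.
fn build_dict(values: &[f64], right_bits: u32) -> (Vec<u16>, usize) {
    let mut freq: HashMap<u16, usize> = HashMap::new();
    for &v in values {
        *freq.entry(split(v, right_bits).0).or_insert(0) += 1;
    }
    let mut ranked: Vec<(u16, usize)> = freq.into_iter().collect();
    // Most frequent first; ties broken by value so the result is deterministic.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    let dict: Vec<u16> = ranked.iter().take(MAX_DICT_SIZE).map(|&(l, _)| l).collect();
    let misses: usize = ranked.iter().skip(MAX_DICT_SIZE).map(|&(_, n)| n).sum();
    (dict, misses)
}

/// Encode with a fixed cut-point: dictionary code + right bits per value,
/// plus an exception list for left parts not in the dictionary.
pub fn encode_with_cut(values: &[f64], right_bits: u32) -> Encoded {
    let (dict, _) = build_dict(values, right_bits);
    let mut codes = Vec::with_capacity(values.len());
    let mut rights = Vec::with_capacity(values.len());
    let mut exceptions = Vec::new();
    for (i, &v) in values.iter().enumerate() {
        let (left, right) = split(v, right_bits);
        match dict.iter().position(|&d| d == left) {
            Some(code) => codes.push(code as u8),
            None => {
                codes.push(0); // placeholder code, patched during decode
                exceptions.push((i, left));
            }
        }
        rights.push(right);
    }
    Encoded { right_bits, dict, codes, rights, exceptions }
}

/// Try every cut leaving 1..=16 left bits (48..=63 right bits) and keep the
/// one with the smallest estimated size.
pub fn encode(values: &[f64]) -> Encoded {
    let n = values.len();
    let best = (48..=63u32)
        .min_by_key(|&rb| {
            let (_, misses) = build_dict(values, rb);
            n * (CODE_BITS + rb as usize) + misses * EXCEPTION_BITS
        })
        .expect("candidate range is non-empty");
    encode_with_cut(values, best)
}

/// Look up left parts, patch exceptions, and reassemble the exact bits.
pub fn decode(e: &Encoded) -> Vec<f64> {
    let mut lefts: Vec<u16> = e.codes.iter().map(|&c| e.dict[c as usize]).collect();
    for &(pos, left) in &e.exceptions {
        lefts[pos] = left;
    }
    lefts
        .iter()
        .zip(&e.rights)
        .map(|(&l, &r)| f64::from_bits(((l as u64) << e.right_bits) | r))
        .collect()
}
```

Because exceptions store the original left part rather than an approximation, decoding reproduces every input bit pattern exactly; the only effect of a poor cut-point is a larger encoded size.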
Showing 24 changed files with 1,101 additions and 14 deletions.