Integer compression dev branch #7266
Closed
…m-core into nc/new_layout_local_directory
* inline getters and main encoding functions
* start experimenting creating only one iterator in packed, mixed == better
* create iterator only once and move it to needed index
* operator* for bf_iterator + minor changes
nicola-cab force-pushed the nc/merge_all_together branch from cc1894c to 923e2d5 (April 9, 2024 17:09)
nicola-cab force-pushed the nc/merge_all_together branch from 923e2d5 to 37e816c (April 9, 2024 17:27)
nicola-cab force-pushed the nc/merge_all_together branch from 37e816c to a067bd4 (April 9, 2024 17:33)
nicola-cab force-pushed the nc/merge_all_together branch from 2a6d5f8 to b43188a (April 23, 2024 16:16)
* code review
* fix conflicts
* code review
* code review
* code review
* some cleanup
* restore heuristic packed
* more cleanup
* make init array lighter and test query engine arm
* test flex compression
* confirm heuristic for flex + prune okish benchmarks
* simplify getters for compressed int interners
* specialised find for compressed arrays of int mixed types
* do not compress composite array for mixed
* added the possibility to fetch multiple values in a range b,e for compressed array
* restore all benchmarks
* fetch_all values with unaligned iterator for get_all
* better heuristics in packed
* fix value fetching
* faster decompression
* better heuristic for flex
* fix build windows x86
* fix asan correct report of accessing too much memory passed the last word
* use unaligned_iterator::get for getAll
* fix get_all
* code review
* lint
* fix upper bound for compressed arrays
* reverse logic branch hint
* revert branch hint
Closing this pull request, since it has been split into different PRs targeting next-major.
nicola-cab changed the title from "RCORE-1624 Integer compression" to "Integer compression dev branch" (Jun 6, 2024)
What, How & Why?
This PR contains the current state of the art for integer compression. It is composed of:
* Array classification: marks the arrays to compress while the cluster tree is traversed. Mixed arrays are disabled for now, because support for nested collections is still needed, although basic Mixed already works. Compression happens only at commit time; a compressed array is decompressed only if needed, during copy-on-write (i.e. when it is modified, e.g. after a new insertion).
* ArrayPacked format: saves space by storing the integers contiguously, using fewer bits per value when possible.
* ArrayFlex format: handles duplicates, saving space by storing only the unique values plus a list of indices.

Compression-wise, the gain obtained is in line with what was expected. We used the clickbench.cpp utility to verify this.