Integer compression dev branch #7266

Closed
nicola-cab wants to merge 338 commits from nc/merge_all_together

@nicola-cab nicola-cab commented Jan 18, 2024

What, How & Why?

Contains the current state of the art for integer compression.
This PR is composed of:

  1. Array classification: marks the arrays to compress while the cluster tree is traversed. Compression of Mixed arrays is disabled for now because it needs support for nested collections, although basic Mixed values already work. Compression happens only at commit time; an array is decompressed only on copy-on-write, i.e. when it is modified, e.g. by a new insertion.
  2. ArrayPacked format: saves space by storing the same integers contiguously, using fewer bits per value where possible.
  3. ArrayFlex format: handles duplicates and saves space by storing only the unique values plus a list of indices into them.
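
For readers unfamiliar with the two encodings, here is a minimal sketch of the ideas behind items 2 and 3. It is not the code in this PR: the names `Packed`, `Flex`, `bits_needed`, `pack`, `get`, `flex_encode` and `flex_get` are hypothetical, values are assumed to be non-negative, and the width is simply derived from the largest value.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bits needed to represent v (at least 1).
inline unsigned bits_needed(uint64_t v)
{
    unsigned w = 1;
    while (v >>= 1)
        ++w;
    return w;
}

// "Packed" idea: every value is stored back to back using `width` bits.
struct Packed {
    unsigned width = 1;          // bits per value
    std::size_t size = 0;        // number of values
    std::vector<uint64_t> words; // packed payload
};

inline Packed pack(const std::vector<uint64_t>& values)
{
    Packed p;
    p.size = values.size();
    uint64_t max = 0;
    for (uint64_t v : values)
        max = std::max(max, v);
    p.width = bits_needed(max);
    p.words.assign((p.size * p.width + 63) / 64, 0);
    for (std::size_t i = 0; i < p.size; ++i) {
        std::size_t bit = i * p.width;
        std::size_t word = bit / 64, off = bit % 64;
        p.words[word] |= values[i] << off;
        if (off + p.width > 64) // value straddles two words
            p.words[word + 1] |= values[i] >> (64 - off);
    }
    return p;
}

inline uint64_t get(const Packed& p, std::size_t i)
{
    std::size_t bit = i * p.width;
    std::size_t word = bit / 64, off = bit % 64;
    uint64_t v = p.words[word] >> off;
    if (off + p.width > 64) // pull the high bits from the next word
        v |= p.words[word + 1] << (64 - off);
    uint64_t mask = p.width == 64 ? ~uint64_t{0} : (uint64_t{1} << p.width) - 1;
    return v & mask;
}

// "Flex" idea: store each distinct value once, plus one index per element.
struct Flex {
    std::vector<uint64_t> values;     // sorted unique values
    std::vector<std::size_t> indices; // position in `values` for each element
};

inline Flex flex_encode(const std::vector<uint64_t>& in)
{
    Flex f;
    f.values = in;
    std::sort(f.values.begin(), f.values.end());
    f.values.erase(std::unique(f.values.begin(), f.values.end()), f.values.end());
    f.indices.reserve(in.size());
    for (uint64_t v : in) {
        auto it = std::lower_bound(f.values.begin(), f.values.end(), v);
        f.indices.push_back(static_cast<std::size_t>(it - f.values.begin()));
    }
    return f;
}

inline uint64_t flex_get(const Flex& f, std::size_t i)
{
    return f.values[f.indices[i]];
}
```

In this sketch, the six values `{5, 1, 5, 1023, 0, 5}` need 10 bits each and fit in a single 64-bit word when packed, while the Flex form keeps four unique values and six small indices. A real format would presumably bit-pack the Flex values and indices as well, and choose between the two with a size heuristic; both of those are assumptions here, not something this description states.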

Compression-wise, the gain obtained is in line with what we expected. We used the clickbench.cpp utility to verify this.

nicola-cab and others added 30 commits November 23, 2023 17:48
* inline getters and main encoding functions

* start experimenting creating only one iterator in packed, mixed == better

* create iterator only once and move it to needed index

* operator* for bf_iterator + minor changes
@nicola-cab nicola-cab changed the base branch from next-major to master April 8, 2024 15:54
@nicola-cab nicola-cab force-pushed the nc/merge_all_together branch from cc1894c to 923e2d5 Compare April 9, 2024 17:09
@nicola-cab nicola-cab changed the base branch from master to next-major April 9, 2024 17:17
@nicola-cab nicola-cab force-pushed the nc/merge_all_together branch from 923e2d5 to 37e816c Compare April 9, 2024 17:27
@nicola-cab nicola-cab force-pushed the nc/merge_all_together branch from 37e816c to a067bd4 Compare April 9, 2024 17:33
@nicola-cab nicola-cab force-pushed the nc/merge_all_together branch from 2a6d5f8 to b43188a Compare April 23, 2024 16:16
* code review

* fix conflicts

* code review

* code review

* code review

* some cleanup

* restore heuristic packed

* more cleanup

* make init array lighter and test query engine arm

* test flex compression

* confirm heuristic for flex + prune okish benchmarks

* simplify getters for compressed int interners

* specialised find for compressed arrays of int mixed types

* do not compress composite array for mixed

* added the possibility to fetch multiple values in a range b,e for compressed array

* restore all benchmarks

* fetch_all values with unaligned iterator for get_all

* better heuristics in packed

* fix value fetching

* faster decompression

* better heuristic for flex

* fix build windows x86

* fix asan correct report of accessing too much memory passed the last word

* use unaligned_iterator::get for getAll

* fix get_all

* code review

* lint

* fix upper bound for compressed arrays

* reverse logic branch hint

* revert branch hint
@nicola-cab nicola-cab changed the base branch from next-major to nc/rcore-2057 May 1, 2024 16:50
@nicola-cab
Member Author

Closing this pull request, since it has been split into different PRs targeting next-major.

@nicola-cab nicola-cab closed this May 1, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 31, 2024
@nicola-cab nicola-cab changed the title RCORE-1624 Integer compression Integer compression dev branch Jun 6, 2024

3 participants