-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster ALP encode #924
faster ALP encode #924
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one small nit
} | ||
|
||
// if there are no patches, we are done | ||
if chunk_patch_count == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to handle the edge case of 2 chunks where chunk 0 is all patches, chunk 1 has 0 patches... which won't fill
Realized that there's an unhandled edge case in #924, [commented here](https://github.com/spiraldb/vortex/pull/924/files#r1776099681) Essentially, on develop, if we have two chunks and the first chunk is all patches and the second chunk has 0 patches, then the patched values won't get filled in the encoded array. Not the end of the world (they're presumably full of integer approximations that don't round-trip), but if it's a case of outlier large values that are getting patched, then the encoded values will end up bitpacking poorly. This PR fixes that.
fixes #920
Consistently cuts encoding time by 10-50%.
Before the change:
After: