faster ALP encode #924

Merged
merged 17 commits into develop from wm/branchless-alp
Sep 25, 2024

lwwmanning
Member

@lwwmanning lwwmanning commented Sep 25, 2024

fixes #920

Consistently cuts encoding time by 10-50%; in the benchmarks below, median times drop by roughly 12-48%.

Before the change:

Running benches/alp_compress.rs (target/release/deps/alp_compress-abbdaefc5eabf343)
Timer precision: 41 ns
alp_compress          fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ alp_compress                     │               │               │               │         │
│  ├─ f32                           │               │               │               │         │
│  │  ├─ 100000       191.9 µs      │ 824.9 µs      │ 314.7 µs      │ 354 µs        │ 100     │ 100
│  │  ╰─ 10000000     21.39 ms      │ 28.95 ms      │ 21.71 ms      │ 21.89 ms      │ 100     │ 100
│  ╰─ f64                           │               │               │               │         │
│     ├─ 100000       236 µs        │ 353.7 µs      │ 238.4 µs      │ 246.4 µs      │ 100     │ 100
│     ╰─ 10000000     28.78 ms      │ 68.68 ms      │ 29.49 ms      │ 29.93 ms      │ 100     │ 100

After:

Running benches/alp_compress.rs (target/release/deps/alp_compress-abbdaefc5eabf343)
Timer precision: 41 ns
alp_compress          fastest       │ slowest       │ median        │ mean          │ samples │ iters
├─ alp_compress                     │               │               │               │         │
│  ├─ f32                           │               │               │               │         │
│  │  ├─ 100000       161 µs        │ 234.6 µs      │ 163.3 µs      │ 166 µs        │ 100     │ 100
│  │  ╰─ 10000000     18.72 ms      │ 21.54 ms      │ 19.07 ms      │ 19.14 ms      │ 100     │ 100
│  ╰─ f64                           │               │               │               │         │
│     ├─ 100000       182 µs        │ 346 µs        │ 183.9 µs      │ 187.9 µs      │ 100     │ 100
│     ╰─ 10000000     23.98 ms      │ 28.71 ms      │ 24.52 ms      │ 24.53 ms      │ 100     │ 100
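
The PR's original title ("branchless ALP encode") and the linked issue ("ALP has a lot of branching") suggest the gain comes from removing per-value branches in the encode loop. The following is a minimal sketch of that general technique, not the code merged here: `encode_chunk` and `exp` are hypothetical names, and the scheme is simplified to a single power-of-ten exponent. It encodes every value unconditionally and records exception (patch) indices with an unconditional store plus a counter increment, so there is no data-dependent branch per value. Whether the compiler emits fully branch-free machine code is not guaranteed.

```rust
/// Illustrative only: scale by 10^exp, round to an integer, and record which
/// positions fail to round-trip. `encode_chunk` and `exp` are hypothetical
/// names, not identifiers from the Vortex codebase.
fn encode_chunk(values: &[f64], exp: i32) -> (Vec<i64>, Vec<usize>) {
    let scale = 10f64.powi(exp);

    // Encode every value unconditionally, including ones that will need a patch.
    let encoded: Vec<i64> = values
        .iter()
        .map(|&v| (v * scale).round() as i64)
        .collect();

    // In the worst case every value is a patch, so reserve one slot per value.
    let mut patch_indices = vec![0usize; values.len()];
    let mut patch_count = 0usize;

    for (i, (&v, &e)) in values.iter().zip(&encoded).enumerate() {
        let decoded = e as f64 / scale;
        // Store the index unconditionally; the slot is kept only if the
        // counter advances. This replaces `if decoded != v { push(i) }`.
        patch_indices[patch_count] = i;
        patch_count += (decoded != v) as usize;
    }

    // Drop the unused tail (and the last speculative store, if any).
    patch_indices.truncate(patch_count);
    (encoded, patch_indices)
}
```

For example, calling `encode_chunk(&[1.5, 2.25, std::f64::consts::PI, 0.5], 2)` encodes all four values and reports only index 2 as a patch, since π does not survive the round trip through two decimal digits.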

@lwwmanning lwwmanning changed the title from "branchless ALP encode" to "faster ALP encode" Sep 25, 2024
@lwwmanning lwwmanning marked this pull request as ready for review September 25, 2024 14:35
encodings/alp/src/alp.rs (outdated, resolved)
encodings/alp/src/alp.rs (outdated, resolved)
Contributor

@a10y a10y left a comment

nice!

Member

@robert3005 robert3005 left a comment

one small nit

encodings/alp/src/alp.rs (resolved)
@lwwmanning lwwmanning enabled auto-merge (squash) September 25, 2024 15:04
@lwwmanning lwwmanning merged commit a7fd730 into develop Sep 25, 2024
5 checks passed
@lwwmanning lwwmanning deleted the wm/branchless-alp branch September 25, 2024 15:17
}

// if there are no patches, we are done
if chunk_patch_count == 0 {
Member Author

Need to handle the edge case of 2 chunks where chunk 0 is all patches, chunk 1 has 0 patches... which won't fill

lwwmanning added a commit that referenced this pull request Sep 26, 2024
Realized that there's an unhandled edge case in #924, [commented
here](https://github.com/spiraldb/vortex/pull/924/files#r1776099681)

Essentially, on develop, if we have two chunks and the first chunk is
all patches and the second chunk has 0 patches, then the patched values
won't get filled in the encoded array. Not the end of the world (they're
presumably full of integer approximations that don't round-trip), but if
it's a case of outlier large values that are getting patched, then the
encoded values will end up bitpacking poorly.

This PR fixes that.
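
To make the edge case concrete, here is a minimal sketch of one way patched slots could be filled across chunks. It is not the Vortex implementation: `fill_patched_slots`, `is_patch`, and the choice of fill value (the first non-patched encoded value in the chunk, falling back to the previous chunk's fill) are all assumptions made for illustration. The point it shows is that a chunk with zero patches must not exit early while earlier chunks still have unfilled slots.

```rust
/// Hypothetical sketch, not the Vortex implementation.
/// `encoded` holds one encoded integer per input value; `is_patch[i]` marks
/// positions whose value did not round-trip and will be stored as a patch.
/// `chunk_size` must be non-zero.
fn fill_patched_slots(encoded: &mut [i64], is_patch: &[bool], chunk_size: usize) {
    assert_eq!(encoded.len(), is_patch.len());

    // Patched positions seen so far that still lack a fill value.
    let mut pending: Vec<usize> = Vec::new();
    // Fill value used by the most recent chunk that had one.
    let mut last_fill: Option<i64> = None;

    for (chunk_idx, chunk) in is_patch.chunks(chunk_size).enumerate() {
        let start = chunk_idx * chunk_size;

        // Assumed policy: use the first non-patched encoded value in this
        // chunk, or fall back to the previous chunk's fill value.
        let fill = chunk
            .iter()
            .position(|&p| !p)
            .map(|off| encoded[start + off])
            .or(last_fill);

        for (off, &p) in chunk.iter().enumerate() {
            if p {
                pending.push(start + off);
            }
        }

        // Deliberately no early exit when this chunk has zero patches:
        // slots deferred from an all-patch chunk may still need filling.
        if let Some(f) = fill {
            for idx in pending.drain(..) {
                encoded[idx] = f;
            }
            last_fill = Some(f);
        }
    }
}
```

With `chunk_size = 2`, `encoded = [9_000_000, 8_000_000, 3, 4]`, and `is_patch = [true, true, false, false]`, chunk 0 has no usable fill value and chunk 1 has no patches. The deferred slots are still overwritten while processing chunk 1, giving `[3, 3, 3, 4]`, so the outliers no longer inflate the bit width needed for bitpacking.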
Development

Successfully merging this pull request may close these issues.

ALP has a lot of branching
4 participants