Row Format Adaptive Block Size #4812

Closed
tustvold opened this issue Sep 13, 2023 · 1 comment · Fixed by #4818
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Contributor

tustvold commented Sep 13, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently the row format pads variable-length payloads to 32-byte blocks. This is performant and easy to reason about, but wastes a lot of space for small strings: a payload of only a few bytes still occupies a full 32-byte block.
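To make the cost concrete, here is a minimal sketch of the padding arithmetic under a fixed block size. It is not the arrow-rs implementation: the function name `padded_len_fixed` is hypothetical, and the model counts payload bytes only, ignoring any per-block or per-value marker bytes the real encoding may add.

```rust
/// Fixed block size described in the issue, in bytes.
const BLOCK_SIZE: usize = 32;

/// Hypothetical helper: payload bytes occupied by a value of `len` bytes
/// when it is padded up to a whole number of `BLOCK_SIZE` blocks.
/// Assumption: marker/sentinel bytes are not modelled here.
fn padded_len_fixed(len: usize) -> usize {
    // Integer ceiling division: number of blocks needed to hold `len` bytes.
    let blocks = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
    blocks * BLOCK_SIZE
}
```

Under this model `padded_len_fixed(3)` is 32, so a 3-byte string takes more than ten times its own size, while `padded_len_fixed(33)` is 64.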

Describe the solution you'd like

Instead of every block having the same size I would propose the first few blocks have a smaller size.

In particular I would propose that the first 4 blocks have a smaller block size of 8 bytes.

This would drastically reduce the space amplification for small strings, reducing memory usage and potentially yielding faster comparisons.
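A rough sketch of the proposed sizing, read literally from the description above: the first 4 blocks are 8 bytes each, and any remaining payload uses 32-byte blocks. The names below are hypothetical, marker bytes are again ignored, and the change that eventually fixed this issue may lay out blocks differently.

```rust
/// Sizes taken from the proposal; the constant names are illustrative only.
const SMALL_BLOCK_SIZE: usize = 8;
const SMALL_BLOCK_COUNT: usize = 4;
const LARGE_BLOCK_SIZE: usize = 32;

/// Integer ceiling division.
fn ceil_div(n: usize, d: usize) -> usize {
    (n + d - 1) / d
}

/// Hypothetical helper: payload bytes occupied by a value of `len` bytes
/// when the first `SMALL_BLOCK_COUNT` blocks are small and the rest are large.
/// Assumption: marker/sentinel bytes are not modelled here.
fn padded_len_adaptive(len: usize) -> usize {
    // Bytes that fit entirely inside the small blocks (4 * 8 = 32).
    let small_capacity = SMALL_BLOCK_SIZE * SMALL_BLOCK_COUNT;
    if len <= small_capacity {
        // Short values only pay for the small blocks they touch.
        ceil_div(len, SMALL_BLOCK_SIZE) * SMALL_BLOCK_SIZE
    } else {
        // Longer values fill every small block, then spill into large blocks.
        small_capacity + ceil_div(len - small_capacity, LARGE_BLOCK_SIZE) * LARGE_BLOCK_SIZE
    }
}
```

With these assumptions a 3-byte string pads to 8 bytes instead of 32 and a 9-byte string pads to 16, while values longer than 32 bytes occupy the same space as with fixed 32-byte blocks (for example, 100 bytes still pads to 128).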

Describe alternatives you've considered

Additional context

#4811 proposes removing the dictionary interning, which would likely make this optimisation more important.

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label Sep 13, 2023
@tustvold tustvold self-assigned this Sep 13, 2023
tustvold added a commit to tustvold/arrow-rs that referenced this issue Sep 14, 2023
tustvold added a commit that referenced this issue Sep 17, 2023
* Adaptive Row Block Size (#4812)

* Perf improvements

* Further tweaks

* Review feedback
@tustvold tustvold added the arrow Changes to the arrow crate label Sep 18, 2023
@tustvold
Contributor Author

label_issue.py automatically added labels {'arrow'} from #4818
