
Declare sorted columns in Arrow Schema to enable further optimizations #15

Open
lquerel opened this issue Aug 22, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@lquerel
Contributor

lquerel commented Aug 22, 2023

At present, the list of sorted columns for each Arrow record type is hardcoded. Declaring this list as metadata in the Arrow Schema of each record would make the sort order explicit and open the door to further optimizations.
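A minimal sketch of how the sort order could travel with the schema, assuming the schema metadata is modeled as a plain string-to-string map (Arrow schema metadata is a key/value map of strings). The key name `sort_columns`, the JSON encoding, and the helper names are hypothetical; the issue does not specify any of them.

```python
import json
from typing import Dict, List

# Hypothetical metadata key; the issue does not name one.
SORT_COLUMNS_KEY = "sort_columns"


def with_sort_columns(metadata: Dict[str, str], columns: List[str]) -> Dict[str, str]:
    """Return a copy of the schema metadata that declares the sort order."""
    updated = dict(metadata)
    # JSON keeps column names containing commas or other separators intact.
    updated[SORT_COLUMNS_KEY] = json.dumps(columns)
    return updated


def read_sort_columns(metadata: Dict[str, str], default: List[str]) -> List[str]:
    """Return the declared sort order, falling back to the hardcoded default."""
    raw = metadata.get(SORT_COLUMNS_KEY)
    if raw is None:
        return list(default)
    return json.loads(raw)


# Sender side: declare the order it actually used.
meta = with_sort_columns({}, ["resource_id", "name", "start_time"])
# Receiver side: read the declared order, or use the default if absent.
print(read_sort_columns(meta, default=["name"]))  # ['resource_id', 'name', 'start_time']
print(read_sort_columns({}, default=["name"]))    # ['name']
```

With pyarrow, for instance, such a map could be attached to a schema through `pyarrow.schema(fields, metadata=...)`; the fallback to the default keeps receivers compatible with senders that do not yet emit the key.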

One such optimization concerns compression: the default list of sorted columns is not always the best choice for maximizing the compression ratio of a given workload. Choosing the column order dynamically, for instance based on the entropy of each column, could improve compression. Because the chosen order would be recorded in the schema, a receiver would have the information it needs to adapt to whatever order the sender used and decode the Arrow records correctly.
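The entropy-based ordering could look roughly like this sketch. It assumes, as a heuristic not stated in the issue, that putting low-entropy columns first in the sort key produces longer runs of repeated values and therefore compresses better. The function names and the in-memory column representation (a dict of column name to values) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def shannon_entropy(values: Sequence[object]) -> float:
    """Shannon entropy (in bits) of the value distribution of one column."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sort_columns_by_entropy(columns: Dict[str, Sequence[object]]) -> List[str]:
    """Order column names from lowest to highest entropy (ties broken by name)."""
    return sorted(columns, key=lambda name: (shannon_entropy(columns[name]), name))


cols = {
    "span_id": [1, 2, 3, 4],           # all distinct: 2 bits
    "service": ["a", "a", "a", "b"],   # skewed: about 0.81 bits
    "status": ["ok", "ok", "ok", "ok"],  # constant: 0 bits
}
print(sort_columns_by_entropy(cols))  # ['status', 'service', 'span_id']
```

The resulting list is what the sender would record in the schema metadata, so the receiver no longer has to assume the hardcoded default order.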

lquerel added the enhancement label Aug 22, 2023