Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add ChunkedCompressor which compresses chunk n+1 like chunk n #996
feat: add ChunkedCompressor which compresses chunk n+1 like chunk n #996
Changes from all commits
9ce1789
336888f
f73df06
33ff0aa
fe2db95
3cb83f1
a0af4cd
75c03dc
7734c17
8bcc45f
4b8266c
8c4312b
c211061
e9b6da1
629ea6c
6a40f1c
1a5489b
8abdca9
2065033
799dd6a
bcb82d5
25a387c
fb9b804
4326be9
dc51115
5a1be70
fe3ce9d
8bde547
9b8859e
7274215
deb4287
da8d39c
c11620e
2be867e
240693b
d325968
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if child is None, should this just be empty...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this is the diff. I don't find either of these particularly palatable but I think the fix is to sort out how compressors pass information from one invocation of compress to the next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why my changes encountered this, but I was getting samples that were hundreds of bytes too small for FSST which triggered this PR to compress poorly.
We may want to think more broadly about how to estimate FSST compression ratio on a tiny sample, but for now this seems reasonable unless we're directly calling
compress
on a small array (rather than a sample thereof).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a10y any guidance on how you arrived at 10x multiplier or just a guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I had chatted 1:1 with Dan about this, it was fairly arbitrary