-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: canonicalization of chunked ExtensionArray #499
Conversation
991a48e
to
ef6e6d9
Compare
@@ -28,6 +28,10 @@ pub(crate) fn try_canonicalize_chunks( | |||
chunks: Vec<Array>, | |||
dtype: &DType, | |||
) -> VortexResult<Canonical> { | |||
if chunks.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a FLUP to create Canonical::empty(dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we end up redesigning extension arrays to contain their storage_dtype as part of the type definition, i'm not sure we'd need it
Per discussion on Slack yesterday, I updated the branch to have canonicalization behavior Chunked(Extension(storage)) -> Extension(Chunked(storage)). As part of that, I added call to .into_canonical() for the DateTimePartsCompressor. |
Should change the commit message since the behaviour is different than what we started with |
Since #346, we've been canonicalizing chunked ExtensionArray's incorrectly. Rather than unwrapping each chunk to its backing storage array we've just been building a new chunkedarray of the ExtensionArray chunks.
Before #480 , this was actually causing DateTimePartsCompressor to fail in can_compress, so we weren't going down the compression codepath.
The Fix
Fix
into_canonical
so that when we encounter aChunkedArray(ExtensionArray(storage))
to yield anExtensionArray(ChunkedArray(storage))
where each chunk is canonicalized first (e.g. if you have chunks of DateTimePartsArray you will end up with chunks of ExtensionArray(i64)).Fix the DateTimePartsCompressor to canonicalize its input, to handle that case where it may be a ChunkedArray.