-
Notifications
You must be signed in to change notification settings - Fork 802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read nested Parquet 2-level lists correctly #6757
base: master
Are you sure you want to change the base?
Conversation
parquet/src/schema/types.rs
Outdated
|
||
/// Returns `true` if this type is annotated as a list. | ||
pub fn is_list(&self) -> bool { | ||
match self.is_group() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since .is_group()
returns a boolean wouldn't it be better semantics to put it inside a branch conditional just using if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Rust has warped me...I feel like I'm doing it wrong if I use if
😅. But you're right that it would be more succinct.
use arrow::datatypes::ToByteSlice; | ||
|
||
let testdata = arrow::util::test_util::parquet_test_data(); | ||
// message my_record { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thank you for putting a comment with the test data. I like this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix CI 👍
I wish I could 😅. It appears that test is borked for everyone due to some upstream shenanigans. And now fixed by #6745 |
Which issue does this PR close?
Closes #6756.
Rationale for this change
See issue.
What changes are included in this PR?
Modifies both the arrow and record readers to check for nested lists before triggering the rule that repeated groups named "array" are treated as
list<OneTuple>
.Are there any user-facing changes?
Changes the interpretation of some legacy schemas.