Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read nested Parquet 2-level lists correctly #6757

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

etseidl
Copy link
Contributor

@etseidl etseidl commented Nov 19, 2024

Which issue does this PR close?

Closes #6756.

Rationale for this change

See issue.

What changes are included in this PR?

Modifies both the arrow and record readers to check for nested lists before triggering the rule that repeated groups named "array" are treated as list<OneTuple>.

Are there any user-facing changes?

Changes the interpretation of some legacy schemas.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Nov 19, 2024

/// Returns `true` if this type is annotated as a list.
pub fn is_list(&self) -> bool {
match self.is_group() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since .is_group() returns a boolean wouldn't it be better semantics to put it inside a branch conditional just using if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Rust has warped me...I feel like I'm doing it wrong if I use if 😅. But you're right that it would be more succinct.

use arrow::datatypes::ToByteSlice;

let testdata = arrow::util::test_util::parquet_test_data();
// message my_record {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you for putting a comment with the test data. I like this :)

Copy link

@devanbenz devanbenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix CI 👍

@etseidl
Copy link
Contributor Author

etseidl commented Nov 21, 2024

Please fix CI 👍

I wish I could 😅. It appears that test is borked for everyone due to some upstream shenanigans.

And now fixed by #6745

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet readers incorrectly interpret legacy nested lists
2 participants