Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow writing null valued keys in JSON #5065

Merged
merged 7 commits into from
Nov 20, 2023

Conversation

Jefffrey
Copy link
Contributor

@Jefffrey Jefffrey commented Nov 12, 2023

Which issue does this PR close?

Closes #3774

Rationale for this change

Allow users to write JSON with null values for keys, where previous (and default) is to skip the keys.

What changes are included in this PR?

In arrow-json writer mod, new builder for Writer which allows configuring how null valued keys (and null arrays) should be handled. Will default to skipping keys with null values for backward compatibility.

Are there any user-facing changes?

Yes

@github-actions github-actions bot added arrow Changes to the arrow crate object-store Object Store Interface labels Nov 12, 2023
arrow-json/src/writer.rs Outdated Show resolved Hide resolved
@tustvold
Copy link
Contributor

As commented on the ticket, the current behaviour is simply incorrect as it will not survive a round-trip. I therefore don't think there is a need to make the prior logic configurable, which I think should simplify this PR

@Jefffrey
Copy link
Contributor Author

As commented on the ticket, the current behaviour is simply incorrect as it will not survive a round-trip. I therefore don't think there is a need to make the prior logic configurable, which I think should simplify this PR

If you're referring to #5066 then that is a separate issue which this PR doesn't cover.

@tustvold
Copy link
Contributor

Aah I see, in that case is there a way we could make this an option instead of a generic. As it stands this is a breaking change, which I'm not sure about

@Jefffrey
Copy link
Contributor Author

Can you give an example where this might break existing uses? I thought including a default for the new generic parameter for Writer would make this backwards compatible, but would be happy to be proven wrong.

@tustvold
Copy link
Contributor

I could be mistaken but i thought adding new parameters was always a breaking change. Regardless using generics here will result in additional codegen and I'm doubtful will improve performance. Adding a WriterOptions or similar should be more future proof and more consistent with other implementations

@Jefffrey
Copy link
Contributor Author

Regardless using generics here will result in additional codegen and I'm doubtful will improve performance. Adding a WriterOptions or similar should be more future proof and more consistent with other implementations

Yeah that makes 👍

I'll work on switching to such

@Jefffrey Jefffrey marked this pull request as draft November 13, 2023 11:07
@Jefffrey Jefffrey marked this pull request as ready for review November 20, 2023 11:30
@Jefffrey
Copy link
Contributor Author

Removed configuring this null behaviour as a generic trait, and changed to be a config that can be set via a new WriterBuilder (akin to arrow-csv writing behaviour)

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, my only comment is I think it should be keep_null_values or possibly preserve_nulls, as null keys are actually illegal in JSON 😅

@Jefffrey
Copy link
Contributor Author

This looks good to me, my only comment is I think it should be keep_null_values or possibly preserve_nulls, as null keys are actually illegal in JSON 😅

Yeah I wasn't sure on the proper name. I was gonna leave it as keep_nulls but I felt that could be confusing since it might imply affecting null values in arrays which should always be kept regardless.

Maybe something like preserve_object_nulls?

@tustvold
Copy link
Contributor

Perhaps explicit_nulls ?

@Jefffrey
Copy link
Contributor Author

Perhaps explicit_nulls ?

Sounds better, renamed to that 👍

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you

@tustvold tustvold merged commit fbbb61d into apache:master Nov 20, 2023
30 checks passed
@Jefffrey Jefffrey deleted the json_write_nulls branch November 20, 2023 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

arrow_json::LineDelimitedWriter: Write nulls
2 participants