Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove single_file_output option from FileSinkConfig and Copy statement #9041

Merged
merged 9 commits into from
Jan 30, 2024

Conversation

yyy1000
Copy link
Contributor

@yyy1000 yyy1000 commented Jan 29, 2024

Which issue does this PR close?

Closes #8621

Rationale for this change

See #8621 for details

What changes are included in this PR?

  1. Remove single_file_output in FileSinkConfig and LogicalPlan::Copy, but using / in path to see whether it's a single file or not.
  2. Remove the field in proto for serde
  3. Change test cases, this includes two aspects, (1). Test cases will not need single_file_option as a param. (2). Test cases which used explicit single_file_output = false and didn't specify the path ended in / , (in modified code, the path will be treated as a single file which caused error)
  4. Document update, though I think it needs to be polished.

Are these changes tested?

Yes, cargo test in local env

Are there any user-facing changes?

I think yes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 29, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @yyy1000 -- I think this looks great. Thank you for the help and contribution 🙏

FYI @devinjdangelo in case you have some time to review or have additional comments or changes to suggest

@yyy1000
Copy link
Contributor Author

yyy1000 commented Jan 29, 2024

Haha, I'm happy that I could help it. :)

Copy link
Contributor

@devinjdangelo devinjdangelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. Thanks for the effort 🙏. Only one minor comment to consider.

@@ -718,19 +718,13 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {

let mut statement_options = StatementOptions::new(options);
let file_format = statement_options.try_infer_file_type(&statement.target)?;
let single_file_output =
statement_options.take_bool_option("single_file_output")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth keeping the call to take_bool_option and discarding the result. That will allow queries that specify single file output to continue working as expected in most cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a good idea.
But I worry that some users didn't specify the output folder path using /, but using a
single_output_option = false. Keeping this option may let the query running but has different result. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it seems cleaner to just remove the option rather than ignore it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove SINGLE_FILE_OUTPUT option from COPY statement
3 participants