Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consolidate sql examples #11173

Closed
wants to merge 10 commits into from
Closed

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jun 28, 2024

This looks like a massive change, but it is really just moving code around

Which issue does this PR close?

Part of #11172

Rationale for this change

Fewer example files will make it easier to find what you are looking for

What changes are included in this PR?

Consolidate sql examples into datafusion-examples/examples/sql.rs instead of

  • parqet_sql
  • parqet_sql_multiple_files
  • csv_sql
  • avro_sql
  • regexp
  • to_char
  • to_date
  • to_timestamp
  • make_date

Are these changes tested?

By existing CI job

Are there any user-facing changes?

Hopefully easier to navigate / find examples

@github-actions github-actions bot added the sql SQL Planner label Jun 28, 2024
@alamb alamb changed the title Alamb/consolidate examples Consolidate sql examples Jun 28, 2024
@@ -1,51 +0,0 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples were inlined

@@ -109,3 +129,135 @@ async fn example_read_csv_file_with_schema(file_path: &str) -> DataFrame {
// Register a lazy DataFrame by using the context and option provider
ctx.read_csv(file_path, csv_read_option).await.unwrap()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting the related functionality together in one example will make it easier for people to find what they are looking for (especially as the number of examples in DataFusion continues to grow)

@@ -15,6 +15,10 @@
// specific language governing permissions and limitations
// under the License.

//! This example shows how to use (only) the DataFusion SQL parser
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this example (not it is in datafusion-sql not datafusion core) and added some comments about what it does

// specific language governing permissions and limitations
// under the License.

//! This file contains several examples of how to run SQL queries using DataFusion
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved all the sql examples here and added some introductory material

@@ -109,3 +129,135 @@ async fn example_read_csv_file_with_schema(file_path: &str) -> DataFrame {
// Register a lazy DataFrame by using the context and option provider
ctx.read_csv(file_path, csv_read_option).await.unwrap()
}

/// This example demonstrates how to use the to_date series
/// of functions in the DataFrame API
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the consolidated examples had both SQL and dataframe examples, so I split such examples into dataframe.rs and sql.rs

@alamb alamb marked this pull request as ready for review June 28, 2024 21:19
@alamb
Copy link
Contributor Author

alamb commented Jun 28, 2024

@lewiszlw do you have time to review this large (but mechanical) PR?

@lewiszlw
Copy link
Member

I agree current examples are hard to navigate. What about moving these examples under sql directory? I am concerned that the sql.rs file will become very large.

@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

I agree current examples are hard to navigate. What about moving these examples under sql directory? I am concerned that the sql.rs file will become very large.

That is a good point 🤔 And I suppose a single large .rs file is unlikely to be very easy to use / find either 🤔

Perhaps what would be ideal would be to move the examples into the library guide so they can live alongside the actual docs.

https://github.com/apache/datafusion/tree/main/docs/source/library-user-guide

That way it would be easier to find them.

However, I think we would have to make sure they are automatically tested: #1813

I did do some digging through old PRs, here is an example that seems to do such a test #2018

I'll see if I can use that same approach...

@alamb alamb marked this pull request as draft June 29, 2024 18:06
@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

Converting to a draft while working on this one

@alamb
Copy link
Contributor Author

alamb commented Jun 29, 2024

I played around with what we have and it turns out I think we can test examples in the user guide. I made a PR with docs of how to do it: #11178

Perhaps with that as a framework, we can actually move many of the rust examples inline into the user guide -- I think that would be a better docs experience

@alamb alamb closed this Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants