Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement more expr_to_sql functionality #9578

Merged
merged 14 commits into from
Mar 14, 2024

Conversation

devinjdangelo
Copy link
Contributor

@devinjdangelo devinjdangelo commented Mar 12, 2024

Which issue does this PR close?

related to #9495

Rationale for this change

Implements additional Expr -> ast::Expr conversions

What changes are included in this PR?

  • Implement cast expressions
  • Implement arrow -> ast::DataType conversion
  • Implement Aggregate expression conversions (for built in aggregate functions)
  • Implement Date32 and Date64 scalar conversions
  • In support of Date scalar conversions, datafusion scalars now map to SQL expressions
  • Binary Exprs now always wrapped in ast::Expr::Nested to preserve order of operations encoded in LogicalPlan

Are these changes tested?

Yes

Are there any user-facing changes?

More supported conversions

@github-actions github-actions bot added the sql SQL Planner label Mar 12, 2024
@@ -39,6 +39,7 @@ unicode_expressions = []
[dependencies]
arrow = { workspace = true }
arrow-schema = { workspace = true }
chrono = { workspace = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we are sensitive to new dependencies so wanted to flag this proposed addition to datafusion-sql @alamb ...

It is used in this PR to facilitate converting arrow Date types to a formatted string which can be placed in a SQL expression like CAST('2024-01-01' to DATE)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a good solution to put the unparser behind a feature flag and then any unparser specific dependencies can only be enabled if one needs unparser support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think arrow already depends on chrono so this change doesn't add any new actual dependency (just an explicit one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and put unparser behind a (default) feature flag anyway. Could come in handy if we need a more invasive dependency later or someone just does not want this functionality.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @devinjdangelo

I actually took a stab at writing tests for this functionality (the roundtrip tests) and I got hung up on the lack of #8736

I feel like with just a little plumbing that could work and then writing tests for this feature would be easier.

I just need to find about 10 more hours in each day 😆

ScalarValue::Date32(None) => Ok(ast::Value::Null),
ScalarValue::Date64(Some(_d)) => not_impl_err!("Unsupported scalar: {v:?}"),
ScalarValue::Date64(None) => Ok(ast::Value::Null),
ScalarValue::Date32(Some(d)) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there is some function we can use in arrow-rs do to this conversion

Perhaps https://docs.rs/arrow/latest/arrow/array/types/struct.Date32Type.html#method.to_naive_date ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I was able to remove the chrono explicit dependency and downcast to arrow arrays and use those methods instead.

@devinjdangelo
Copy link
Contributor Author

I actually took a stab at writing tests for this functionality (the roundtrip tests) and I got hung up on the lack of #8736

Same here. I can take a more dedicated crack at round-trip tests in a follow on PR. I wrote tests following the existing pattern for now, which was not too difficult either.

I just need to find about 10 more hours in each day 😆

☝️ this

@devinjdangelo devinjdangelo changed the title wip feat: implement more expr_to_sql functionality feat: implement more expr_to_sql functionality Mar 13, 2024
@devinjdangelo devinjdangelo marked this pull request as ready for review March 13, 2024 01:14
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really nice to me -- thank you @devinjdangelo

@@ -33,11 +33,13 @@ name = "datafusion_sql"
path = "src/lib.rs"

[features]
default = ["unicode_expressions"]
default = ["unicode_expressions", "unparser"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -315,14 +489,78 @@ mod tests {
#[test]
fn expr_to_sql_ok() -> Result<()> {
let tests: Vec<(Expr, &str)> = vec![
(col("a").gt(lit(4)), r#"a > 4"#),
((col("a") + col("b")).gt(lit(4)), r#"((a + b) > 4)"#),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

@@ -347,7 +585,7 @@ mod tests {

let actual = format!("{}", ast);

let expected = r#"'a' > 4"#;
let expected = r#"('a' > 4)"#;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these parens are due to using Expr::Nested right? I think it is a good design to add explicit parens to ensure the evaluation order is correctly reflected

@alamb
Copy link
Contributor

alamb commented Mar 13, 2024

Perhaps we can also add a note in https://github.com/apache/arrow-datafusion?tab=readme-ov-file#crate-features about the unparser feature

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 13, 2024
r#"CAST(a AS INTEGER UNSIGNED)"#,
),
(
Expr::Literal(ScalarValue::Date64(Some(0))),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is not possible to directly represent a Date scalar in a SQL string, these non round trip tests are still needed.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @devinjdangelo

@alamb
Copy link
Contributor

alamb commented Mar 14, 2024

The windows CI failure seem unrelated

@alamb alamb merged commit 5911d18 into apache:main Mar 14, 2024
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants