-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement more expr_to_sql functionality #9578
Conversation
datafusion/sql/Cargo.toml
Outdated
@@ -39,6 +39,7 @@ unicode_expressions = [] | |||
[dependencies] | |||
arrow = { workspace = true } | |||
arrow-schema = { workspace = true } | |||
chrono = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we are sensitive to new dependencies so wanted to flag this proposed addition to datafusion-sql @alamb ...
It is used in this PR to facilitate converting arrow Date types to a formatted string which can be placed in a SQL expression like CAST('2024-01-01' to DATE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be a good solution to put the unparser behind a feature flag and then any unparser specific dependencies can only be enabled if one needs unparser support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think arrow already depends on chrono so this change doesn't add any new actual dependency (just an explicit one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and put unparser behind a (default) feature flag anyway. Could come in handy if we need a more invasive dependency later or someone just does not want this functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @devinjdangelo
I actually took a stab at writing tests for this functionality (the roundtrip tests) and I got hung up on the lack of #8736
I feel like with just a little plumbing that could work and then writing tests for this feature would be easier.
I just need to find about 10 more hours in each day 😆
datafusion/sql/src/unparser/expr.rs
Outdated
ScalarValue::Date32(None) => Ok(ast::Value::Null), | ||
ScalarValue::Date64(Some(_d)) => not_impl_err!("Unsupported scalar: {v:?}"), | ||
ScalarValue::Date64(None) => Ok(ast::Value::Null), | ||
ScalarValue::Date32(Some(d)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there is some function we can use in arrow-rs do to this conversion
Perhaps https://docs.rs/arrow/latest/arrow/array/types/struct.Date32Type.html#method.to_naive_date ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I was able to remove the chrono explicit dependency and downcast to arrow arrays and use those methods instead.
Same here. I can take a more dedicated crack at round-trip tests in a follow on PR. I wrote tests following the existing pattern for now, which was not too difficult either.
☝️ this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really nice to me -- thank you @devinjdangelo
@@ -33,11 +33,13 @@ name = "datafusion_sql" | |||
path = "src/lib.rs" | |||
|
|||
[features] | |||
default = ["unicode_expressions"] | |||
default = ["unicode_expressions", "unparser"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -315,14 +489,78 @@ mod tests { | |||
#[test] | |||
fn expr_to_sql_ok() -> Result<()> { | |||
let tests: Vec<(Expr, &str)> = vec![ | |||
(col("a").gt(lit(4)), r#"a > 4"#), | |||
((col("a") + col("b")).gt(lit(4)), r#"((a + b) > 4)"#), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -347,7 +585,7 @@ mod tests { | |||
|
|||
let actual = format!("{}", ast); | |||
|
|||
let expected = r#"'a' > 4"#; | |||
let expected = r#"('a' > 4)"#; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these parens are due to using Expr::Nested
right? I think it is a good design to add explicit parens to ensure the evaluation order is correctly reflected
Perhaps we can also add a note in https://github.com/apache/arrow-datafusion?tab=readme-ov-file#crate-features about the |
r#"CAST(a AS INTEGER UNSIGNED)"#, | ||
), | ||
( | ||
Expr::Literal(ScalarValue::Date64(Some(0))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is not possible to directly represent a Date scalar in a SQL string, these non round trip tests are still needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @devinjdangelo
The windows CI failure seem unrelated |
Which issue does this PR close?
related to #9495
Rationale for this change
Implements additional Expr -> ast::Expr conversions
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
More supported conversions