Support parsing SQL strings to Exprs #8736
Comments
Also related #7165 I think this also came up a few days ago but I can't find the reference (maybe @devinjdangelo remembers 🤔 ) |
Also relates to #4890 I believe |
#8661 perhaps? Though that is talking about going in the opposite direction. |
If the maintainers think this is a reasonable request and no one is working on it, I will draft a PR in the next few days. |
Making exprs from strings sounds like a good idea to me. BTW, you can also use the more fluent style API to make exprs, which, while still tedious, is a bit better. Rather than
Expr::BinaryExpr(datafusion::logical_expr::BinaryExpr::new(
    Box::new(col("a")),
    Operator::Gt,
    Box::new(lit(1)),
))
you can write
col("a").gt(lit(1))
The benefit of this approach over strings is that the compiler can check / ensure the exprs are constructed correctly. However, I think the use case of using strings to, for example, serialize Exprs and let users provide arbitrary filters makes a |
I think you can do this today via Step 1: Add an example to show how to use
|
Expr
s
This is what I have been using in my POC (still a WIP, ignore the panic :) ) |
Thanks @Omega359 -- that looks nice. I think the next step is to turn it into a function with some doc strings and an example |
I took a quick look at the impl for sql_to_expr and on down the code paths from there. Removing the requirement for a schema may be problematic since so much of the api requires a schema for handling identifiers, aliases and type information. |
Maybe we could use an empty schema (e.g. https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html#method.empty) if there were no column references. I agree that if there are column references in the expression, the user will need to provide a schema for parsing. |
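To make the schema point above concrete, here is a minimal sketch using toy types only. `Expr`, `Schema`, and `resolve` below are hypothetical stand-ins written for this illustration; they are not DataFusion's `Expr`, `DFSchema`, or planner. The assumption is simply that a literal can be resolved without any schema, while an identifier has to be looked up in one.

```rust
/// Toy expression type for illustration; NOT DataFusion's `Expr`.
#[derive(Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
}

/// Toy schema: just a list of known column names
/// (a hypothetical stand-in for something like `DFSchema`).
struct Schema {
    columns: Vec<String>,
}

impl Schema {
    /// A schema with no columns at all.
    fn empty() -> Self {
        Schema { columns: Vec::new() }
    }
}

/// Resolve a single token against a schema.
/// Integers become literals and never consult the schema;
/// anything else must name a column that the schema knows about.
fn resolve(token: &str, schema: &Schema) -> Result<Expr, String> {
    if let Ok(n) = token.parse::<i64>() {
        return Ok(Expr::Literal(n));
    }
    if schema.columns.iter().any(|c| c == token) {
        Ok(Expr::Column(token.to_string()))
    } else {
        Err(format!("unknown column '{token}'"))
    }
}
```

In this toy model, `resolve("42", &Schema::empty())` succeeds, while `resolve("a", &Schema::empty())` returns an error until the caller supplies a schema that contains `a`.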
This new testcase has some similar code: Maybe it can be cleaned up when the new API is added. |
This came up again on discord: https://discord.com/channels/885562378132000778/1166447479609376850/1252616019357470731
Here is some code that @Omega359 shared
pub(crate) fn build_expression(norm: &FieldNormalizer, df: &DataFrame, expression: &str) -> Result<Expr> {
    let parser = Parser::new(&AnsiDialect {});
    let tokenized = parser.try_with_sql(expression);
    if tokenized.is_err() {
        bail!("Unable to tokenize expression '{}': {:?}", &expression, tokenized.err())
    }
    let parsed = tokenized?.parse_expr();
    if parsed.is_err() {
        bail!("Unable to parse expression '{}': {:?}", &expression, parsed.err())
    }
    let expr = parsed?;
    debug!("Parsed expression was '{}'", expr);
    let parser_options = ParserOptions {
        enable_ident_normalization: false,
        parse_float_as_decimal: false,
    };
    let table_schema_provider = TableSchemaProvider::new(norm.table_name.clone(), norm.session_context.clone());
    let sql_to_rel = SqlToRel::new_with_options(&table_schema_provider, parser_options);
    let sql_to_expr = sql_to_rel.sql_to_expr(expr.clone(), df.schema(), &mut PlannerContext::new());
    if sql_to_expr.is_err() {
        bail!("Unable to transform sql expression '{}' to datafusion Expr: {:?}", &expression, sql_to_expr.err())
    }
    Ok(sql_to_expr?)
}
I think it would be great to add an API to
impl SessionContext {
    /// parse the provided expression as a string
    fn parse_sql(&self, sql: &str) -> Result<Expr> { ... }
}
impl DataFrame {
    /// parse the provided expression as a string
    fn parse_sql(&self, sql: &str) -> Result<Expr> { ... }
}
Perhaps @xinlifoobar has time to help here 🤔 |
I am trying to look into this. |
I remade this comment as my previous example was not appropriate here. I am thinking of the usages of
let ctx = SessionContext::new();
// Read the data from a csv file
let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
let expr1 = df.parse_sql("a > 10");
let expr2 = df.parse_sql("a < 10");
This is reasonable, since df_schema is stored locally inside the dataframe. |
@xinlifoobar has a very nice PR to add this API: #10995 |
Part of #9494
Is your feature request related to a problem or challenge?
Currently, when we need to construct an Expr, we need to build it by hand; take a > 1 for example:
Although this works, it becomes tedious when we need to construct a complex expr like a > 1 and b in (1,10) ....
Describe the solution you'd like
Implement FromStr for Expr, so we can get an Expr from a String directly.
Describe alternatives you've considered
No response
Additional context
No response
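As a rough illustration of the FromStr idea proposed in this issue, here is a minimal sketch on a toy expression type. Everything in it (`Expr`, `Op`, `operand`, and the three-token grammar) is a hypothetical stand-in invented for this example; it is not DataFusion's `Expr` and it does not use a real SQL parser. It only assumes that the input has the shape `<operand> <op> <operand>` with whitespace between tokens.

```rust
use std::str::FromStr;

/// Toy expression tree for illustration; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary { left: Box<Expr>, op: Op, right: Box<Expr> },
}

/// The few comparison operators this sketch understands.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Gt,
    Lt,
    Eq,
}

impl FromStr for Expr {
    type Err = String;

    /// Parses exactly `<operand> <op> <operand>`, tokens separated by whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let tokens: Vec<&str> = s.split_whitespace().collect();
        let [left, op, right] = tokens.as_slice() else {
            return Err(format!("expected `<operand> <op> <operand>`, got '{s}'"));
        };
        let op = match *op {
            ">" => Op::Gt,
            "<" => Op::Lt,
            "=" => Op::Eq,
            other => return Err(format!("unsupported operator '{other}'")),
        };
        Ok(Expr::Binary {
            left: Box::new(operand(left)),
            op,
            right: Box::new(operand(right)),
        })
    }
}

/// Integers become literals; any other token is treated as a column name.
fn operand(token: &str) -> Expr {
    match token.parse::<i64>() {
        Ok(n) => Expr::Literal(n),
        Err(_) => Expr::Column(token.to_string()),
    }
}
```

With this in place, `"a > 1".parse::<Expr>()` yields a `Binary` node comparing column `a` with literal `1`. Note that `FromStr::from_str` takes only the string, so there is nowhere to pass a schema, which is one reason the thread moves toward methods on a context or dataframe instead.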