-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Make it easier to call window functions via expression API (and add example) #6746
Conversation
/// let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?; | ||
/// | ||
/// // The following is the equivalent of "SELECT FIRST_VALUE(b) OVER(PARTITION BY a)" | ||
/// let first_value = first_value(col("b")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of the PR is motivated by trying to write this example. It turned out to be quite a pain to create the right Expr
.
) -> Self { | ||
/// Create a new Window expression with the specified argument an | ||
/// empty `OVER` clause | ||
pub fn new(fun: impl Into<window_function::WindowFunction>, args: Vec<Expr>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the major change -- so that WindowFunction acts like a Builder
which can be used to set partition by
, order by
and window_frame
with a more self documenting style
As a new user, I like this change very much. I found creating lead and lag functions to be difficult to use. Suppose I wanted to call lead but provide the other two optional arguments - shift offset and default value. Would the best way to do that with this proposal to create the
|
I think it would look something like this let single_lead = lead(col("mycol"))
.
let two_lead_with_default = lead.call(col("mycol"))
.offset(lit(2))
.partition_by(lit(mydefault)]))
.build() |
I'm looking at the arguments we end up passing. I think
and for nth value it's easier
|
That makes sense to me @timsaucer -- thank you for the feedback. I don't think I will have time to work on this PR / issue for a while, but I tried to give it some visibility in #10395 and hopefully we can find someone else who is interested in helping make it work |
(or of course if you have the time it would be an amazing contribution 🙏 ) |
I'm willing to work on this, but I'd like to wrap up the examples I have going into datafusion-python first. I think those will have more impact. I only have an hour or so a day to work on this, so I can't promise how fast I'll get to it. |
Thank you very much 🙏 I totally understand re time constraints. Maybe we'll find some other people to help too. One can hope |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
We can revive this PR / its API when someone has time to work on it In case anyone is following along, @jayzhan211 added a really nice trait for working with aggregate functions. Maybe we can do something similar for window functions eventually datafusion/datafusion/expr/src/udaf.rs Lines 614 to 653 in e693ed7
|
I should have time to work on this near the end of the week or early next week. I'm in nearing the end of a large PR for |
Thanks @timsaucer -- looking forward to it. I think the most useful part of this PR are the doc examples |
before polishing this PR up I wanted to get some feedback on the approach
Which issue does this PR close?
Related to #5781
Closes: #6747
Rationale for this change
@mustafasrepo suggested adding an example for
DataFrame::window
here: #6703 (comment)When I tried to do so it turns out that creating an
Expr::Window
is quite challenging, so I took a step back and tried to clean up the API a littleWhat changes are included in this PR?
WindowExpr
to be a builder style interface and thus making it easier to constructExpr::WindowFunction
variants. This is modeled after theCaseBuilder
lead
,lag
, etc. inexpr_fns
to follow the model of the other types of functionsAre these changes tested?
Yes
TOOD: I need to write tests for all the code in
expr_fns
which I will add to https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/dataframe/dataframe_functions.rs if people like this basic approachAre there any user-facing changes?
yes, the APIs to create window functions are improved