Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50377][SQL] Allow to evaluate foldable RuntimeReplaceable #48912

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -440,8 +440,12 @@ trait RuntimeReplaceable extends Expression {
// are semantically equal.
override lazy val canonicalized: Expression = replacement.canonicalized

final override def eval(input: InternalRow = null): Any =
throw QueryExecutionErrors.cannotEvaluateExpressionError(this)
final override def eval(input: InternalRow = null): Any = {
// For convenience, we allow to evaluate `RuntimeReplaceable` expressions, in case we need to
// get a constant from foldable expression before the query execution starts.
assert(input == null)
Copy link
Member

@MaxGekk MaxGekk Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you clarify why do you require null instead of just bypass input to replacement.eval()

Copy link
Contributor

@LuciferYang LuciferYang Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the premise here must be that input == null, then would it be better to throw a meaningful QueryExecutionError instead of an AssertionError when input is not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally people should never eagerly evaluate a RuntimeReplaceable, the only case I see is evaluating foldable expressions. I think it's better to only allow it, to make sure that there is no RuntimeReplaceable leaked to the runtime execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use assert as this means bug, not a user error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use assert as this means bug, not a user error.

Got it,thanks

replacement.eval()
}
final override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode =
throw QueryExecutionErrors.cannotGenerateCodeForExpressionError(this)
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ case class UnaryMinus(
since = "1.5.0",
group = "math_funcs")
case class UnaryPositive(child: Expression)
extends RuntimeReplaceable with ImplicitCastInputTypes {
extends UnaryExpression with RuntimeReplaceable with ImplicitCastInputTypes {
override def nullIntolerant: Boolean = true
cloud-fan marked this conversation as resolved.
Show resolved Hide resolved

override def prettyName: String = "positive"
Expand All @@ -128,11 +128,8 @@ case class UnaryPositive(child: Expression)

override lazy val replacement: Expression = child

override protected def withNewChildrenInternal(
newChildren: IndexedSeq[Expression]): UnaryPositive =
copy(newChildren.head)

override def children: Seq[Expression] = child :: Nil
override protected def withNewChildInternal(newChild: Expression): Expression =
copy(child = newChild)
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -995,6 +995,7 @@ SELECT
lag(v, 1) IGNORE NULLS OVER w lag_1,
lag(v, 2) IGNORE NULLS OVER w lag_2,
lag(v, 3) IGNORE NULLS OVER w lag_3,
lag(v, +3) IGNORE NULLS OVER w lag_plus_3,
nth_value(v, 1) IGNORE NULLS OVER w nth_value_1,
nth_value(v, 2) IGNORE NULLS OVER w nth_value_2,
nth_value(v, 3) IGNORE NULLS OVER w nth_value_3,
Expand All @@ -1007,9 +1008,9 @@ WINDOW w AS (ORDER BY id)
ORDER BY id
-- !query analysis
Sort [id#x ASC NULLS FIRST], true
+- Project [content#x, id#x, v#x, lead_0#x, lead_1#x, lead_2#x, lead_3#x, lag_0#x, lag_1#x, lag_2#x, lag_3#x, nth_value_1#x, nth_value_2#x, nth_value_3#x, first_value#x, any_value#x, last_value#x]
+- Project [content#x, id#x, v#x, lead_0#x, lead_1#x, lead_2#x, lead_3#x, lag_0#x, lag_1#x, lag_2#x, lag_3#x, nth_value_1#x, nth_value_2#x, nth_value_3#x, first_value#x, any_value#x, last_value#x, lead_0#x, lead_1#x, lead_2#x, lead_3#x, lag_0#x, lag_1#x, lag_2#x, ... 7 more fields]
+- Window [lead(v#x, 0, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 0, 0)) AS lead_0#x, lead(v#x, 1, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead_1#x, lead(v#x, 2, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 2, 2)) AS lead_2#x, lead(v#x, 3, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 3, 3)) AS lead_3#x, lag(v#x, 0, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 0, 0)) AS lag_0#x, lag(v#x, -1, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS lag_1#x, lag(v#x, -2, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, -2)) AS lag_2#x, lag(v#x, -3, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -3, -3)) AS lag_3#x, nth_value(v#x, 1, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS nth_value_1#x, nth_value(v#x, 2, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS nth_value_2#x, nth_value(v#x, 3, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS nth_value_3#x, first(v#x, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS first_value#x, any_value(v#x, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS any_value#x, last(v#x, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS last_value#x], [id#x ASC NULLS FIRST]
+- Project [content#x, id#x, v#x, lead_0#x, lead_1#x, lead_2#x, lead_3#x, lag_0#x, lag_1#x, lag_2#x, lag_3#x, lag_plus_3#x, nth_value_1#x, nth_value_2#x, nth_value_3#x, first_value#x, any_value#x, last_value#x]
+- Project [content#x, id#x, v#x, lead_0#x, lead_1#x, lead_2#x, lead_3#x, lag_0#x, lag_1#x, lag_2#x, lag_3#x, lag_plus_3#x, nth_value_1#x, nth_value_2#x, nth_value_3#x, first_value#x, any_value#x, last_value#x, lead_0#x, lead_1#x, lead_2#x, lead_3#x, lag_0#x, lag_1#x, ... 9 more fields]
+- Window [lead(v#x, 0, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 0, 0)) AS lead_0#x, lead(v#x, 1, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead_1#x, lead(v#x, 2, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 2, 2)) AS lead_2#x, lead(v#x, 3, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 3, 3)) AS lead_3#x, lag(v#x, 0, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, 0, 0)) AS lag_0#x, lag(v#x, -1, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS lag_1#x, lag(v#x, -2, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -2, -2)) AS lag_2#x, lag(v#x, -3, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -3, -3)) AS lag_3#x, lag(v#x, -3, null) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RowFrame, -3, -3)) AS lag_plus_3#x, nth_value(v#x, 1, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS nth_value_1#x, nth_value(v#x, 2, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS nth_value_2#x, nth_value(v#x, 3, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS nth_value_3#x, first(v#x, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS first_value#x, any_value(v#x, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS any_value#x, last(v#x, true) windowspecdefinition(id#x ASC NULLS FIRST, specifiedwindowframe(RangeFrame, unboundedpreceding$(), currentrow$())) AS last_value#x], [id#x ASC NULLS FIRST]
+- Project [content#x, id#x, v#x]
+- SubqueryAlias test_ignore_null
+- View (`test_ignore_null`, [content#x, id#x, v#x])
Expand Down
1 change: 1 addition & 0 deletions sql/core/src/test/resources/sql-tests/inputs/window.sql
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,7 @@ SELECT
lag(v, 1) IGNORE NULLS OVER w lag_1,
lag(v, 2) IGNORE NULLS OVER w lag_2,
lag(v, 3) IGNORE NULLS OVER w lag_3,
lag(v, +3) IGNORE NULLS OVER w lag_plus_3,
nth_value(v, 1) IGNORE NULLS OVER w nth_value_1,
nth_value(v, 2) IGNORE NULLS OVER w nth_value_2,
nth_value(v, 3) IGNORE NULLS OVER w nth_value_3,
Expand Down
21 changes: 11 additions & 10 deletions sql/core/src/test/resources/sql-tests/results/window.sql.out
Original file line number Diff line number Diff line change
Expand Up @@ -1060,6 +1060,7 @@ SELECT
lag(v, 1) IGNORE NULLS OVER w lag_1,
lag(v, 2) IGNORE NULLS OVER w lag_2,
lag(v, 3) IGNORE NULLS OVER w lag_3,
lag(v, +3) IGNORE NULLS OVER w lag_plus_3,
nth_value(v, 1) IGNORE NULLS OVER w nth_value_1,
nth_value(v, 2) IGNORE NULLS OVER w nth_value_2,
nth_value(v, 3) IGNORE NULLS OVER w nth_value_3,
Expand All @@ -1071,17 +1072,17 @@ FROM
WINDOW w AS (ORDER BY id)
ORDER BY id
-- !query schema
struct<content:string,id:int,v:string,lead_0:string,lead_1:string,lead_2:string,lead_3:string,lag_0:string,lag_1:string,lag_2:string,lag_3:string,nth_value_1:string,nth_value_2:string,nth_value_3:string,first_value:string,any_value:string,last_value:string>
struct<content:string,id:int,v:string,lead_0:string,lead_1:string,lead_2:string,lead_3:string,lag_0:string,lag_1:string,lag_2:string,lag_3:string,lag_plus_3:string,nth_value_1:string,nth_value_2:string,nth_value_3:string,first_value:string,any_value:string,last_value:string>
-- !query output
a 0 NULL NULL x y z NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
a 1 x x y z v x NULL NULL NULL x NULL NULL x x x
b 2 NULL NULL y z v NULL x NULL NULL x NULL NULL x x x
c 3 NULL NULL y z v NULL x NULL NULL x NULL NULL x x x
a 4 y y z v NULL y x NULL NULL x y NULL x x y
b 5 NULL NULL z v NULL NULL y x NULL x y NULL x x y
a 6 z z v NULL NULL z y x NULL x y z x x z
a 7 v v NULL NULL NULL v z y x x y z x x v
a 8 NULL NULL NULL NULL NULL NULL v z y x y z x x v
a 0 NULL NULL x y z NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
a 1 x x y z v x NULL NULL NULL NULL x NULL NULL x x x
b 2 NULL NULL y z v NULL x NULL NULL NULL x NULL NULL x x x
c 3 NULL NULL y z v NULL x NULL NULL NULL x NULL NULL x x x
a 4 y y z v NULL y x NULL NULL NULL x y NULL x x y
b 5 NULL NULL z v NULL NULL y x NULL NULL x y NULL x x y
a 6 z z v NULL NULL z y x NULL NULL x y z x x z
a 7 v v NULL NULL NULL v z y x x x y z x x v
a 8 NULL NULL NULL NULL NULL NULL v z y y x y z x x v


-- !query
Expand Down