-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support binary temporal arithmetic with integers #13741
base: main
Are you sure you want to change the base?
Conversation
let (lhs, rhs) = get_input_types(&$LHS_TYPE, &$OP, &$RHS_TYPE)?; | ||
assert_eq!(lhs, $RESULT_TYPE1); | ||
assert_eq!(rhs, $RESULT_TYPE2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for +
operator symmetry is expected, so the assert function could do this for us
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The operator is symmetric, but how we cast the left and right types is not symmetric. Hence I needed an assymetric test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have this
let (a, b) = get_input_types(&$LHS_TYPE, &$OP, &$RHS_TYPE)?;
assert_eq!(a, $RESULT_TYPE1);
assert_eq!(b, $RESULT_TYPE2);
// test symmetry
let (b, a) = get_input_types(&$RHS_TYPE, &$OP, &$LHS_TYPE)?;
assert_eq!(a, $RESULT_TYPE1);
assert_eq!(b, $RESULT_TYPE2);
/// | ||
/// Applies coercion rules for `$LHS_TYPE $OP $RHS_TYPE` and asserts that the | ||
/// the result types are `$RESULT_TYPE1` and `$RESULT_TYPE2` respectively | ||
macro_rules! test_coercion_assymetric_binary_rule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the benefit of this being a macro? Could this be ordinary function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know -- I just copied how the rest is done here, via macros too. This macro is a clone of test_coercion_binary_rule macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assumed so. But also -- macros aren't "better functions", they come with downsides, so if something is expressible as a function, it should.
} else if matches!( | ||
(&this_type, cast_to_type), | ||
(DataType::Int32 | DataType::Int64, DataType::Interval(_)) | ||
) { | ||
// Convert integer (days) to the corresponding DayTime interval | ||
match self { | ||
Expr::Literal(ScalarValue::Int32(Some(days))) => { | ||
Ok(Expr::Literal(ScalarValue::IntervalDayTime(Some( | ||
arrow_buffer::IntervalDayTime::new(days, 0), | ||
)))) | ||
} | ||
Expr::Literal(ScalarValue::Int64(Some(days))) => { | ||
Ok(Expr::Literal(ScalarValue::IntervalDayTime(Some( | ||
arrow_buffer::IntervalDayTime::new(days as i32, 0), | ||
)))) | ||
} | ||
_ => plan_err!( | ||
"Cannot automatically convert {this_type:?} to {cast_to_type:?}" | ||
), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works for literals only and is immediate.
For other types all the function does is create cast expression. Why would we want to special-case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the alternative? You can't cast an integer to an interval in arrow.
You are right; it only works for literals. Do you have thoughts on how to make it work for non also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a Cast is supposed to work, it should work regardless how it's weaved into the plan.
Expr.cast_to
should not have any additional logic (other than checks) vs creating Expr::Cast
directly.
However, we should first decide whether integer should be castable to interval at all.
It's not castable in PostgreSQL
postgres=# SELECT 10::interval;
ERROR: cannot cast type integer to interval
LINE 1: SELECT 10::interval;
^
so I am inclined it should not be castable in DF either.
Which means, the date + int
cannot be converted (desugared) to date + interval
using ordinary cast expression (as such shouldn't exist), but it can be converted (desugared) by some other means (eg with a custom conversion function or combination of expressions).
Alternatively, we defer date + int
until the execution, eg somewhere in
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> { |
but i'd prefer to desugar earlier and avoid separate code path in the
BinaryExpr
physical expr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we require a new scalar function, make_interval
, which can convert integers into intervals.
Then, similar to make_array
, perform transformations within the SQL planner.
Transform 4 + to_date('1970-01-03')
to make_interval(4) + to_date('1970-01-03')
inside plan_binary_op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -186,6 +186,12 @@ fn signature(lhs: &DataType, op: &Operator, rhs: &DataType) -> Result<Signature> | |||
} else if let Some(numeric) = mathematics_numerical_coercion(lhs, rhs) { | |||
// Numeric arithmetic, e.g. Int32 + Int32 | |||
Ok(Signature::uniform(numeric)) | |||
} else if let Some((new_lhs, new_rhs, ret)) = resolve_ints_to_intervals(lhs, rhs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're in Plus | Minus | Multiply | Divide | Modulo => {
branch
but the new logic should apply
- to
+
symmetrically - to
date - int
, but likely not toint - date
(at least per PostgreSQL) - not at all to other operators
so what about passing the operator to the new function and doing the work for +
and -
(so that code already identifies asymmetric treatment)
|
||
match (lhs, rhs) { | ||
// Handle integer + temporal types cases | ||
(Int32 | Int64, rhs) if rhs.is_temporal() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be date + int
, not any_temporal + int
:
postgres=# select now()::date + 1;
?column?
------------
2024-12-13
(1 row)
postgres=# select now()::time + 1;
ERROR: operator does not exist: time without time zone + integer
LINE 1: select now()::time + 1;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
postgres=# select now()::timestamp + 1;
ERROR: operator does not exist: timestamp without time zone + integer
LINE 1: select now()::timestamp + 1;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
postgres=# select now()::timestamptz + 1;
ERROR: operator does not exist: timestamp with time zone + integer
LINE 1: select now()::timestamptz + 1;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
(i didn't check other is_temporal
types: interval, duration)
/// | ||
/// Applies coercion rules for `$LHS_TYPE $OP $RHS_TYPE` and asserts that the | ||
/// the result types are `$RESULT_TYPE1` and `$RESULT_TYPE2` respectively | ||
macro_rules! test_coercion_assymetric_binary_rule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assumed so. But also -- macros aren't "better functions", they come with downsides, so if something is expressible as a function, it should.
let (lhs, rhs) = get_input_types(&$LHS_TYPE, &$OP, &$RHS_TYPE)?; | ||
assert_eq!(lhs, $RESULT_TYPE1); | ||
assert_eq!(rhs, $RESULT_TYPE2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have this
let (a, b) = get_input_types(&$LHS_TYPE, &$OP, &$RHS_TYPE)?;
assert_eq!(a, $RESULT_TYPE1);
assert_eq!(b, $RESULT_TYPE2);
// test symmetry
let (b, a) = get_input_types(&$RHS_TYPE, &$OP, &$LHS_TYPE)?;
assert_eq!(a, $RESULT_TYPE1);
assert_eq!(b, $RESULT_TYPE2);
} else if matches!( | ||
(&this_type, cast_to_type), | ||
(DataType::Int32 | DataType::Int64, DataType::Interval(_)) | ||
) { | ||
// Convert integer (days) to the corresponding DayTime interval | ||
match self { | ||
Expr::Literal(ScalarValue::Int32(Some(days))) => { | ||
Ok(Expr::Literal(ScalarValue::IntervalDayTime(Some( | ||
arrow_buffer::IntervalDayTime::new(days, 0), | ||
)))) | ||
} | ||
Expr::Literal(ScalarValue::Int64(Some(days))) => { | ||
Ok(Expr::Literal(ScalarValue::IntervalDayTime(Some( | ||
arrow_buffer::IntervalDayTime::new(days as i32, 0), | ||
)))) | ||
} | ||
_ => plan_err!( | ||
"Cannot automatically convert {this_type:?} to {cast_to_type:?}" | ||
), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a Cast is supposed to work, it should work regardless how it's weaved into the plan.
Expr.cast_to
should not have any additional logic (other than checks) vs creating Expr::Cast
directly.
However, we should first decide whether integer should be castable to interval at all.
It's not castable in PostgreSQL
postgres=# SELECT 10::interval;
ERROR: cannot cast type integer to interval
LINE 1: SELECT 10::interval;
^
so I am inclined it should not be castable in DF either.
Which means, the date + int
cannot be converted (desugared) to date + interval
using ordinary cast expression (as such shouldn't exist), but it can be converted (desugared) by some other means (eg with a custom conversion function or combination of expressions).
Alternatively, we defer date + int
until the execution, eg somewhere in
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> { |
but i'd prefer to desugar earlier and avoid separate code path in the
BinaryExpr
physical expr
FYI @jonahgao in case you would like to weigh in here |
What is the status of this PR? Are we waiting on upstream changes in arrow-rs? |
Which issue does this PR close?
Closes #12342
Rationale for this change
Details
To facilitate this change, I made an update to
cast_to
function inexpr_schema.rs
. There are two unpleasant things in here:select to_date('1970-01-03') - x from (select 1 as x);
The limitations I faced are these: