Fix Coalesce casting logic to follow what Postgres and DuckDB do. Introduce a signature that does non-comparison coercion #10268
Signed-off-by: jayzhan211 <[email protected]>
datafusion/expr/src/signature.rs
Outdated
@@ -92,14 +92,19 @@ pub enum TypeSignature {
    /// A function such as `concat` is `Variadic(vec![DataType::Utf8, DataType::LargeUtf8])`
    Variadic(Vec<DataType>),
    /// One or more arguments of an arbitrary but equal type.
    /// DataFusion attempts to coerce all argument types to match the first argument's type
    /// DataFusion attempts to coerce all argument types to match to the common type with comparision coercion.
    ///
    /// # Examples
    /// Given types in signature should be coercible to the same final type.
    /// A function such as `make_array` is `VariadicEqual`.
    ///
    /// `make_array(i32, i64) -> make_array(i64, i64)`
    VariadicEqual,
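The `make_array(i32, i64) -> make_array(i64, i64)` example in the doc comment above can be illustrated with a small standalone sketch. It does not use DataFusion's actual types: `ToyType`, `common_type`, and `coerce_all` are hypothetical names, and "pick the widest integer" is an assumed stand-in for whatever rule the real signature applies.

```rust
/// Toy stand-in for Arrow's `DataType` (assumption: only signed integer widths).
/// Variant order doubles as widening order via the derived `Ord`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum ToyType {
    Int8,
    Int16,
    Int32,
    Int64,
}

/// Hypothetical helper: the widest type among the arguments,
/// or `None` when there are no arguments at all.
pub fn common_type(args: &[ToyType]) -> Option<ToyType> {
    args.iter().copied().max()
}

/// Rewrite every argument type to the common type, in the spirit of
/// `make_array(i32, i64) -> make_array(i64, i64)`.
pub fn coerce_all(args: &[ToyType]) -> Option<Vec<ToyType>> {
    let target = common_type(args)?;
    Some(vec![target; args.len()])
}
```

Calling `coerce_all(&[ToyType::Int32, ToyType::Int64])` yields two `Int64` entries. The point of the sketch is that the target type depends on all arguments, not only on the first one.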
// Note that not all rules in `comparison_coercion` can be reused here.
// For example, all numeric types can be coerced into Utf8 for comparison,
// but not for function arguments.
_ => comparison_binary_numeric_coercion(type_into, type_from).and_then(
This logic was introduced in #9459, so I think it is safe to remove it as part of this PR
cc @viirya
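The distinction drawn in the quoted code comment (numeric types may be coerced into Utf8 for comparison, but not for function arguments) can be sketched with two separate rules. This is a toy model only: `SketchType`, `comparison_like`, and `argument_like` are hypothetical names and do not mirror DataFusion's real coercion functions.

```rust
/// Toy stand-in for a data type (assumption: three variants are enough here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchType {
    Int32,
    Int64,
    Utf8,
}

/// Shared numeric widening used by both rules below.
fn numeric_only(a: SketchType, b: SketchType) -> Option<SketchType> {
    use SketchType::*;
    match (a, b) {
        (Int32, Int32) => Some(Int32),
        (Int32 | Int64, Int32 | Int64) => Some(Int64),
        _ => None,
    }
}

/// Lenient rule: if numeric widening fails, fall back to Utf8,
/// as a comparison such as `1 = '1'` might allow.
pub fn comparison_like(a: SketchType, b: SketchType) -> Option<SketchType> {
    numeric_only(a, b).or_else(|| match (a, b) {
        (SketchType::Utf8, _) | (_, SketchType::Utf8) => Some(SketchType::Utf8),
        _ => None,
    })
}

/// Stricter rule for function arguments: identical types or numeric
/// widening only, with no implicit number-to-string fallback.
pub fn argument_like(a: SketchType, b: SketchType) -> Option<SketchType> {
    if a == b {
        return Some(a);
    }
    numeric_only(a, b)
}
```

With these definitions, `comparison_like(Int32, Utf8)` resolves to `Utf8` while `argument_like(Int32, Utf8)` returns `None`, which is the kind of divergence the comment warns about.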
Title changed from "… coalesce, introduce signature VariadicEqualOrNull for coalesce." to "… Coalesce casting logic that follows Postgres and DuckDB. Introduce signature that do non-comparison coercion"
In general I am concerned about the potential downstream effects of this change. I don't fully understand them. What I would ideally like to do is run the influxdb_iox regression tests with this change and see what happens. I am not sure I have the time to do that in the next week -- maybe @appletreeisyellow does 🤔
Basically I worry this is just changing behavior rather than fixing a bug and will result in churn for no benefit downstream. I may be misunderstanding the change and rationale, however.
I believe the coercion rule is quite messy as it currently stands. It would be more understandable and maintainable to move the coercion rule from coerced_from to each individual function. This way, it would be clear which coercion rule applies to each function or signature. Having a single function handle multiple coercion rules makes the code harder to reason about and could cause downstream issues, as you have pointed out. I've also filed an issue to track this at #10507. Would it be better to start by removing the rule from coerced_from and moving it to the specific function or signature?
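The proposal in this comment, that each function or signature should own its coercion rule, can be sketched roughly as follows. Everything in this sketch is hypothetical (`Ty`, `Rule`, `ToyFunction`, and the two rules themselves); it only illustrates the shape of the idea, not DataFusion's API.

```rust
/// Toy stand-in for a data type (assumption: two variants suffice).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int64,
    Utf8,
}

/// Each function declares which coercion rule it wants (hypothetical names).
#[derive(Debug, Clone, Copy)]
pub enum Rule {
    /// Lenient: mismatched types fall back to Utf8.
    ComparisonLike,
    /// Strict: only identical types are accepted.
    ExactMatch,
}

/// A function carries its own rule instead of relying on one shared helper.
pub struct ToyFunction {
    pub name: &'static str,
    pub rule: Rule,
}

impl ToyFunction {
    /// Resolve a pair of argument types using this function's own rule.
    pub fn coerce(&self, a: Ty, b: Ty) -> Result<Ty, String> {
        let resolved = match self.rule {
            Rule::ComparisonLike => Some(if a == b { a } else { Ty::Utf8 }),
            Rule::ExactMatch => (a == b).then_some(a),
        };
        resolved.ok_or_else(|| {
            format!("{}: cannot coerce {:?} and {:?}", self.name, a, b)
        })
    }
}
```

Because the rule is read from the function itself, changing how one function coerces its arguments does not silently change the behavior of any other function.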
@alamb I'm happy to coordinate with our performance team and run the influxdb_iox regression tests if needed
I don't think this needs a performance test -- I was thinking just the main test suite
Hi @jayzhan211 -- I am sorry I haven't had a chance to devote more time to this and review. I will try to find some time in the next few days
I was thinking performance regression 😅 Got it! I can put up an Influxdata test against this branch sometime this afternoon
An update: @appletreeisyellow ran the internal InfluxDB tests against this branch and we didn't find any problems. It is not a requirement that our code needs no changes, but I have found that our test suite has somewhat broader coverage, so the fact that it passes suggests to me that this PR is less likely to have knock-on effects.
I had another look at this PR this morning and, other than the changes in select.slt, it looks great to me.
Thanks again @jayzhan211
@@ -1473,7 +1473,7 @@ DROP TABLE t;
# related to https://github.com/apache/datafusion/issues/8814
statement ok
create table t(x int, y int) as values (1,1), (2,2), (3,3), (0,0), (4,0);
create table t(x bigint, y bigint) as values (1,1), (2,2), (3,3), (0,0), (4,0);
Could you please keep the existing test and add a new test that uses bigint so it is clearer what behavior is changing?
@@ -1778,7 +1778,7 @@ AS VALUES
('BB', 6, 1);
query TII
select col1, col2, coalesce(sum_col3, 0) as sum_col3
select col1, col2, coalesce(sum_col3, arrow_cast(0, 'UInt64')) as sum_col3
Maybe we can also change this test back
Looks good, and existing tests are passing 👍 🚀
Thanks @alamb and @appletreeisyellow
…Introduce signature that do non-comparison coercion (apache#10268)
* remove casting for coalesce
* add more test
* add more test
* crate only visibility
* polish comment
* improve test
* backup
* introduce new signautre for coalesce
* cleanup
* cleanup
* ignore err msg
* fmt
* fix doc
* cleanup
* add more test
* switch to type_resolution coercion
* fix i64 and u64 case
* add more tests
* cleanup
* add null case
* fmt
* fix
* rename to type_union_resolution
* add comment
* cleanup
* fix test
* add comment
* rm test
* cleanup since rebase
* add more test
* add more test
* fix msg
* fmt
* rm pure_string_coercion
* rm duplicate
* change type in select.slt
* fix slt
---------
Signed-off-by: jayzhan211 <[email protected]>
Thank you for sticking with it @jayzhan211 -- so much is going on!
Which issue does this PR close?
Closes #10261
Closes #10241
Part of #10507
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?