-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle type coercion in signature for ApproxPercentileCont
#12274
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
SELECT approx_median(c6) FROM aggregate_test_100 | ||
---- | ||
1146409980542786560 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is due to the computation result of f64(? At least the code doesn't change the computation of approx_median
only cast them to f64
@@ -346,7 +346,7 @@ async fn test_fn_approx_median() -> Result<()> { | |||
"+-----------------------+", | |||
"| approx_median(test.b) |", | |||
"+-----------------------+", | |||
"| 10 |", | |||
"| 10.0 |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a change in behavior -- with this PR now median always returns float but before it returned the same type as its input
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I change the result to f64 now.
I think it is fine to have f64 for median value. I check the result of Duckdb, they have double for integer, although they have decimal for decimal input, but since we doesn't support decimal for approx_median so there is no regression. We could support decimal case later on
@@ -116,4 +112,8 @@ impl AggregateUDFImpl for ApproxMedian { | |||
acc_args.exprs[0].data_type(acc_args.schema)?, | |||
))) | |||
} | |||
|
|||
fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> { | |||
Ok(vec![DataType::Float64]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC, why this instead of just defining the signature as DataType::Float64? Afaik DF already tries to coerce inputs to the signature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are referring to coerced_from
? I want to deprecate that function because the downside of having single truth of coercion make updating the coercion rule unpredictable, easy to cause bug without notice (hard to have full test coverage too). The new approach is to handle the coercion by signature.
I think I could change the signature to Coercible(vec![Float64])
#12275
…ox-perc-cont-signature
Signed-off-by: jayzhan211 <[email protected]>
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #.
Part of #10507
Rationale for this change
Before this change,
ApproxPercentileCont
has to deal with different types while creating accumulator, now they just need to deal with final coerced typeApprox_median
now always return Double type what ever the input is.The reason is that in DuckDB they returns double except for decimal. And,
APPROX_PERCENTILE
, the underlying function forapprox_median
returns double too.In snowflake they returns Double for
APPROX_PERCENTILE
too.https://docs.snowflake.com/en/sql-reference/functions/approx_percentile
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?