Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle type coercion in signature for ApproxPercentileCont #12274

Closed
wants to merge 4 commits into from

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented Sep 1, 2024

Which issue does this PR close?

Closes #.

Part of #10507

Rationale for this change

Before this change, ApproxPercentileCont has to deal with different types while creating accumulator, now they just need to deal with final coerced type

Approx_median now always return Double type what ever the input is.
The reason is that in DuckDB they returns double except for decimal. And, APPROX_PERCENTILE, the underlying function for approx_median returns double too.

In snowflake they returns Double for APPROX_PERCENTILE too.
https://docs.snowflake.com/en/sql-reference/functions/approx_percentile

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions labels Sep 1, 2024
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the core Core DataFusion crate label Sep 1, 2024
SELECT approx_median(c6) FROM aggregate_test_100
----
1146409980542786560
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is due to the computation result of f64(? At least the code doesn't change the computation of approx_median only cast them to f64

@jayzhan211 jayzhan211 marked this pull request as ready for review September 1, 2024 08:01
@@ -346,7 +346,7 @@ async fn test_fn_approx_median() -> Result<()> {
"+-----------------------+",
"| approx_median(test.b) |",
"+-----------------------+",
"| 10 |",
"| 10.0 |",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a change in behavior -- with this PR now median always returns float but before it returned the same type as its input

This comment was marked as outdated.

Copy link
Contributor Author

@jayzhan211 jayzhan211 Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I change the result to f64 now.

I think it is fine to have f64 for median value. I check the result of Duckdb, they have double for integer, although they have decimal for decimal input, but since we doesn't support decimal for approx_median so there is no regression. We could support decimal case later on

@@ -116,4 +112,8 @@ impl AggregateUDFImpl for ApproxMedian {
acc_args.exprs[0].data_type(acc_args.schema)?,
)))
}

fn coerce_types(&self, _arg_types: &[DataType]) -> Result<Vec<DataType>> {
Ok(vec![DataType::Float64])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OOC, why this instead of just defining the signature as DataType::Float64? Afaik DF already tries to coerce inputs to the signature

Copy link
Contributor Author

@jayzhan211 jayzhan211 Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are referring to coerced_from? I want to deprecate that function because the downside of having single truth of coercion make updating the coercion rule unpredictable, easy to cause bug without notice (hard to have full test coverage too). The new approach is to handle the coercion by signature.

I think I could change the signature to Coercible(vec![Float64]) #12275

@jayzhan211 jayzhan211 marked this pull request as draft September 4, 2024 11:34
Copy link

github-actions bot commented Nov 4, 2024

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Nov 4, 2024
@github-actions github-actions bot closed this Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate functions sqllogictest SQL Logic Tests (.slt) Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants