Support n-ary monotonic functions in discover_new_orderings #44

gokselk · 2024-11-07T08:05:04Z

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

berkaysynnada · 2024-11-13T10:48:05Z

I'm sorry for the delayed response @gokselk. I had to deal with some urgent maintenance issues. The refactor looks promising overall, but I noticed some issues in the tests that indicate potential bugs.

In your first test, the table is ordered as [a+b, a, b], and you add the equality c = a + b. This results in new orderings of [a+b] and [a, b]. I believe this is incorrect; you can find a simple counter-example if this isn’t immediately clear. The initial orderings should remain unchanged (or potentially be updated to [c, a, b], but this is not our concern).

The correct test case should be as follows: start with an initial ordering of [c, a, b], then add the equality c = a + b. The final orderings should then be [c] and [a, b]. I tested this scenario with your implementation, and it works as expected (it does not work without your refactor). However, the current test should not pass, as the implementation seems to perform unintended simplifications. You need to identify and address this bug.

Once that’s fixed, it would be beneficial to add more scenarios. Consider using ScalarFunctionExpr instances as the PhysicalExpr to represent different mathematical functions.
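The counter-example mentioned above can be sketched in a few lines. This is a minimal illustration that uses plain integer tuples as stand-ins for rows (a, b); the helper `is_sorted_by_key` and the sample rows are assumptions made for this sketch and are not part of the PR or of DataFusion.

```rust
/// Returns true if `rows` is in non-decreasing order under the sort key
/// computed by `key`. Tuples compare lexicographically in Rust.
fn is_sorted_by_key<K: PartialOrd>(
    rows: &[(i64, i64)],
    key: impl Fn(&(i64, i64)) -> K,
) -> bool {
    rows.windows(2).all(|w| key(&w[0]) <= key(&w[1]))
}

/// Hypothetical sample rows (a, b), already sorted by [a + b, a, b].
fn counter_example_rows() -> [(i64, i64); 3] {
    [(0, 1), (1, 0), (0, 2)]
}

fn main() {
    let rows = counter_example_rows();

    // The table satisfies the ordering [a + b, a, b] ...
    let by_sum_a_b = is_sorted_by_key(&rows, |&(a, b)| (a + b, a, b));
    // ... but it does not satisfy the ordering [a, b] on its own,
    // because (1, 0) is followed by (0, 2).
    let by_a_b = is_sorted_by_key(&rows, |&(a, b)| (a, b));

    println!("sorted by [a+b, a, b]: {}", by_sum_a_b);
    println!("sorted by [a, b]:      {}", by_a_b);
}
```

Here the rows are ordered by [a+b, a, b], yet [a, b] is not a valid ordering by itself, which is why splitting the original ordering into [a+b] and [a, b] is unsound.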

berkaysynnada · 2024-11-13T10:49:16Z

You also need to resolve the conflicts coming from upstream to your branch.

gokselk · 2024-11-21T09:53:31Z

> In your first test, the table is ordered as [a+b, a, b], and you add the equality c = a + b. This results in new orderings of [a+b] and [a, b]. I believe this is incorrect; you can find a simple counter-example if this isn’t immediately clear. The initial orderings should remain unchanged (or potentially be updated to [c, a, b], but this is not our concern).
>
> The correct test case should be as follows: start with an initial ordering of [c, a, b], then add the equality c = a + b. The final orderings should then be [c] and [a, b]. I tested this scenario with your implementation, and it works as expected (it does not work without your refactor). However, the current test should not pass, as the implementation seems to perform unintended simplifications. You need to identify and address this bug.

Thanks for catching this. I agree the original test case was incorrect. I'm debugging the unintended simplification behavior with the ordering [a+b, a, b]. Will update once I identify and fix the issue.

For now, I've updated the test to use the correct case: [c, a, b] -> [c] and [a, b].

gokselk · 2024-11-22T07:39:21Z

@berkaysynnada I've pushed a potential fix that avoids unintended simplification by skipping cases where the original ordering starts with the equivalent expression.

However, I'm not entirely confident about this approach and would appreciate your review on whether this is the right way to handle these cases.

berkaysynnada

I have some concerns about the last test. It should behave the same way as the first test, but it results in a different state. To clear this up and discuss some ways to simplify the implementation, do you want to set up a meeting ASAP? We can also talk about the last task.

However, you can also open this upstream. It clearly brings an improvement, and we can continue the discussion there.

gokselk · 2024-12-17T06:31:26Z

@berkaysynnada

I've implemented the changes we discussed:

- Added preserves_lex_ordering functionality.
- Updated test cases to use concat instead of addition.

Please review when you have a chance. Let me know if anything else needs to be addressed.
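To illustrate the two changes listed above, here is a rough sketch of the idea behind a `preserves_lex_ordering` flag and of why addition is a poor test subject for it. The trait `OrderingHint`, the structs `Add` and `Concat`, and the sample rows are hypothetical stand-ins invented for this example; they are not DataFusion's actual types or signatures. The string samples deliberately use equal-width first arguments, and the sketch makes no claim about other inputs.

```rust
/// Hypothetical stand-in for a function's ordering metadata
/// (an assumption for this sketch, not DataFusion's UDF trait).
trait OrderingHint {
    /// Conservative default: do not assume that the output follows the
    /// lexicographical ordering of the arguments.
    fn preserves_lex_ordering(&self) -> bool {
        false
    }
}

struct Add;
struct Concat;

// Addition keeps the conservative default.
impl OrderingHint for Add {}

// Concatenation opts in.
impl OrderingHint for Concat {
    fn preserves_lex_ordering(&self) -> bool {
        true
    }
}

/// True if `values` is in non-decreasing order.
fn is_sorted<T: PartialOrd>(values: &[T]) -> bool {
    values.windows(2).all(|w| w[0] <= w[1])
}

/// Evaluates a + b for every row.
fn sums(rows: &[(i64, i64)]) -> Vec<i64> {
    rows.iter().map(|&(a, b)| a + b).collect()
}

/// Evaluates concat(a, b) for every row.
fn concats(rows: &[(&str, &str)]) -> Vec<String> {
    rows.iter().map(|&(a, b)| format!("{}{}", a, b)).collect()
}

fn main() {
    // Both inputs are sorted lexicographically by [a, b].
    let ints = [(1, 5), (2, 0)];
    let strs = [("a", "y"), ("b", "x")];

    // a + b yields 6 then 2, so the [a, b] ordering is lost.
    println!("Add flag: {}, output sorted: {}",
        Add.preserves_lex_ordering(), is_sorted(&sums(&ints)));
    // concat(a, b) yields "ay" then "bx", so the ordering carries over here.
    println!("Concat flag: {}, output sorted: {}",
        Concat.preserves_lex_ordering(), is_sorted(&concats(&strs)));
}
```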

berkaysynnada

Once you change the API the way I suggested, we will not have to duplicate the order-checking logic.

Did you also get a chance to investigate other functions? It would be better to have one more function that preserves lex ordering.

datafusion/functions/src/string/concat.rs

datafusion/expr/src/udf.rs

datafusion/physical-expr/Cargo.toml

gokselk · 2024-12-17T13:07:20Z

> Did you also get a chance to investigate other functions? It would be better to have one more function that preserves lex ordering.

I'll investigate string manipulation functions (substr, case conversions, padding) and date formats as potential candidates for lex order preservation. Will update once I have concrete findings.
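One way to screen such candidates is to probe them on sorted sample inputs and see whether the outputs stay sorted. The sketch below is only an assumption about how that screening could look; `preserves_order_on` and `prefix` are hypothetical helpers, strings are compared byte-wise as Rust does by default, and passing on a handful of samples proves nothing in general (a failure, however, is a valid counter-example).

```rust
/// Applies `f` to each input (assumed to be sorted already) and reports
/// whether the outputs are still in non-decreasing order.
fn preserves_order_on(sorted_inputs: &[&str], f: impl Fn(&str) -> String) -> bool {
    debug_assert!(sorted_inputs.windows(2).all(|w| w[0] <= w[1]));
    let outputs: Vec<String> = sorted_inputs.iter().map(|&s| f(s)).collect();
    outputs.windows(2).all(|w| w[0] <= w[1])
}

/// Hypothetical stand-in for a prefix-style substr: the first `n` characters.
fn prefix(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}

fn main() {
    // "B" sorts before "a" byte-wise, but "B" sorts after "A" once both are
    // upper-cased, so case conversion breaks the order on this sample.
    let upper = preserves_order_on(&["B", "a"], |s| s.to_uppercase());

    // Taking a fixed-length prefix keeps this sample in order.
    let pre = preserves_order_on(&["apple", "apricot", "banana"], |s| prefix(s, 2));

    println!("upper-casing keeps order on sample: {}", upper);
    println!("2-char prefix keeps order on sample: {}", pre);
}
```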

datafusion/expr/src/udf.rs

berkaysynnada · 2024-12-18T08:47:14Z

If the tests pass, let's open this upstream @gokselk

ozankabak · 2024-12-20T09:53:38Z

Merged upstream.

github-actions bot added the physical-expr label Nov 7, 2024

gokselk force-pushed the feature/support-n-ary-monotonic-fns branch from d631a11 to 04c1feb Compare November 19, 2024 06:48

berkaysynnada reviewed Dec 8, 2024

github-actions bot added logical-expr functions labels Dec 17, 2024

berkaysynnada reviewed Dec 17, 2024

berkaysynnada reviewed Dec 17, 2024

gokselk added 16 commits December 19, 2024 13:17

Support n-ary monotonic functions in discover_new_orderings

8293c11

Add tests for n-ary monotonic functions in discover_new_orderings

2f7f74f

Fix tests

990922b

Fix non-monotonic test case

5985578

Fix unintended simplification

fab0924

Minor comment changes

e31e136

Fix tests

8cdb771

Add preserves_lex_ordering field

7f46cf6

Use preserves_lex_ordering on discover_new_orderings()

705e459

Add output_ordering and output_preserves_lex_ordering implementations for `ConcatFunc`

39d6fbd

Update tests

7c0f4eb

Move logic to UDF

160eb8a

Cargo fmt

eb34b7e

Refactor

c8a90bc

Cargo fmt

1d1488d

Simply use false value on default implementation

a63c53f

gokselk added 3 commits December 19, 2024 13:17

Remove unnecessary import

b87b5fb

Clippy fix

5d12cfd

Update Cargo.lock

f147a4e

gokselk force-pushed the feature/support-n-ary-monotonic-fns branch from 9425569 to f147a4e Compare December 19, 2024 10:17

gokselk and others added 3 commits December 19, 2024 14:55

Move dep to dev-dependencies

1799e48

Rename output_preserves_lex_ordering to preserves_lex_ordering

91c7012

minor

5448426

ozankabak closed this Dec 20, 2024
