Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support n-ary monotonic functions in discover_new_orderings #44

Conversation

gokselk
Copy link

@gokselk gokselk commented Nov 7, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@berkaysynnada
Copy link
Collaborator

I'm sorry for the delayed response @gokselk. I had to deal with some urgent maintenance issues. The refactor looks promising overall, but I noticed some issues in the tests that indicate potential bugs.

In your first test, the table is ordered as [a+b, a, b], and you add the equality c = a + b. This results in new orderings of [a+b] and [a, b]. I believe this is incorrect—you can find a simple counter-example if this isn’t immediately clear. The initial orderings should remain unchanged (or potentially be updated to [c, a, b] but this is not our concern).

The correct test case should be as follows: start with an initial ordering of [c, a, b], then add the equality c = a + b. The final orderings should then be [c] and [a, b]. I tested this scenario with your implementation, and it works as expected (it does not work without your refactor). However, the current test should not pass, as the implementation seems to perform unintended simplifications. You need to identify and address this bug.

Once that’s fixed, it would be beneficial to add more scenarios. Consider using ScalarFunctionExpr's as PhysicalExpr to represent different mathematical functions.

@berkaysynnada
Copy link
Collaborator

berkaysynnada commented Nov 13, 2024

You also need to resolve the conflicts coming from upstream to your branch.

@gokselk gokselk force-pushed the feature/support-n-ary-monotonic-fns branch from d631a11 to 04c1feb Compare November 19, 2024 06:48
@gokselk
Copy link
Author

gokselk commented Nov 21, 2024

In your first test, the table is ordered as [a+b, a, b], and you add the equality c = a + b. This results in new orderings of [a+b] and [a, b]. I believe this is incorrect—you can find a simple counter-example if this isn’t immediately clear. The initial orderings should remain unchanged (or potentially be updated to [c, a, b] but this is not our concern).

The correct test case should be as follows: start with an initial ordering of [c, a, b], then add the equality c = a + b. The final orderings should then be [c] and [a, b]. I tested this scenario with your implementation, and it works as expected (it does not work without your refactor). However, the current test should not pass, as the implementation seems to perform unintended simplifications. You need to identify and address this bug.

Thanks for catching this. I agree the original test case was incorrect. I'm debugging the unintended simplification behavior with the ordering [a+b, a, b]. Will update once I identify and fix the issue.

For now, I've updated the test to use the correct case: [c, a, b] -> [c] and [a, b].

@gokselk
Copy link
Author

gokselk commented Nov 22, 2024

@berkaysynnada I've pushed a potential fix that avoids unintended simplification by skipping cases where the original ordering starts with the equivalent expression.

However, I'm not entirely confident about this approach and would appreciate your review on whether this is the right way to handle these cases.

Copy link
Collaborator

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concern about the last test. It should behave in the same way with the first test, but results in a different state. To clear this up and discussing some ways to simplify implementation, do you want to set a meeting ASAP? We can also talk about the last task.

However, you can open this also to the upstream. It is clear that it brings an improvement, and we can continue the discussion there.

@gokselk
Copy link
Author

gokselk commented Dec 17, 2024

@berkaysynnada

I've implemented the changes we discussed:

  • Added preserves_lex_ordering functionality
  • Updated test cases to use concat instead of addition.

Please review when you have a chance. Let me know if anything else needs to be addressed.

Copy link
Collaborator

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After you change the API as the way I suggest, we will not have to duplicate the order checking logic.

Did you also find a chance to investigate other functions? It will be better to have one more lex order preserver function

datafusion/functions/src/string/concat.rs Outdated Show resolved Hide resolved
datafusion/functions/src/string/concat.rs Outdated Show resolved Hide resolved
datafusion/expr/src/udf.rs Outdated Show resolved Hide resolved
datafusion/physical-expr/Cargo.toml Outdated Show resolved Hide resolved
@gokselk
Copy link
Author

gokselk commented Dec 17, 2024

Did you also find a chance to investigate other functions? It will be better to have one more lex order preserver function

I'll investigate string manipulation functions (substr, case conversions, padding) and date formats as potential candidates for lex order preservation. Will update once I have concrete findings.

@berkaysynnada
Copy link
Collaborator

If the test pass successfully, let's open this to the upstream @gokselk

@gokselk gokselk force-pushed the feature/support-n-ary-monotonic-fns branch from 9425569 to f147a4e Compare December 19, 2024 10:17
@ozankabak
Copy link
Collaborator

Merged upstream.

@ozankabak ozankabak closed this Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants