-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement CardinalityAwareRowConverter
while doing streaming merge
#7401
Conversation
CardinalityAwareRow
converter while doing streaming merge
CardinalityAwareRow
converter while doing streaming mergeCardinalityAwareRowConverter
while doing streaming merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, I think it would also be good to check the benchmarks to see how this impacts performance
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs
Outdated
Show resolved
Hide resolved
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs
Outdated
Show resolved
Hide resolved
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs
Outdated
Show resolved
Hide resolved
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs
Outdated
Show resolved
Hide resolved
datafusion/core/src/physical_plan/sorts/sort_preserving_merge.rs
Outdated
Show resolved
Hide resolved
I plan to review this PR shortly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @JayjeetAtGithub -- this PR is looking close. I left a few comments and there appears to be a CI test failure and clippy issues that need to be resolved.
Also, is there an end to end test you can add showing how this PR avoids a memory explosion while merging?
In order to move this PR along, I plan to work on it later this morning. I'll plan to push some changes |
@alamb I am working on testing out the wrapper in IOx over the OOM'ed Jaegar queries and adding some changes necessary to make that work. |
* Try to add cardinality aware row converter in df * Move CardinalityAwareRowConverter in df
Co-authored-by: Andrew Lamb <[email protected]>
Update: I think this PR is ready to go except for figuring out what the proper value for the LOW_CARDINALITY cutoff is I believe @tustvold is checking setting it to zero via apache/arrow-rs#4811. @JayjeetAtGithub can you look into seeing what the performance threshold is? The idea would be to test the performance of merge on a column of different cardinalities -- maybe cardinality |
Sure, I can do that |
Row conversion duration vs cardinality for dict preserving on/off. The absolute numbers are in microseconds. See sheet. This chart shows the durations (in microseconds) taken to convert a From the graph, it looks like |
I am going to resolve the merge conflicts and get this PR ready to go |
@tustvold and I talked about this PR -- it is likely that when arrow-rs 47 is released (with apache/arrow-rs#4819) this code will become outdated, we think it is an improvement over the current main. Thus I plan to merge this PR when the tests pass |
Which issue does this PR close?
Closes #7200.
Rationale for this change
This PR uses the
CardinalityAwareRowConverter
implemented inarrow-row
in this PR.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?