-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[minor]: Update median implementation #13554
Conversation
where | ||
T: ArrowPrimitiveType, | ||
T::Native: PartialOrd, // Ensure the type supports PartialOrd for comparison | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As unwrap
is used, the code could be updated / simplified to work on arrays of size >= 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me, updated the implementation such that function doesn't return option in 0303c9
array | ||
.iter() | ||
.fold(array[0], |max, &val| if max > val { max } else { val }) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should work
array | |
.iter() | |
.fold(array[0], |max, &val| if max > val { max } else { val }) | |
} | |
array | |
.into_iter() | |
.max() | |
.unwrap() | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I did this I got following error:
^^^ the trait `std::cmp::Ord` is not implemented for `<T as arrow::array::ArrowPrimitiveType>::Native`, which is required by `&<T as arrow::array::ArrowPrimitiveType>::Native: std::cmp::Ord`
I guess, max expects Ord
trait. However, T::Native
implements PartialOrd
trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I could do this using max_by
API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use Ord
though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we can, at least I couldn't make it work. However, maybe I am missing something. By the way, following is the trait description for ArrowPrimitiveType::Native
pub trait ArrowNativeType:
std::fmt::Debug + Send + Sync + Copy + PartialOrd + Default + private::Sealed + 'static
{
and signature of max
is as follows:
fn max(self) -> Option<Self::Item>
where
Self: Sized,
Self::Item: Ord,
{
self.max_by(Ord::cmp)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @akurmustafa and @Dandandan for the review
Which issue does this PR close?
Improves the situation in 13550
Rationale for this change
While examining the median code to see where can be changed to improve performance. I recognized that while calculating
the median for slices with even number of entries, we call
select_nth_unstable_by
twice. First to bipartition slice according to indexn/2
. Then to bipartition first part by indexn/2-1
. The second bipartitioning is indeed maximum calculation.This PR changes the second bipartitioning call with maximum. I am not sure, whether this will improve the performance but I think at the worst case performance will be same, and this version is slightly more readable.
What changes are included in this PR?
Are these changes tested?
Existing tests
Are there any user-facing changes?