Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[minor]: Update median implementation #13554

Merged
merged 7 commits into from
Nov 29, 2024

Conversation

akurmustafa
Copy link
Contributor

Which issue does this PR close?

Improves the situation in 13550

Rationale for this change

While examining the median code to see where can be changed to improve performance. I recognized that while calculating
the median for slices with even number of entries, we call select_nth_unstable_by twice. First to bipartition slice according to index n/2. Then to bipartition first part by index n/2-1. The second bipartitioning is indeed maximum calculation.

This PR changes the second bipartitioning call with maximum. I am not sure, whether this will improve the performance but I think at the worst case performance will be same, and this version is slightly more readable.

What changes are included in this PR?

Are these changes tested?

Existing tests

Are there any user-facing changes?

where
T: ArrowPrimitiveType,
T::Native: PartialOrd, // Ensure the type supports PartialOrd for comparison
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As unwrap is used, the code could be updated / simplified to work on arrays of size >= 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, updated the implementation such that function doesn't return option in 0303c9

Comment on lines 321 to 324
array
.iter()
.fold(array[0], |max, &val| if max > val { max } else { val })
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work

Suggested change
array
.iter()
.fold(array[0], |max, &val| if max > val { max } else { val })
}
array
.into_iter()
.max()
.unwrap()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I did this I got following error:

^^^ the trait `std::cmp::Ord` is not implemented for `<T as arrow::array::ArrowPrimitiveType>::Native`, which is required by `&<T as arrow::array::ArrowPrimitiveType>::Native: std::cmp::Ord`

I guess, max expects Ord trait. However, T::Native implements PartialOrd trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I could do this using max_by API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use Ord though?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we can, at least I couldn't make it work. However, maybe I am missing something. By the way, following is the trait description for ArrowPrimitiveType::Native

pub trait ArrowNativeType:
    std::fmt::Debug + Send + Sync + Copy + PartialOrd + Default + private::Sealed + 'static
{

and signature of max is as follows:

    fn max(self) -> Option<Self::Item>
    where
        Self: Sized,
        Self::Item: Ord,
    {
        self.max_by(Ord::cmp)
    }

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @akurmustafa and @Dandandan for the review

@akurmustafa akurmustafa merged commit 8ae36b7 into apache:main Nov 29, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants