Improve BitonicSort performance for sorting floats #952

vyu · 2021-08-20T05:04:37Z

Also: add docstring, extend tests, and add benchmarks.

The speedup (for sorting isbits floats under isless) is based on @stev47's improvement to isless in Julia 1.7 (stev47 also authored the BitonicSort implementation!). Instead of NaN-checking, though, we treat the floats as integers and subtract an offset so that the NaNs with a negative sign wrap around to the greatest integers (see code for details). This gives branchless code. We apply this transformation only once, at the start of each sort, sort the values as integers, and then apply the inverse transformation at the end to get back the floats. (This technique cannot be used for isless in Julia Base, because isless(NaN, NaN) returns false regardless of payload and sign, whereas the integer order distinguishes NaNs with different bit patterns.)
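A minimal sketch of the wrap-around idea described above, written for `Float64` only. The names `NEG_NAN_COUNT`, `float_to_sortable`, `sortable_to_float`, and `sort_floats_via_ints` are hypothetical and this is not the code from src/sort.jl; it just shows one way to get an integer key whose signed order agrees with isless (with all NaNs last), plus its inverse.

```julia
# Hypothetical names; a Float64-only sketch, not the PR's implementation.

# Number of bit patterns that encode a negative NaN
# (sign bit 1, exponent all ones, nonzero significand): 2^52 - 1.
const NEG_NAN_COUNT = Int64(0x000f_ffff_ffff_ffff)

@inline function float_to_sortable(x::Float64)
    y = reinterpret(Int64, x)
    # For negative floats, flip every bit except the sign so that signed
    # integer order matches float order (-0.0 maps to -1, 0.0 maps to 0).
    z = y ⊻ ((y >> 63) >>> 1)
    # The negative NaNs now occupy the smallest integers, just below -Inf.
    # Wrapping subtraction moves them to the top of the Int64 range, above
    # +Inf and the positive NaNs, without any NaN check or branch.
    return z - NEG_NAN_COUNT
end

@inline function sortable_to_float(z::Int64)
    z += NEG_NAN_COUNT             # undo the wrap-around
    y = z ⊻ ((z >> 63) >>> 1)      # same mask as before: XOR is self-inverse
    return reinterpret(Float64, y)
end

# Transform once, sort as integers, transform back once.
sort_floats_via_ints(v::AbstractVector{Float64}) =
    sortable_to_float.(sort(float_to_sortable.(v)))
```

For example, `sort_floats_via_ints([NaN, 1.5, -Inf, -0.0, -NaN, 0.0, Inf, -2.0])` yields `-Inf, -2.0, -0.0, 0.0, 1.5, Inf` followed by the two NaNs. Distinct NaN bit patterns get distinct keys, which is harmless for sorting (isless treats all NaNs as equivalent, so any order among them is valid) but is exactly why the same trick does not fit Base's isless.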

Comparative benchmark results on Julia 1.7.0-beta3:

x86-64 (Intel Haswell, mobile)
x86-64 (Intel Silvermont, microserver)
AArch64 (Arm Neoverse N1, Ampere A1 instance on Oracle Cloud)

(The couple of timing fluctuations that show up for integer sorting are just noise.)

I have a vectorized implementation that I hope to contribute later after cleaning it up. (SIMD performance is of course heavily dependent on CPU capabilities and compiler support, which is why I'm benchmarking on three machines. Didn't matter for this PR, though.)

Additional changes to sort.jl:
- Put sort.jl into a module to avoid polluting the StaticArrays namespace.
- Add docstring.
- Extend tests.
- Add benchmarks.
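A minimal sketch of why wrapping sort.jl in a submodule keeps the package namespace clean: helpers defined inside the submodule stay there, and the parent imports only the entry point it needs. `Outer`, `Sorting`, `_minmax`, and `sort2` are hypothetical names, not the module or function names used in the PR.

```julia
# Hypothetical names: `Outer` stands in for the package module and `Sorting`
# for the submodule wrapping the contents of sort.jl.
module Outer

module Sorting
    # Internal helper: defined inside Sorting, so it never lands in Outer.
    _minmax(a, b) = isless(b, a) ? (b, a) : (a, b)

    # Public entry point that the parent module actually needs.
    sort2(t::NTuple{2,Any}) = _minmax(t[1], t[2])
end

# Bring only the public function into the parent namespace.
using .Sorting: sort2

end # module Outer
```

Here `Outer.sort2((3, 1))` returns `(1, 3)`, while `_minmax` is reachable only as `Outer.Sorting._minmax`.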

stev47

The transformation by wrapping -NaN around is indeed a nice idea! I've added some quick comments on your implementation.
The separate benchmark seems quite extensive, but I haven't looked at it in detail.

src/sort.jl

test/sort.jl


vyu · 2021-08-22T12:30:32Z

@stev47 Thank you for the review. I've made some changes based on your feedback.

Instead of using generated functions to unroll tuples, use `ntuple(f, Val(N))` where `f` has an inline hint.
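A minimal sketch of the `ntuple(f, Val(N))` approach: each compare-exchange layer of an iterative bitonic network over an `NTuple` is rebuilt element by element by an `@inline` function `f`, so the tuple is unrolled for a compile-time `N` without a `@generated` function. The names `layer` and `bitonic_sort_tuple` are hypothetical, the tuple length is assumed to be a power of two, and this is not the PR's actual code.

```julia
# Hypothetical helpers, not StaticArrays' API. Assumes length(v) is a power of two.

# One compare-exchange layer of a bitonic network, built with ntuple(f, Val(N)).
@inline function layer(v::NTuple{N,T}, j::Int, k::Int) where {N,T}
    @inline function f(n)          # n is the 1-based output position
        i = n - 1                  # 0-based index
        p = i ⊻ j                  # partner index at distance j
        # Order the pair by index so both partners compute the same (lo, hi).
        x, y = i < p ? (v[n], v[p + 1]) : (v[p + 1], v[n])
        lo, hi = isless(y, x) ? (y, x) : (x, y)
        ascending = (i & k) == 0   # direction of the enclosing block of size k
        return (i < p) == ascending ? lo : hi
    end
    return ntuple(f, Val(N))       # Base unrolls this for a compile-time N
end

function bitonic_sort_tuple(v::NTuple{N,T}) where {N,T}
    ispow2(N) || throw(ArgumentError("tuple length must be a power of two"))
    k = 2
    while k <= N                   # size of the bitonic blocks being merged
        j = k >> 1
        while j >= 1               # compare distance within each block
            v = layer(v, j, k)
            j >>= 1
        end
        k <<= 1
    end
    return v
end
```

For example, `bitonic_sort_tuple((3, 1, 4, 1, 5, 9, 2, 6))` returns `(1, 1, 2, 3, 4, 5, 6, 9)`. Ordering each pair by index before comparing keeps the two partners consistent even for elements that are equivalent under isless but not identical, such as NaNs with different payloads.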

stev47 · 2021-09-06T18:05:21Z

It is a bit regrettable that we now have a `_sort` API that differs from Base's, but I have no quick idea how to get around it.
It looks good to me.

vyu · 2021-10-06T02:49:00Z

Could a maintainer approve the pending workflows? This is ready for merging.

vyu added 2 commits August 20, 2021 03:35

Add benchmarks for BitonicSort

adce244

Improve BitonicSort performance for sorting floats

80938ef


stev47 reviewed Aug 21, 2021


Implement suggestions from review and clean up.

be0e659

vyu added 2 commits August 22, 2021 13:11

"lt" => "isless"

0c75086

Replace generated functions

24252e5


Merge branch 'JuliaArrays:master' into master

a8cf0d8

Merge branch 'JuliaArrays:master' into master

e818dc2

vyu mentioned this pull request Jun 16, 2022

Improve middle(::AbstractRange) performance JuliaStats/Statistics.jl#116

Merged

stev47 mentioned this pull request Oct 28, 2024

Sorting could be much faster #1254

Open
