Re-write median(Binomial) to improve speed #1928

marcusps · 2024-12-15T06:09:56Z

Fix #1926 by following suggestions therein to reduce avg number of cdf(Binomial) evaluations.

Follow suggestion from JuliaStats#1926 to reduce `cdf(Binomial)` evaluations.

codecov-commenter · 2024-12-15T06:23:56Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.01%. Comparing base (790411a) to head (a422fdb).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1928      +/-   ##
==========================================
+ Coverage   85.99%   86.01%   +0.01%     
==========================================
  Files         144      144              
  Lines        8684     8696      +12     
==========================================
+ Hits         7468     7480      +12     
  Misses       1216     1216

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

andreasnoack

I think it might be useful to link to the paper in a comment. It would make it easier to reason about the implementation for a future reader of the source.

src/univariate/discrete/binomial.jl

Co-authored-by: Andreas Noack <[email protected]>

Co-authored-by: David Widmann <[email protected]>

Co-authored-by: Andreas Noack <[email protected]>

Co-authored-by: David Widmann <[email protected]>

marcusps · 2024-12-15T21:53:16Z

Following the suggestions, given

rv = Binomial(25,0.3)
rv2 = Binomial(25,0.756)

the previous version yielded

julia> @benchmark median(rv)
BenchmarkTools.Trial: 10000 samples with 33 evaluations.
 Range (min … max):  924.576 ns …   9.896 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     929.455 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   980.001 ns ± 151.139 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅               ▁     ▁   ▄   ▃                              ▁
  ██▅▇▇▇▇▅▇▅▃▇▄▅▇▁▃█▃██▁▁█▇▅███▃▆█▇▅▃█▆▃▄▇▄▁▁▄█▅▁▃▃█▅▄▁▅▇▆▄▁▁▃▇ █
  925 ns        Histogram: log(frequency) by time       1.49 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark median(rv2)
BenchmarkTools.Trial: 10000 samples with 26 evaluations.
 Range (min … max):  941.346 ns …  13.187 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     945.231 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.007 μs ± 187.974 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █               ▁   ▂  ▂   ▄   ▃                              ▁
  █▇█▅▅█▅▆▇▄█▅▁▇▅▃██▁▃█▃▁█▅▁▃█▃▁▄█▅▆▇█▄▅▅█▇▄▃▁█▄▁▃▃█▃▁▁▁▆█▁▄▄▁█ █
  941 ns        Histogram: log(frequency) by time       1.51 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

and the version version yields

julia> @benchmark median(rv)
BenchmarkTools.Trial: 10000 samples with 469 evaluations.
 Range (min … max):  225.928 ns … 557.132 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     228.358 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   235.036 ns ±  21.306 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▂▆▇▁                          ▁▁   ▁                        ▂
  ████████▅▅▅▇▇▆▇▅█▇▄▄█▆▄▅▆▅▄▄▅▅▄▄██▆▅▄█▇▅▅▄▇▅▃▄▂▄▄▃▄▄▄▆▄▃▄▄▅▅▄ █
  226 ns        Histogram: log(frequency) by time        340 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark median(rv2)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  17.037 ns … 52.826 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.434 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.582 ns ±  2.704 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅██▇▇▅▆▆▅▃     ▁  ▁▂  ▁▁▃▂▄▄▄▃▃▂▃▂ ▁▁                     ▂
  ▄▇██████████▇█▇▄███▇██▇▇███████████████▇▆▇▇▇▆▇▇▇▄▆▇▃▇▆▅█▂▇▆ █
  17 ns        Histogram: log(frequency) by time        29 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I will check how much the minmax suggestion helps.

marcusps · 2024-12-16T00:11:57Z

Here is a more thorough comparison. To get full coverage across the branches, I extended the benchmarks to

rv = Binomial(25,0.3)
rv2 = Binomial(25,0.756)
rv3 = Binomial(25,1//2)
rv4 = Binomial(25, 3//5)

and then ran bitj = @benchmark median(rvj) where i=1 represents the current (buggy) implementation of the median, i=2 is the current (merged) implementation that fixes the bug, and i=3 incorporates something akin to minmax (but still using only min), which looks like this:

function median(dist::Binomial)
    bound = min(dist.p, 1-dist.p) # from http://dx.doi.org/10.1111/j.1467-9574.1980.tb00681.x
    dist_mean = mean(dist)
    
    floor_mean = floor(Int, dist_mean)
    difference = dist_mean - floor_mean
    
    if difference < bound
        floor_mean
    elseif  difference > 1 - bound
        floor_mean + 1
    elseif cdf(dist, floor_mean) >= 0.5
        floor_mean
    else
        floor_mean + 1
    end
end

The results are

julia> for (a,b) in zip(b2,b3)
         println(BenchmarkTools.judge(median(a),median(b)))
       end
TrialJudgement(+0.40% => invariant)
TrialJudgement(-9.69% => improvement)
TrialJudgement(-0.04% => invariant)
TrialJudgement(-3.12% => invariant)

julia> for (a,b) in zip(b2,b3)
         println(BenchmarkTools.judge(var(a),var(b)))
       end
TrialJudgement(+12.14% => regression)
TrialJudgement(-49.32% => improvement)
TrialJudgement(-22.95% => improvement)
TrialJudgement(-12.07% => improvement)

which means the simplification is a definite improvement, both for the median and variance.

The new bug free implementations are still much slower than the buggy implementation, but that feels like an unfair comparison.

julia> for (a,b) in zip(b1,b2)
         println(BenchmarkTools.judge(median(a),median(b)))
       end
TrialJudgement(+308.74% => regression)
TrialJudgement(+5373.54% => regression)
TrialJudgement(+343.01% => regression)
TrialJudgement(+3152.94% => regression)

julia> for (a,b) in zip(b1,b3)
         println(BenchmarkTools.judge(median(a),median(b)))
       end
TrialJudgement(+310.37% => regression)
TrialJudgement(+4843.23% => regression)
TrialJudgement(+342.83% => regression)
TrialJudgement(+3051.34% => regression)

src/univariate/discrete/binomial.jl

Co-authored-by: David Widmann <[email protected]>

test/univariate/discrete/binomial.jl

devmotion · 2024-12-16T11:58:02Z

I see the following difference to the master branch:

This PR

julia> using Distributions, Chairmarks

julia> @be median($(Binomial(44, 0.1)))
Benchmark: 3317 samples with 186 evaluations
 min    135.301 ns
 median 152.554 ns
 mean   153.703 ns
 max    250.892 ns

julia> @be median($(Binomial(40, 0.1)))
Benchmark: 4932 samples with 4265 evaluations
 min    3.976 ns
 median 4.298 ns
 mean   4.556 ns
 max    9.066 ns

julia> @be median($(Binomial(25, 0.3)))
Benchmark: 3891 samples with 147 evaluations
 min    137.755 ns
 median 157.592 ns
 mean   163.907 ns
 max    284.014 ns

julia> @be median($(Binomial(25, 0.756)))
Benchmark: 4933 samples with 3917 evaluations
 min    4.266 ns
 median 4.606 ns
 mean   4.900 ns
 max    21.966 ns

julia> @be median($(Binomial(25, 0.5)))
Benchmark: 4809 samples with 4545 evaluations
 min    3.988 ns
 median 4.171 ns
 mean   4.351 ns
 max    9.332 ns

julia> @be median($(Binomial(25, 0.6)))
Benchmark: 4921 samples with 4227 evaluations
 min    3.982 ns
 median 4.298 ns
 mean   4.538 ns
 max    9.148 ns

Master branch

julia> using Distributions, Chairmarks

julia> @be median($(Binomial(44, 0.1)))
Benchmark: 3387 samples with 125 evaluations
 min    214.664 ns
 median 215.672 ns
 mean   222.091 ns
 max    366.672 ns

julia> @be median($(Binomial(40, 0.1)))
Benchmark: 4111 samples with 86 evaluations
 min    239.826 ns
 median 253.395 ns
 mean   265.475 ns
 max    431.686 ns

julia> @be median($(Binomial(25, 0.3)))
Benchmark: 3938 samples with 101 evaluations
 min    215.347 ns
 median 225.663 ns
 mean   234.931 ns
 max    417.901 ns

julia> @be median($(Binomial(25, 0.756)))
Benchmark: 3951 samples with 94 evaluations
 min    225.617 ns
 median 242.904 ns
 mean   250.102 ns
 max    419.777 ns

julia> @be median($(Binomial(25, 0.5)))
Benchmark: 3920 samples with 100 evaluations
 min    216.250 ns
 median 227.080 ns
 mean   237.701 ns
 max    419.170 ns

julia> @be median($(Binomial(25, 0.6)))
Benchmark: 3643 samples with 105 evaluations
 min    217.057 ns
 median 233.733 ns
 mean   245.306 ns
 max    732.143 ns

Co-authored-by: David Widmann <[email protected]>

Speed up median(Binomial)

6b4fb3c

Follow suggestion from JuliaStats#1926 to reduce `cdf(Binomial)` evaluations.

andreasnoack reviewed Dec 15, 2024

View reviewed changes

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

devmotion reviewed Dec 15, 2024

View reviewed changes

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

devmotion reviewed Dec 15, 2024

View reviewed changes

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

marcusps and others added 5 commits December 15, 2024 20:40

Simplify bound condition

454a68d

Co-authored-by: Andreas Noack <[email protected]>

Fix name of argument variable

a8462a1

Co-authored-by: David Widmann <[email protected]>

Simplify b

e721043

Co-authored-by: Andreas Noack <[email protected]>

Fix cummulative probability check

d91cd7d

Co-authored-by: David Widmann <[email protected]>

Incorporate suggestions from PR review

b4647c2

marcusps added 3 commits December 16, 2024 00:12

Further simplifications following suggestions

59ff74c

Add more tests

0d964fd

Fixed checks for median bounds

efaa51d

devmotion reviewed Dec 16, 2024

View reviewed changes

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

devmotion reviewed Dec 16, 2024

View reviewed changes

src/univariate/discrete/binomial.jl Show resolved Hide resolved

devmotion reviewed Dec 16, 2024

View reviewed changes

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

marcusps and others added 4 commits December 16, 2024 02:53

Simplified tests a bit

c88c447

Update src/univariate/discrete/binomial.jl

daf038d

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/discrete/binomial.jl

04bd121

Co-authored-by: David Widmann <[email protected]>

Update src/univariate/discrete/binomial.jl

70763cc

Co-authored-by: David Widmann <[email protected]>

marcusps changed the title ~~Fix #1926~~ Speed up median(Binomial) Dec 16, 2024

marcusps changed the title ~~Speed up median(Binomial)~~ Re-write median(Binomial) to improve speed Dec 16, 2024

devmotion reviewed Dec 16, 2024

View reviewed changes

test/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved

Avoid test memory allocation with an iterator

a422fdb

Co-authored-by: David Widmann <[email protected]>

devmotion approved these changes Dec 17, 2024

View reviewed changes

devmotion merged commit 445c2fe into JuliaStats:master Dec 17, 2024
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-write median(Binomial) to improve speed #1928

Re-write median(Binomial) to improve speed #1928

marcusps commented Dec 15, 2024 •

edited

Loading

codecov-commenter commented Dec 15, 2024 •

edited

Loading

andreasnoack left a comment

marcusps commented Dec 15, 2024

marcusps commented Dec 16, 2024

devmotion commented Dec 16, 2024

Re-write median(Binomial) to improve speed #1928

Re-write median(Binomial) to improve speed #1928

Conversation

marcusps commented Dec 15, 2024 • edited Loading

codecov-commenter commented Dec 15, 2024 • edited Loading

Codecov Report

andreasnoack left a comment

Choose a reason for hiding this comment

marcusps commented Dec 15, 2024

marcusps commented Dec 16, 2024

devmotion commented Dec 16, 2024

This PR

Master branch

marcusps commented Dec 15, 2024 •

edited

Loading

codecov-commenter commented Dec 15, 2024 •

edited

Loading