Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-write median(Binomial) to improve speed #1928

Merged

Conversation

marcusps
Copy link
Contributor

@marcusps marcusps commented Dec 15, 2024

Fix #1926 by following suggestions therein to reduce avg number of cdf(Binomial) evaluations.

Follow suggestion from JuliaStats#1926 to reduce `cdf(Binomial)` evaluations.
@codecov-commenter
Copy link

codecov-commenter commented Dec 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.01%. Comparing base (790411a) to head (a422fdb).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1928      +/-   ##
==========================================
+ Coverage   85.99%   86.01%   +0.01%     
==========================================
  Files         144      144              
  Lines        8684     8696      +12     
==========================================
+ Hits         7468     7480      +12     
  Misses       1216     1216              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@andreasnoack andreasnoack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be useful to link to the paper in a comment. It would make it easier to reason about the implementation for a future reader of the source.

src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved
src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved
src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved
src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved
src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved
src/univariate/discrete/binomial.jl Outdated Show resolved Hide resolved
@marcusps
Copy link
Contributor Author

Following the suggestions, given

rv = Binomial(25,0.3)
rv2 = Binomial(25,0.756)

the previous version yielded

julia> @benchmark median(rv)
BenchmarkTools.Trial: 10000 samples with 33 evaluations.
 Range (min  max):  924.576 ns    9.896 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     929.455 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   980.001 ns ± 151.139 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅               ▁     ▁   ▄   ▃                              ▁
  ██▅▇▇▇▇▅▇▅▃▇▄▅▇▁▃█▃██▁▁█▇▅███▃▆█▇▅▃█▆▃▄▇▄▁▁▄█▅▁▃▃█▅▄▁▅▇▆▄▁▁▃▇ █
  925 ns        Histogram: log(frequency) by time       1.49 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark median(rv2)
BenchmarkTools.Trial: 10000 samples with 26 evaluations.
 Range (min  max):  941.346 ns   13.187 μs  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     945.231 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.007 μs ± 187.974 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █               ▁   ▂  ▂   ▄   ▃                              ▁
  █▇█▅▅█▅▆▇▄█▅▁▇▅▃██▁▃█▃▁█▅▁▃█▃▁▄█▅▆▇█▄▅▅█▇▄▃▁█▄▁▃▃█▃▁▁▁▆█▁▄▄▁█ █
  941 ns        Histogram: log(frequency) by time       1.51 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

and the version version yields

julia> @benchmark median(rv)
BenchmarkTools.Trial: 10000 samples with 469 evaluations.
 Range (min  max):  225.928 ns  557.132 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     228.358 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   235.036 ns ±  21.306 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▂▆▇▁                          ▁▁   ▁                        ▂
  ████████▅▅▅▇▇▆▇▅█▇▄▄█▆▄▅▆▅▄▄▅▅▄▄██▆▅▄█▇▅▅▄▇▅▃▄▂▄▄▃▄▄▄▆▄▃▄▄▅▅▄ █
  226 ns        Histogram: log(frequency) by time        340 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark median(rv2)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min  max):  17.037 ns  52.826 ns  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     18.434 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.582 ns ±  2.704 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅██▇▇▅▆▆▅▃     ▁  ▁▂  ▁▁▃▂▄▄▄▃▃▂▃▂ ▁▁                     ▂
  ▄▇██████████▇█▇▄███▇██▇▇███████████████▇▆▇▇▇▆▇▇▇▄▆▇▃▇▆▅█▂▇▆ █
  17 ns        Histogram: log(frequency) by time        29 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I will check how much the minmax suggestion helps.

@marcusps
Copy link
Contributor Author

Here is a more thorough comparison. To get full coverage across the branches, I extended the benchmarks to

rv = Binomial(25,0.3)
rv2 = Binomial(25,0.756)
rv3 = Binomial(25,1//2)
rv4 = Binomial(25, 3//5)

and then ran bitj = @benchmark median(rvj) where i=1 represents the current (buggy) implementation of the median, i=2 is the current (merged) implementation that fixes the bug, and i=3 incorporates something akin to minmax (but still using only min), which looks like this:

function median(dist::Binomial)
    bound = min(dist.p, 1-dist.p) # from http://dx.doi.org/10.1111/j.1467-9574.1980.tb00681.x
    dist_mean = mean(dist)
    
    floor_mean = floor(Int, dist_mean)
    difference = dist_mean - floor_mean
    
    if difference < bound
        floor_mean
    elseif  difference > 1 - bound
        floor_mean + 1
    elseif cdf(dist, floor_mean) >= 0.5
        floor_mean
    else
        floor_mean + 1
    end
end

The results are

julia> for (a,b) in zip(b2,b3)
         println(BenchmarkTools.judge(median(a),median(b)))
       end
TrialJudgement(+0.40% => invariant)
TrialJudgement(-9.69% => improvement)
TrialJudgement(-0.04% => invariant)
TrialJudgement(-3.12% => invariant)

julia> for (a,b) in zip(b2,b3)
         println(BenchmarkTools.judge(var(a),var(b)))
       end
TrialJudgement(+12.14% => regression)
TrialJudgement(-49.32% => improvement)
TrialJudgement(-22.95% => improvement)
TrialJudgement(-12.07% => improvement)

which means the simplification is a definite improvement, both for the median and variance.

The new bug free implementations are still much slower than the buggy implementation, but that feels like an unfair comparison.

julia> for (a,b) in zip(b1,b2)
         println(BenchmarkTools.judge(median(a),median(b)))
       end
TrialJudgement(+308.74% => regression)
TrialJudgement(+5373.54% => regression)
TrialJudgement(+343.01% => regression)
TrialJudgement(+3152.94% => regression)

julia> for (a,b) in zip(b1,b3)
         println(BenchmarkTools.judge(median(a),median(b)))
       end
TrialJudgement(+310.37% => regression)
TrialJudgement(+4843.23% => regression)
TrialJudgement(+342.83% => regression)
TrialJudgement(+3051.34% => regression)

@marcusps marcusps changed the title Fix #1926 Speed up median(Binomial) Dec 16, 2024
@marcusps marcusps changed the title Speed up median(Binomial) Re-write median(Binomial) to improve speed Dec 16, 2024
@devmotion
Copy link
Member

I see the following difference to the master branch:

This PR

julia> using Distributions, Chairmarks

julia> @be median($(Binomial(44, 0.1)))
Benchmark: 3317 samples with 186 evaluations
 min    135.301 ns
 median 152.554 ns
 mean   153.703 ns
 max    250.892 ns

julia> @be median($(Binomial(40, 0.1)))
Benchmark: 4932 samples with 4265 evaluations
 min    3.976 ns
 median 4.298 ns
 mean   4.556 ns
 max    9.066 ns

julia> @be median($(Binomial(25, 0.3)))
Benchmark: 3891 samples with 147 evaluations
 min    137.755 ns
 median 157.592 ns
 mean   163.907 ns
 max    284.014 ns

julia> @be median($(Binomial(25, 0.756)))
Benchmark: 4933 samples with 3917 evaluations
 min    4.266 ns
 median 4.606 ns
 mean   4.900 ns
 max    21.966 ns

julia> @be median($(Binomial(25, 0.5)))
Benchmark: 4809 samples with 4545 evaluations
 min    3.988 ns
 median 4.171 ns
 mean   4.351 ns
 max    9.332 ns

julia> @be median($(Binomial(25, 0.6)))
Benchmark: 4921 samples with 4227 evaluations
 min    3.982 ns
 median 4.298 ns
 mean   4.538 ns
 max    9.148 ns

Master branch

julia> using Distributions, Chairmarks

julia> @be median($(Binomial(44, 0.1)))
Benchmark: 3387 samples with 125 evaluations
 min    214.664 ns
 median 215.672 ns
 mean   222.091 ns
 max    366.672 ns

julia> @be median($(Binomial(40, 0.1)))
Benchmark: 4111 samples with 86 evaluations
 min    239.826 ns
 median 253.395 ns
 mean   265.475 ns
 max    431.686 ns

julia> @be median($(Binomial(25, 0.3)))
Benchmark: 3938 samples with 101 evaluations
 min    215.347 ns
 median 225.663 ns
 mean   234.931 ns
 max    417.901 ns

julia> @be median($(Binomial(25, 0.756)))
Benchmark: 3951 samples with 94 evaluations
 min    225.617 ns
 median 242.904 ns
 mean   250.102 ns
 max    419.777 ns

julia> @be median($(Binomial(25, 0.5)))
Benchmark: 3920 samples with 100 evaluations
 min    216.250 ns
 median 227.080 ns
 mean   237.701 ns
 max    419.170 ns

julia> @be median($(Binomial(25, 0.6)))
Benchmark: 3643 samples with 105 evaluations
 min    217.057 ns
 median 233.733 ns
 mean   245.306 ns
 max    732.143 ns

@devmotion devmotion merged commit 445c2fe into JuliaStats:master Dec 17, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed up median(Binomial)
4 participants