
Use binary search for histogram buckets #316

Open · simpl1g wants to merge 1 commit into main from improve-histogram-performance

Conversation

@simpl1g commented Nov 2, 2024

I noticed that we can use binary search to improve performance, since the buckets array is always sorted
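The idea in miniature (a standalone sketch, not the gem's actual code): because the bucket bounds are sorted ascending, `Array#bsearch` in find-minimum mode returns the same upper bound as a left-to-right `Array#find`, in O(log n) comparisons instead of O(n).

```ruby
# Sorted bucket bounds, as a Prometheus histogram always has.
buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
value = 0.3

# Linear scan: O(n) comparisons from the left.
linear = buckets.find { |upper| upper >= value }

# Binary search: O(log n) comparisons, valid only because `buckets` is sorted.
binary = buckets.bsearch { |upper| upper >= value }

linear # => 0.5
binary # => 0.5
```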

ruby 3.3.5 (2024-09-03 revision ef084cc8f4) +YJIT [arm64-darwin23]
Warming up --------------------------------------
    default_buckets old   117.753k i/100ms
default_buckets bsearch   226.654k i/100ms
      large_buckets old    10.103k i/100ms
  large_buckets bsearch   130.281k i/100ms
Calculating -------------------------------------
    default_buckets old      1.396M (± 2.8%) i/s  (716.41 ns/i) -      7.065M in   5.065606s
default_buckets bsearch      2.414M (± 5.6%) i/s  (414.22 ns/i) -     12.239M in   5.088910s
      large_buckets old     99.015k (± 4.8%) i/s   (10.10 μs/i) -    495.047k in   5.013659s
  large_buckets bsearch      1.486M (± 3.2%) i/s  (672.98 ns/i) -      7.426M in   5.003273s

With the default buckets it gives a 1.5x improvement. With a buckets array of 286 elements, a 15x improvement.

require 'benchmark/ips'
require 'prometheus/client'
require 'prometheus/client/histogram'

BUCKETS = [
  0.00001, 0.000015, 0.00002, 0.000025, 0.00003, 0.000035, 0.00004, 0.000045, 0.00005, 0.000055, 0.00006, 0.000065, 0.00007, 0.000075, 0.00008, 0.000085,
  0.00009, 0.000095, 0.0001, 0.000101, 0.000102, 0.000103, 0.000104, 0.000105, 0.000106, 0.000107, 0.000108, 0.000109, 0.00011, 0.000111, 0.000112, 0.000113,
  0.000114, 0.000115, 0.000116, 0.000117, 0.000118, 0.000119, 0.00012, 0.000121, 0.000122, 0.000123, 0.000124, 0.000125, 0.000126, 0.000127, 0.000128,
  0.000129, 0.00013, 0.000131, 0.000132, 0.000133, 0.000134, 0.000135, 0.000136, 0.000137, 0.000138, 0.000139, 0.00014, 0.000141, 0.000142, 0.000143, 0.000144,
  0.000145, 0.000146, 0.000147, 0.000148, 0.000149, 0.00015, 0.000151, 0.000152, 0.000153, 0.000154, 0.000155, 0.000156, 0.000157, 0.000158, 0.000159, 0.00016,
  0.000161, 0.000162, 0.000163, 0.000164, 0.000165, 0.000166, 0.000167, 0.000168, 0.000169, 0.00017, 0.000171, 0.000172, 0.000173, 0.000174, 0.000175,
  0.000176, 0.000177, 0.000178, 0.000179, 0.00018, 0.000181, 0.000182, 0.000183, 0.000184, 0.000185, 0.000186, 0.000187, 0.000188, 0.000189, 0.00019, 0.000191,
  0.000192, 0.000193, 0.000194, 0.000195, 0.000196, 0.000197, 0.000198, 0.000199, 0.0002, 0.00021, 0.00022, 0.00023, 0.00024, 0.00025, 0.00026,
  0.00027, 0.00028, 0.00029, 0.0003, 0.00031, 0.00032, 0.00033, 0.00034, 0.00035, 0.00036, 0.00037, 0.00038, 0.00039, 0.0004, 0.00041, 0.00042,
  0.00043, 0.00044, 0.00045, 0.00046, 0.00047, 0.00048, 0.00049, 0.0005, 0.00051, 0.00052, 0.00053, 0.00054, 0.00055, 0.00056, 0.00057, 0.00058,
  0.00059, 0.0006, 0.00061, 0.00062, 0.00063, 0.00064, 0.00065, 0.00066, 0.00067, 0.00068, 0.00069, 0.0007, 0.00071, 0.00072, 0.00073, 0.00074,
  0.00075, 0.00076, 0.00077, 0.00078, 0.00079, 0.0008, 0.00081, 0.00082, 0.00083, 0.00084, 0.00085, 0.00086, 0.00087, 0.00088, 0.00089, 0.0009,
  0.00091, 0.00092, 0.00093, 0.00094, 0.00095, 0.00096, 0.00097, 0.00098, 0.00099, 0.001, 0.0015, 0.002, 0.0025, 0.003, 0.0035, 0.004, 0.0045, 0.005,
  0.0055, 0.006, 0.0065, 0.007, 0.0075, 0.008, 0.0085, 0.009, 0.0095, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.045, 0.05, 0.055, 0.06, 0.065, 0.07,
  0.075, 0.08, 0.085, 0.09, 0.095, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.5, 2.0, 2.5,
  3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.0, 17.5
].freeze

default_buckets = Prometheus::Client::Histogram.new(:default_buckets, docstring: 'Default buckets')
large_buckets = Prometheus::Client::Histogram.new(
  :large_buckets,
  docstring: 'Large buckets',
  buckets: BUCKETS
)
Benchmark.ips do |x|
  x.config(time: 5, warmup: 1)

  x.report('default_buckets old') { default_buckets.observe(1) }
  x.report('default_buckets bsearch') { default_buckets.observe_bsearch(1) }

  x.report('large_buckets old') { large_buckets.observe(1) }
  x.report('large_buckets bsearch') { large_buckets.observe_bsearch(1) }

  x.compare!
end

@simpl1g force-pushed the improve-histogram-performance branch from 3f052fd to 981f287 (November 2, 2024 22:55)
@dmagliola (Collaborator)

Could we have the change by itself, without the reformatting of the entire file?

@simpl1g force-pushed the improve-histogram-performance branch from 981f287 to 100f46c (November 3, 2024 11:48)
@simpl1g (Author) commented Nov 3, 2024

Could we have the change by itself, without the reformatting of the entire file?

@dmagliola Sorry, fixed. I'll also try to fix the excessive allocations in a separate PR.

@dmagliola (Collaborator)

RE this PR: I'm planning to do a little performance experimentation locally but I'm assuming this will get merged.

RE excessive allocations... Is that related to this change? Or something else?

@simpl1g (Author) commented Nov 4, 2024

RE excessive allocations... Is that related to this change? Or something else?

@dmagliola this change is simple and not connected to the allocations; on its own it already gives a big boost to observe.

My concerns are that:

  • a lot of places call bucket.to_s, but we could preallocate the strings on init
  • plain string literals like "+Inf" are used, but frozen_string_literal is not set, so a new object is allocated on each observe call
  • things like buckets + ["+Inf", "sum"] allocate new arrays (this is not in the hot path of observe, so it's not critical)
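The "+Inf" point can be shown in isolation (an illustrative snippet, not the gem's code): a frozen string literal is interned by the interpreter, while an unfrozen string is a fresh allocation each time.

```ruby
# A frozen string literal is deduplicated: every occurrence is the same
# object, so a hot path can reuse it without allocating.
frozen_a = '+Inf'.freeze
frozen_b = '+Inf'.freeze
frozen_a.equal?(frozen_b) # => true, one shared object

# String.new always allocates a new object, like a bare literal does in a
# file without the frozen_string_literal magic comment.
fresh_a = String.new('+Inf')
fresh_b = String.new('+Inf')
fresh_a.equal?(fresh_b) # => false, two separate allocations
```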

@dmagliola (Collaborator)

Ok, so, for this change: This seems like a sensible thing to do, my only concern was whether there could be a regression under some particular circumstance.

Doing binary search obviously works great with large numbers of buckets, and it works best the higher the observed value is (for obvious reasons). But could there be a situation, particularly with small numbers of buckets or small observed values, where it was slower?

I could barely make that happen. I made a benchmark script similar to yours, but it used different numbers of buckets and observed different values. The only situation in which I could make find faster than bsearch was with a large number of buckets while observing literally zero. Observing any tiny non-zero number, bsearch was still faster or the same.
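That edge case is easy to see in miniature (an illustrative sketch, not the exact benchmark used): with a large bucket array and an observed value of zero, the linear scan stops at the very first element, while bsearch still bisects the whole array (~9 probes for 286 elements), though both return the same bucket.

```ruby
# 286 ascending bucket bounds, mirroring the large array above.
buckets = (1..286).map { |i| i / 100.0 }

# Observing zero: the very first bucket matches, so `find` does a single
# comparison while `bsearch` still performs ~log2(286) probes.
from_find    = buckets.find    { |upper| upper >= 0 }
from_bsearch = buckets.bsearch { |upper| upper >= 0 }

from_find    # => 0.01
from_bsearch # => 0.01
```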

Given this, I think this is safe to merge. I'll give @Sinjo a chance to object before I do, but I'm basically happy that this change is unequivocally good.

@dmagliola (Collaborator)

  • a lot of places call bucket.to_s, but we could preallocate the strings on init
  • plain string literals like "+Inf" are used, but frozen_string_literal is not set, so a new object is allocated on each observe call
  • things like buckets + ["+Inf", "sum"] allocate new arrays (this is not in the hot path of observe, so it's not critical)

These sound good. I'm not 100% sure about the first one, but I'm looking forward to your PR :)

Can you open a new one with these changes once you have them?

Some comments on these specifically:

  • a lot of places call bucket.to_s, but we could preallocate the strings on init

We're mostly turning the float that defines the upper bound of the bucket into a string. We can't store only the strings, though, because we need the floats to find the right bucket. But maybe I'm misunderstanding what you mean, or missing some obvious way to do this.

  • things like buckets + ["+Inf", "sum"] allocate new arrays (this is not in the hot path of observe, so it's not critical)

We actually have two hot paths: observe is the one we have (over)optimized, but we (I, actually) neglected export, which causes problems for some of our potential users. If your change makes exporting faster (or reduces allocations), that would be extremely welcome too, not just improving observe. The key method here is Histogram#values.

@simpl1g (Author) commented Nov 4, 2024

But maybe I'm misunderstanding what you mean

I thought about using a hash, something like this:

def initialize
  @h = buckets.map { |b| [b, b.to_s] }.to_h
  # ...
end

def observe(value)
  bucket = buckets.bsearch { |upper_limit| upper_limit >= value }
  str = @h[bucket] || '+Inf'
  # ...
end
I don't see big performance improvements, but it reduces allocations.
I can also look at Histogram#values to see what's happening there; it just wasn't my priority.
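Filled out as a self-contained snippet (the names here are illustrative, not the gem's internals), the idea looks like this: keep the float bounds for the binary search, and build a parallel float-to-string hash once at init, with "+Inf" as the default for values above every bound.

```ruby
# Float bounds stay floats, so bsearch keeps working on them...
buckets = [0.1, 0.5, 1.0]

# ...and each bound's string form is computed once, at init time.
labels = buckets.each_with_object({}) { |b, h| h[b] = b.to_s }
labels.default = '+Inf'

# Per observation: one bsearch, one hash lookup, no to_s allocation.
bucket = buckets.bsearch { |upper| upper >= 0.3 }   # => 0.5
labels[bucket]                                      # => "0.5"

# bsearch returns nil when the value exceeds every bound; the hash default
# maps that to the "+Inf" bucket label.
overflow = buckets.bsearch { |upper| upper >= 2.0 } # => nil
labels[overflow]                                    # => "+Inf"
```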

Why I decided to look into the code: I tried to build a simple Rack app to illustrate that Ruby can be fast enough compared to Rails and Node.js. There was a video on YouTube comparing Rails and Node (https://www.youtube.com/watch?v=Qp9SOOtgmS4), and obviously Rails was very slow. So I added two PRs with improvements: antonputra/tutorials#330 and antonputra/tutorials#335.

The max RPS I was able to achieve was 105k/s, and just adding the single line use Prometheus::Middleware::Collector reduced it to 75k/s, a 30% penalty, which is quite big. After applying the bsearch patch I get around 79k/s. I believe that's too much overhead, and I'm trying to understand where it comes from.

@dmagliola (Collaborator)

Yeah, that makes sense. Let's get this one merged as is, and let's discuss those other changes in a new PR when you have time to open it.

Thank you for the contributions!
