-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use binary search for histogram buckets #316
base: main
Are you sure you want to change the base?
Conversation
3f052fd
to
981f287
Compare
Could we have the change by itself, without the reformatting of the entire file? |
Signed-off-by: Konstantin Ilchenko <[email protected]>
981f287
to
100f46c
Compare
@dmagliola Sorry, fixed, will also try to fix excessive allocations in separate PR |
RE this PR: I'm planning to do a little performance experimentation locally but I'm assuming this will get merged. RE excessive allocations... Is that related to this change? Or something else? |
@dmagliola this change is simple and it is not connected and will give a lot of boost for my concerns that
|
Ok, so, for this change: This seems like a sensible thing to do, my only concern was whether there could be a regression under some particular circumstance. Doing binary search obviously works great with large numbers of buckets, and it works best the higher the observed value is (for obvious reasons). But could there be a situation, particularly with low numbers of buckets or observed numbers where it was slower? I basically couldn't make that happen, almost. I made a benchmark script similar to yours, but it was using different numbers of buckets, and observing different numbers. The only situation in which I could make Given this, I think this is safe to merge. I'll give @Sinjo a change to object before I do, but I'm basically happy that this change is unequivocally good. |
These sounds good. I'm not 100% sure about the first one, but looking forward to your PR :) Can you open a new one with these changes once you have them? Some comments on these specifically:
We're mostly turning the float that defines the upper bound of the bucket into a string. we can't store the strings already, though, because we need the floats to find the right bucket. But maybe i'm misunderstanding what you mean, or missing some obvious way to do this.
We have 2 hot paths, actually... |
I though about hashes, something like this def initialize
@h = buckets.map { |b| [b, b.to_s] }.to_h
...
def observe
bucket = buckets.bsearch { |upper_limit| upper_limit >= value }
str = @h[bucket] || '+Inf' I don't see big performance improvements, but it reduces allocations Why I decided to look into a code. I tried to build simple rack app to illustrate that Ruby can be fast enough compared to Rails and nodejs. There was video on youtube that compared Rails and node https://www.youtube.com/watch?v=Qp9SOOtgmS4 and obviously Rails was very slow. So I added two PR's with improvements antonputra/tutorials#330 max RPS that I was able to achieve was 105k/s. And just adding single line |
Yeah, that makes sense. Let's get this one merged as is, and let's discuss those other changes in a new PR when you have time to open it. Thank you for the contributions! |
I noticed that we can use binary search as we always have sorted buckets array to improve performance
With default buckets in gives 1.5x improvement. With buckets array with 286 elements - 15x improvement