-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect results for binary search on large arrays due to miscomputation of midpoint #261
Comments
Fun! Does this manifest in practice? The largest array I've been able to generate has 2,145,386,496 Uint8 elements, and it seems that d3 bisects it correctly. |
In Safari the following code freezes the tab, which I think is compatible with the idea described here.
https://github.com/d3/d3-array/blob/main/src/bisector.js#L34-L44 The comparison will always be zero, so |
Thanks! My tests were with Chrome, which throws a "RangeError: Array buffer
allocation failed".
|
Yes it makes sense. Since there are log2(n) such divisions, I don't think
it's expensive to change >>> to / 2 — the question is more, do we want to
support such huge data arrays even though most browsers can't yet
handle them (apparently you have found a use case!).
If we do, are there any other places in the code where we have this
assumption that indices are <= 2^30?
If we don't, should we write this limitation in the documentation (support
for arrays up to 2^30 elements…)?
|
All good questions. It looks like this isn't a common issue right now, but given the direction things are heading I wouldn't be surprised if Chrome lifts the limit at some point in the next few years. Binary search has a rich history of bugs like this, which weren't discovered until many years later when data sizes increased. :) I took a quick look and d3 uses the FWIW I noticed this while experimenting with techniques for interactively zooming and panning ~billion-point datasets on the client side, which is a fairly niche use case right now. |
Actually, I think we could just use const mid = lo + (hi - lo) >>> 1; as a strict improvement, which will work correctly for a larger range of values. I was also reminded today of the existence of const mid = lo + Math.trunc((hi - lo) / 2); |
Is there any reason not to use the simpler form? const mid = Math.trunc((lo + hi) / 2); I mean sure, you could run out of integer precision when adding |
That's a good point. I don't see any issues with the simpler form.
😄 |
I think I've spotted a bug in d3's binary search implementations that arises due to the use of unsigned bit shifts for the midpoint computation, which is a good approach when the numbers are actually unsigned integers, but not when they're floats:
https://observablehq.com/d/229d8bcf448308b9
If I've understood the problem correctly, the fix is simple, replacing
in bisector.left and bisector.right with
The text was updated successfully, but these errors were encountered: