-
Notifications
You must be signed in to change notification settings - Fork 347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CountMinSketch[K] assumes working equals method on K. #497
Comments
A possible solution here is to add an PS: @non this is why I think any Hashing typeclass should have equivalence: you often need it, and it seems impossible to state any law on the typeclass without it (well, you could make probabilisitic statements about collisions). |
Also, the SparseCMS does not seem fixable at all with We could fix this by making the table have the key which is the hash, and then check collisions using the equivalence in the CMSHasher, or we could remove the sparse business. |
We internally use structures often where having a graceful degradation from exact down to approximate is nice. (TSAR uses these heavily), so it sort of would be nice to keep it if we could. Requiring the Hasher type class to provide the equality also i think would be most preferable if not too painful. (We'd need to be careful of course about using any built in maps or other scala stuff... which is annoying) |
This is old, but, here is the ugly wrapper class I created in scalding to make arrays have efficient orderings and hash codes: The problem is you have to make sure to wrap your arrays in these wrappers. |
that may be worth adding. The core issue is unsolved: we are using equality without communicating in the types that we are doing so. We need something like https://github.com/non/algebra/blob/master/core/src/main/scala/algebra/Eq.scala#L12 |
When/if should we take the algebra dep into algebird I wonder, that was the original intent right? Or should we start going for more Equiv typeclass usage and less =='s anywhere? |
Good question. I hoped soon, then this cats/algebra conflict came up: The problem with Equiv is that there is a default for all types falling back to |
Yeah unless we get away from use of == then users (and library implementors) just have to take care to wrap types when they have a bad ==, no way around that really. Using an Equiv makes sense to me, maybe the migration path could be to require an Equiv wherever we use ==, and if one isn't provided, fall back to ==, then deprecate the fall back path... etc |
Scala provides a fallback Equiv (==) in the companion object already so On Thu, Jan 21, 2016 at 3:38 PM, Alex Levenson [email protected]
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin |
Ok, well, what I said but replace scala's Equiv with something similar that has no fallback on == then? |
@isnotinvain we can definitely pick this up now that we have the algebra dep! |
For small count-min sketches, you create CMSItem:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala#L467
But to get the frequency of K we use
==
:https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala#L479
if you have
CMSItem[Array[Byte]]
as happens in scalding for sketches of tiny key spaces (which might come up in unit tests or toy examples), this means we always reurn a frequency of 0 forCMSItem[Array[Byte]].frequency
unless you happen to pass the same exact array in.trying to bite my tongue about the design flaw in Java that
equals
iseq
for Arrays.The text was updated successfully, but these errors were encountered: