-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Multi-map implementation #591
base: master
Are you sure you want to change the base?
Conversation
I like this idea because the current definition of the tables kind of forbid this kind of invariant, and thus don't let you implement them yourself. Will try and comment on the code specifically when I have the time. |
There are several multi-maps on crates.io, the most downloaded being https://crates.io/crates/multimap. Maybe it would be better to contribute there? They are currently wrapping |
I think that allowing a hash table to contain multiple keys directly would be nice since it helps a lot with cache locality, and generally, multimaps only contain a couple of duplicated keys. That said, you're right that the main benchmark should probably be |
Yeap, that was the idea. Having whole |
One concern I have with multi-map implementations where multiple identical entries are inserted directly into the map is that you're intentionally introducing hash collisions, which open-addressing hash tables may not handle very well. This is not a concern for |
Is it collision when the same key has the same hash? This implementation forces continued probing on insert if key is already occupied and not empty slots in the same group are found. When you insert many entries with the same key, there won't be good distribution, they'll all fall into same group and then into the next one. I think we can at least benchmark it to see, but I suppose this would work better for collections with not too many values per key than |
Essentially the time complexity for hash table operations ceases to be amortized O(1) and becomes O(k) where k is the number of duplicates of a key. |
Which operations? Insert - kinda. |
You could use
Another option compared to
Given that there are multiple reasonable options, I think it makes more sense to publish it in a separate crate, rather than "blessing" one choice in |
So, I should clarify my specific preferences here: I think that it would be useful to allow tables with duplicate keys in I think that it's fair to say that offering an entire dedicated |
What's stopping that? You can already use |
Documentation saying bad things would happen if insert_unique is used to insert a duplicate. |
Yeah, the docs currently don't say what happens when duplicates are inserted, and it would be nice to find a way to allow that in a reasonable way and maybe even test for it. I wouldn't feel comfortable saying that duplicates are allowed until both the docs are updated and tests are added to verify it works as intended. |
The only relevant doc I found is at the top of
Maybe we should soften the bit about "shoot yourself in the foot" since it "will still function correctly." |
Right, it was |
The footgun here specifically refers to the time complexity of operations changing from O(1) to O(k) if you have k duplicates. Perhaps this comment could be clarified. Multiple keys are otherwise perfectly fine with
Actually these are still O(k) because duplicate keys will displace other elements out of their preferred position in the hash table. Consider an extreme example of a map filled with 1M keys "A" and 1M keys "B" (in that order): you would expect a "B" to be displaced by 0.5-1M from its preferred spot, requiring that many probes before being found. |
Adds
HashMultiMap
that allows storing multiple values with the same key.This is a draft to get some feedback on the idea.
Notable API differences from
HashMap
are following:fn insert
always add new valueget*
methods replaced with methods to return iterator over values with keys equal to queried one.TODO: