Added Tombstones #12
Conversation
.retain() now returns an iterator with the removed items. Closes indexmap-rs#11. Also cleaned up some match statements.
Thanks, that's interesting. It seems like a direct translation of the retain method into an iterator. You're kind of putting me on the spot here, I have to review the design of the retain iterator and if I do it with my usual critical eye, there are some imperfections here:
|
Well, that was why I said it seemed simple 😝 . The only alternative I can think of to run-on-Drop is the following, which I'm certainly open to:

pub fn retain<F, G, R>(&mut self, keep: F, retain: G) -> R
where F: FnMut(&mut K, &mut V) -> bool,
      G: FnOnce(Retain<K, V, S, F>) -> R

Which would complete the operation before returning. I suspect |
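A concrete sketch of that scoped shape may make the idea clearer. This is not ordermap's implementation: it models the map as a plain `Vec<(K, V)>`, passes the iterator to the closure by `&mut` (rather than by value as in the signature above) so the function itself can finish the removal after the closure returns, and all names are hypothetical:

```rust
// Hypothetical scoped-retain sketch over Vec<(K, V)>. Entries for which
// `keep` returns false are removed and yielded; the removal always
// completes before `retain_scoped` returns, even if the caller's closure
// stops driving the iterator early.
struct Retain<'a, K, V, F: FnMut(&mut K, &mut V) -> bool> {
    entries: &'a mut Vec<(K, V)>,
    i: usize,
    keep: F,
}

impl<'a, K, V, F: FnMut(&mut K, &mut V) -> bool> Iterator for Retain<'a, K, V, F> {
    type Item = (K, V);
    fn next(&mut self) -> Option<(K, V)> {
        while self.i < self.entries.len() {
            let e = &mut self.entries[self.i];
            if (self.keep)(&mut e.0, &mut e.1) {
                self.i += 1; // kept: step over it
            } else {
                // O(n) remove for simplicity; the real map would do better.
                return Some(self.entries.remove(self.i));
            }
        }
        None
    }
}

fn retain_scoped<'a, K, V, F, G, R>(entries: &'a mut Vec<(K, V)>, keep: F, scope: G) -> R
where
    F: FnMut(&mut K, &mut V) -> bool,
    G: FnOnce(&mut Retain<'a, K, V, F>) -> R,
{
    let mut it = Retain { entries, i: 0, keep };
    let r = scope(&mut it);
    // Finish the removal even if `scope` stopped early.
    for _ in &mut it {}
    r
}
```

The key property is the drain at the end of `retain_scoped`: the caller can observe removed items lazily inside the closure, but the invariant "everything failing `keep` is gone" holds by the time the function returns.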
There's also the alternative of designing the operation to be entirely lazy, i.e. don't have any implicit action on drop, don't remove more elements than the iterator is driven into doing. |
That's somewhat unintuitive for users who don't want to iterate (i.e. people who would like to use the current function):

m.retain(|k, _| k < 5); // does nothing
m.retain(|k, _| k < 5).count(); // actually works

Is there a way to reduce that pain? Like opting in to the |
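To illustrate the fully-lazy behavior being discussed, here is a minimal sketch over a plain `Vec<i32>` (not the real map, and using an O(n) remove for brevity): dropping the iterator undriven removes nothing, while driving it performs the removals one at a time.

```rust
// Hypothetical fully-lazy retain: no work on Drop, removals happen only
// as far as the iterator is actually driven.
struct LazyRetain<'a, F: FnMut(&i32) -> bool> {
    v: &'a mut Vec<i32>,
    i: usize,
    keep: F,
}

impl<'a, F: FnMut(&i32) -> bool> Iterator for LazyRetain<'a, F> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        while self.i < self.v.len() {
            if (self.keep)(&self.v[self.i]) {
                self.i += 1; // kept: step over it
            } else {
                // Remove and yield one element per call to next().
                return Some(self.v.remove(self.i));
            }
        }
        None
    }
}

fn lazy_retain<F: FnMut(&i32) -> bool>(v: &mut Vec<i32>, keep: F) -> LazyRetain<'_, F> {
    LazyRetain { v, i: 0, keep }
}
```

With this sketch, `lazy_retain(&mut m, |&k| k < 5);` on its own is a no-op, exactly the footgun described above, while `.count()` or `.collect()` on the returned iterator completes the removal.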
Some ideas, just in that case:
|
I think I'm now thoroughly convinced you cannot have all of these:
There may be some exceedingly clever way of accomplishing this, but if so, I'm not seeing it. |
hehe, don't let me torture you for ordermap's purposes! |
But if you had to pick two... There is one alternative I've thought of, but it requires 1 bit per entry. Basically, drop the keys and values as soon as removal is requested, but keep the bucket around in its initial position with an enum variant that marks it dropped. Then, whenever you need the invariant to be restored, you check the entries:

// Fix pos values
let mut dropped = 0;
for e in self.entries.iter() {
    if e.is_dropped() {
        dropped += 1;
    } else {
        let pos = ...; // find the correct index
        pos.sub_index(dropped); // fix for after we drop
    }
}
// Drop pos values
for p in self.backwards_loop_through_runs_in_indices {
    if self.entries[p.pos()].is_dropped() {
        // remove p and shift
    }
}
// Drop entries
self.entries.retain(|e| !e.is_dropped());

With this setup, you could also have a |
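The pass above can be made concrete with the simplest possible layout: an entries vector in insertion order plus an indices table of positions into it. This is a hedged sketch, not ordermap's internals; `Bucket`, `remove_tombstones`, and the flat `indices` table are hypothetical simplifications (the real table's indices live in hash-probe order and carry probe metadata):

```rust
// One tombstone flag per bucket, modeled as an enum variant.
enum Bucket<K, V> {
    Live(K, V),
    Tombstone,
}

impl<K, V> Bucket<K, V> {
    fn is_dropped(&self) -> bool {
        matches!(self, Bucket::Tombstone)
    }
}

/// Remove all tombstones and repair `indices` so each stored position
/// accounts for the tombstones compacted away before it.
fn remove_tombstones<K, V>(entries: &mut Vec<Bucket<K, V>>, indices: &mut Vec<usize>) {
    // Fix pos values: count how many tombstones precede each slot.
    let mut dropped_before = vec![0usize; entries.len()];
    let mut dropped = 0;
    for (i, e) in entries.iter().enumerate() {
        dropped_before[i] = dropped;
        if e.is_dropped() {
            dropped += 1;
        }
    }
    // Drop pos values that point at tombstones, then shift the rest down.
    indices.retain(|&p| !entries[p].is_dropped());
    for p in indices.iter_mut() {
        *p -= dropped_before[*p];
    }
    // Drop entries.
    entries.retain(|e| !e.is_dropped());
}
```

The prefix-count array plays the role of the `sub_index(dropped)` fix-up in the sketch above: every surviving index is shifted down by the number of tombstones that stood before it.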
Name idea: |
Functioning tombstones and some associated work. Also some other things like reserve and removing S from Entry. Ironically removes Retain iterators. Attempts to minimize unwrap usage. Also pointlessly moves a bunch of methods around. Sorry about that.
Well, this no longer adds a |
Hah, crazy but interesting. Implementing tombstones is a big step. I'll have to look at this carefully. What are the benchmarks like? |
Don't need to do anything if there are no tombstones
I've only changed
I'm going to investigate using the MSB of the hash in place of |
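For what stealing the hash's MSB could look like: a hedged sketch that keeps the stored hash in the low 63 bits and uses the top bit as a per-bucket tombstone flag, so no separate tag is needed. `HashValue` and its methods are hypothetical, not the crate's API:

```rust
// Reserve the most significant bit of the stored hash as a tombstone flag.
const TOMBSTONE_BIT: u64 = 1 << 63;

#[derive(Clone, Copy, PartialEq, Debug)]
struct HashValue(u64);

impl HashValue {
    fn new(hash: u64) -> Self {
        // Truncate to 63 bits; the MSB is reserved for the flag.
        HashValue(hash & !TOMBSTONE_BIT)
    }
    fn mark_tombstone(&mut self) {
        self.0 |= TOMBSTONE_BIT;
    }
    fn is_tombstone(&self) -> bool {
        self.0 & TOMBSTONE_BIT != 0
    }
    fn hash(&self) -> u64 {
        self.0 & !TOMBSTONE_BIT
    }
}
```

The cost is one bit of hash entropy per bucket; the benefit is that tombstone checks stay on the hot word already loaded during probing.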
Contains unsafe code, but it should all be contained in unsafe.rs and guarded by feature flags.
Found some bugs and fixed them. Also added some unsafe options under test feature flags. They don't really seem to provide any perf benefit, so I'll probably end up scrapping them. The CI break seems to be from an SSL error for crates.io trying to pull the tagged union package I added as a dependency. Not sure what's up with that, but with how things are looking I'll probably end up removing the dependency. |
Improved find_existing_entry_mut: no longer requires entries (everything ahead of the entry can just be shifted). Switched to first_tombstone instead of last. Maximized usage of this function. Fixed remove_index_tombstones: fixed a backtracking bug which tanked perf. Reworked .entry: merged with insert to reduce code duplication. Reworked the probe loop to separate tombstone code (makes it easier to reason about). Switched to first_tombstone instead of last.
Derp. We don't need a linked list during tombstone removal; tombstones will always be consecutive, thanks to the robin-hood invariant. This also means we can improve tombstone compression in entry_phase_1 and find_existing_entry_mut. This should provide a minor perf improvement overall, and a more substantial improvement for remove_index_tombstones.
So, found a couple more bugs, but while I fix those, since I'm already touching so much of the code, how would you feel about the following changes:
My justification for the former is that it seems like a bad idea to have duplicate data that could get out of sync; the mask is already implicitly stored in My justification for the latter is that it reads better than the macro (especially when debugging, as line numbers get collapsed to the first line of the macro) and seems to have very little perf cost. Also, I somewhat already have to use this approach in some of my tombstone loops. |
I'm sorry I haven't sat down and looked at this yet. I would hold off on both of those changes. |
Remove all the unsafe testing, and unneeded functions. A couple bug fixes. Change swap_remove_found to use tombstones.
No problem, I've had to do quite a bit of cleanup before this PR would actually be usable. I've removed all my unsafe testing (it ultimately worked out to maybe a 5% gain) and switched everything over to tombstones, which can benefit from them. Hopefully I've removed all the bugs in my tombstone code 🤞. |
I'm sorry that I haven't been able to review this. I think tombstones are going to be useful, and maybe we should bench it against linked hash map instead for a more apples to apples comparison. Unsure if we should fork the crate for this, I almost think so if we can't bring the overhead down. Improving retain just to have lookup and insert regress really is throwing the baby out with the bath water for a general purpose hash table. |
Ok, I'm clearly not fit to help this PR as a maintainer. I would recommend forking the crate and giving the map a different name, because with tombstones you get a different hash table with other nice properties, while removing some of the nice properties of this OrderMap. (Performance and compactly indexed set). |
@Popog Hey! What do you think about publishing ordermap with full insertion order (using tombstones) as a separate crate? |
Got a good idea for a name? (I'm the worst at naming things) |
A. What if current ordermap is renamed to donate its name? B. ordermap stays and the new type with matching crate is "InsertionOrderMap"? |
compactmap is taken: https://crates.io/crates/compactmap |
I think this PR would be simpler if it didn't use index tombstones, only entry tombstones. We don't need to keep any index order |
Yeah, the index tombstones were just an attempt to amortize other operations. I should have time this weekend to go back a couple commits to just entry tombstones and clean that up. What about instead of a new crate, just adding it as another type to this crate? |
I'd prefer to split the crates if we are not at the point where implementation can be shared. I think all the core probe loops etc will look different, so there will not be so much actual overlap? There's also the reason that if it's in this crate I'd feel responsible for it, and I don't have time to be responsible for another one unfortunately. |
I believe this PR is obsolete since renaming to I think it's been long enough that we could donate / give up the |