Added Tombstones #12
Conversation
.retain() now returns an iterator with the removed items. Closes indexmap-rs#11. Also cleaned up some match statements.
Thanks, that's interesting. It seems like a direct translation of the retain method into an iterator. You're kind of putting me on the spot here, I have to review the design of the retain iterator and if I do it with my usual critical eye, there are some imperfections here:
|
Well, that was why I said it seemed simple 😝 . The only alternative I can think of to run-on-Drop is the following, which I'm certainly open to:

pub fn retain<F, G, R>(&mut self, keep: F, retain: G) -> R
where F: FnMut(&mut K, &mut V) -> bool,
      G: FnOnce(Retain<K, V, S, F>) -> R

Which would complete the operation before returning. I suspect |
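A concrete sketch of that scoped shape may make the idea clearer. This is not ordermap's implementation: it models the map as a plain `Vec<(K, V)>`, passes the iterator to the closure by `&mut` (rather than by value as in the signature above) so the function itself can finish the removal after the closure returns, and all names are hypothetical:

```rust
// Hypothetical scoped-retain sketch over Vec<(K, V)>. Entries for which
// `keep` returns false are removed and yielded; the removal always
// completes before `retain_scoped` returns, even if the caller's closure
// stops driving the iterator early.
struct Retain<'a, K, V, F: FnMut(&mut K, &mut V) -> bool> {
    entries: &'a mut Vec<(K, V)>,
    i: usize,
    keep: F,
}

impl<'a, K, V, F: FnMut(&mut K, &mut V) -> bool> Iterator for Retain<'a, K, V, F> {
    type Item = (K, V);
    fn next(&mut self) -> Option<(K, V)> {
        while self.i < self.entries.len() {
            let e = &mut self.entries[self.i];
            if (self.keep)(&mut e.0, &mut e.1) {
                self.i += 1; // kept: step over it
            } else {
                // O(n) remove for simplicity; the real map would do better.
                return Some(self.entries.remove(self.i));
            }
        }
        None
    }
}

fn retain_scoped<'a, K, V, F, G, R>(entries: &'a mut Vec<(K, V)>, keep: F, scope: G) -> R
where
    F: FnMut(&mut K, &mut V) -> bool,
    G: FnOnce(&mut Retain<'a, K, V, F>) -> R,
{
    let mut it = Retain { entries, i: 0, keep };
    let r = scope(&mut it);
    // Finish the removal even if `scope` stopped early.
    for _ in &mut it {}
    r
}
```

The key property is the drain at the end of `retain_scoped`: the caller can observe removed items lazily inside the closure, but the invariant "everything failing `keep` is gone" holds by the time the function returns.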
There's also the alternative of designing the operation to be entirely lazy, i.e. don't have any implicit action on drop, don't remove more elements than the iterator is driven into doing. |
That's somewhat unintuitive for users who don't want to iterate (i.e. people who would like to use the current function):

m.retain(|k, _| k < 5); // does nothing
m.retain(|k, _| k < 5).count(); // actually works

Is there a way to reduce that pain? Like opting in to the |
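To illustrate the fully-lazy behavior being discussed, here is a minimal sketch over a plain `Vec<i32>` (not the real map, and using an O(n) remove for brevity): dropping the iterator undriven removes nothing, while driving it performs the removals one at a time.

```rust
// Hypothetical fully-lazy retain: no work on Drop, removals happen only
// as far as the iterator is actually driven.
struct LazyRetain<'a, F: FnMut(&i32) -> bool> {
    v: &'a mut Vec<i32>,
    i: usize,
    keep: F,
}

impl<'a, F: FnMut(&i32) -> bool> Iterator for LazyRetain<'a, F> {
    type Item = i32;
    fn next(&mut self) -> Option<i32> {
        while self.i < self.v.len() {
            if (self.keep)(&self.v[self.i]) {
                self.i += 1; // kept: step over it
            } else {
                // Remove and yield one element per call to next().
                return Some(self.v.remove(self.i));
            }
        }
        None
    }
}

fn lazy_retain<F: FnMut(&i32) -> bool>(v: &mut Vec<i32>, keep: F) -> LazyRetain<'_, F> {
    LazyRetain { v, i: 0, keep }
}
```

With this sketch, `lazy_retain(&mut m, |&k| k < 5);` on its own is a no-op, exactly the footgun described above, while `.count()` or `.collect()` on the returned iterator completes the removal.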
Some ideas, just in that case:
|
I think I'm now thoroughly convinced you cannot have all of these:
There may be some exceedingly clever way of accomplishing this, but if so, I'm not seeing it. |
hehe, don't let me torture you for ordermap's purposes! |
But if you had to pick two... There is one alternative I've thought of, but it requires 1 bit per entry. Basically, drop the keys and values as soon as removal is requested, but keep the bucket around in its initial position with an enum variant that marks it dropped. Then, whenever you need the invariant to be restored, you check the entries:

// Fix pos values
let mut dropped = 0;
for e in self.entries.iter() {
    if e.is_dropped() {
        dropped += 1;
    } else {
        let pos = ...; // find the correct index
        pos.sub_index(dropped); // fix for after we drop
    }
}
// Drop pos values
for p in self.backwards_loop_through_runs_in_indices {
    if self.entries[p.pos()].is_dropped() {
        // remove p and shift
    }
}
// Drop entries
self.entries.retain(|e| !e.is_dropped());

With this setup, you could also have a |
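The pass above can be made concrete with the simplest possible layout: an entries vector in insertion order plus an indices table of positions into it. This is a hedged sketch, not ordermap's internals; `Bucket`, `remove_tombstones`, and the flat `indices` table are hypothetical simplifications (the real table's indices live in hash-probe order and carry probe metadata):

```rust
// One tombstone flag per bucket, modeled as an enum variant.
enum Bucket<K, V> {
    Live(K, V),
    Tombstone,
}

impl<K, V> Bucket<K, V> {
    fn is_dropped(&self) -> bool {
        matches!(self, Bucket::Tombstone)
    }
}

/// Remove all tombstones and repair `indices` so each stored position
/// accounts for the tombstones compacted away before it.
fn remove_tombstones<K, V>(entries: &mut Vec<Bucket<K, V>>, indices: &mut Vec<usize>) {
    // Fix pos values: count how many tombstones precede each slot.
    let mut dropped_before = vec![0usize; entries.len()];
    let mut dropped = 0;
    for (i, e) in entries.iter().enumerate() {
        dropped_before[i] = dropped;
        if e.is_dropped() {
            dropped += 1;
        }
    }
    // Drop pos values that point at tombstones, then shift the rest down.
    indices.retain(|&p| !entries[p].is_dropped());
    for p in indices.iter_mut() {
        *p -= dropped_before[*p];
    }
    // Drop entries.
    entries.retain(|e| !e.is_dropped());
}
```

The prefix-count array plays the role of the `sub_index(dropped)` fix-up in the sketch above: every surviving index is shifted down by the number of tombstones that stood before it.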
Name idea: |
Functioning tombstones and some associated work. Also some other things like reserve and removing S from Entry. Ironically removes Retain iterators. Attempts to minimize unwrap usage. Also pointlessly moves a bunch of methods around. Sorry about that.
Well, this no longer adds a |
Hah, crazy but interesting. Implementing tombstones is a big step. I'll have to look at this carefully. What are the benchmarks like? |
Don't need to do anything if there are no tombstones
I've only changed
I'm going to investigate using the MSB of the hash in place of |
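For what stealing the hash's MSB could look like: a hedged sketch that keeps the stored hash in the low 63 bits and uses the top bit as a per-bucket tombstone flag, so no separate tag is needed. `HashValue` and its methods are hypothetical, not the crate's API:

```rust
// Reserve the most significant bit of the stored hash as a tombstone flag.
const TOMBSTONE_BIT: u64 = 1 << 63;

#[derive(Clone, Copy, PartialEq, Debug)]
struct HashValue(u64);

impl HashValue {
    fn new(hash: u64) -> Self {
        // Truncate to 63 bits; the MSB is reserved for the flag.
        HashValue(hash & !TOMBSTONE_BIT)
    }
    fn mark_tombstone(&mut self) {
        self.0 |= TOMBSTONE_BIT;
    }
    fn is_tombstone(&self) -> bool {
        self.0 & TOMBSTONE_BIT != 0
    }
    fn hash(&self) -> u64 {
        self.0 & !TOMBSTONE_BIT
    }
}
```

The cost is one bit of hash entropy per bucket; the benefit is that tombstone checks stay on the hot word already loaded during probing.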
Contains unsafe code, but it should all be contained in unsafe.rs and guarded by feature flags.
Found some bugs and fixed them. Also added some unsafe options under test feature flags. They don't really seem to provide any perf benefit, so I'll probably end up scrapping them. The CI break seems to be from an SSL error for crates.io trying to pull the tagged union package I added as a dependency. Not sure what's up with that, but with how things are looking I'll probably end up removing the dependency. |
Improved find_existing_entry_mut: no longer requires entries (everything ahead of the entry can just be shifted). Switched to first_tombstone instead of last. Maximized usage of this function. Fixed remove_index_tombstones: fixed a backtracking bug which tanked perf. Reworked .entry: merged with insert to reduce code duplication. Reworked the probe loop to separate tombstone code (makes it easier to reason about). Switched to first_tombstone instead of last.
Derp. We don't need a linked list during tombstone removal; tombstones will always be consecutive, thanks to the robin-hood invariant. This also means we can improve tombstone compression in entry_phase_1 and find_existing_entry_mut. This should provide a minor perf improvement overall, and a more substantial improvement for remove_index_tombstones.
So, found a couple more bugs, but while I fix those, since I'm already touching so much of the code, how would you feel about the following changes:
My justification for the former is that it seems like a bad idea to have duplicate data that could get out of sync; the mask is already implicitly stored in My justification for the latter is that it reads better than the macro (especially when debugging, as line numbers get collapsed to the first line of the macro) and seems to have very little perf cost. Also, I somewhat already have to use this approach in some of my tombstone loops. |
I'm sorry I haven't sat down and looked at this yet. I would hold off on both of those changes. |
Remove all the unsafe testing, and unneeded functions. A couple bug fixes. Change swap_remove_found to use tombstones.
No problem, I've had to do quite a bit of cleanup before this PR would actually be usable. I've removed all my unsafe testing (it ultimately worked out to maybe a 5% gain) and switched everything over to tombstones, which can benefit from them. Hopefully I've removed all the bugs in my tombstone code 🤞. |
I'm sorry that I haven't been able to review this. I think tombstones are going to be useful, and maybe we should bench it against linked hash map instead for a more apples to apples comparison. Unsure if we should fork the crate for this, I almost think so if we can't bring the overhead down. Improving retain just to have lookup and insert regress really is throwing the baby out with the bath water for a general purpose hash table. |
Ok, I'm clearly not fit to help this PR as a maintainer. I would recommend forking the crate and giving the map a different name, because with tombstones you get a different hash table with other nice properties, while removing some of the nice properties of this OrderMap. (Performance and compactly indexed set). |
@Popog Hey! What do you think about publishing ordermap with full insertion order (using tombstones) as a separate crate? |
Got a good idea for a name? (I'm the worst at naming things) |
A. What if current ordermap is renamed to donate its name? B. ordermap stays and the new type with matching crate is "InsertionOrderMap"? |
compactmap is taken: https://crates.io/crates/compactmap |
I think this PR would be simpler if it didn't use index tombstones, only entry tombstones. We don't need to keep any index order |
Yeah, the index tombstones were just an attempt to amortize other operations. I should have time this weekend to go back a couple commits to just entry tombstones and clean that up. What about instead of a new crate, just adding it as another type to this crate? |
I'd prefer to split the crates if we are not at the point where implementation can be shared. I think all the core probe loops etc will look different, so there will not be so much actual overlap? There's also the reason that if it's in this crate I'd feel responsible for it, and I don't have time to be responsible for another one unfortunately. |
I believe this PR is obsolete since renaming to I think it's been long enough that we could donate / give up the |