
Use SmallVec for better performance #57

Open
jakoschiko opened this issue Aug 4, 2024 · 7 comments

@jakoschiko
Contributor

Currently, IntMap stores its values in this data structure:

cache: Vec<Vec<(u64, V)>>

If I understand correctly, the inner Vec is used for entries with a hash collision. I would assume that hash collisions are rare, so allocating a Vec just to store a single entry might be too expensive.

A simple solution to this problem could be SmallVec:

cache: Vec<SmallVec<[(u64, V); 1]>>
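To illustrate the idea, here is a hand-rolled sketch in plain std Rust (not the real smallvec API, and the type `SmallBucket` is hypothetical): keep the first entry inline and only heap-allocate a Vec once a second entry lands in the same slot, which is essentially what `SmallVec<[(u64, V); 1]>` does with its inline capacity of 1.

```rust
// Hypothetical sketch of SmallVec's behavior for inline capacity 1:
// the common no-collision case needs no heap allocation at all.
#[derive(Debug)]
enum SmallBucket<V> {
    Empty,
    Inline((u64, V)),       // common case: single entry, stored inline
    Spilled(Vec<(u64, V)>), // rare case: two or more entries share a slot
}

impl<V> SmallBucket<V> {
    fn push(&mut self, key: u64, value: V) {
        match std::mem::replace(self, SmallBucket::Empty) {
            SmallBucket::Empty => *self = SmallBucket::Inline((key, value)),
            SmallBucket::Inline(first) => {
                // First collision: only now do we pay for a heap allocation.
                *self = SmallBucket::Spilled(vec![first, (key, value)]);
            }
            SmallBucket::Spilled(mut entries) => {
                entries.push((key, value));
                *self = SmallBucket::Spilled(entries);
            }
        }
    }

    fn get(&self, key: u64) -> Option<&V> {
        match self {
            SmallBucket::Empty => None,
            SmallBucket::Inline((k, v)) => (*k == key).then_some(v),
            SmallBucket::Spilled(entries) => {
                entries.iter().find(|(k, _)| *k == key).map(|(_, v)| v)
            }
        }
    }
}

fn main() {
    let mut bucket = SmallBucket::Empty;
    bucket.push(1, "a");
    assert_eq!(bucket.get(1), Some(&"a")); // no allocation happened yet
    bucket.push(2, "b"); // first collision triggers the Vec allocation
    assert_eq!(bucket.get(2), Some(&"b"));
}
```

The real SmallVec additionally moves the inline element into the heap allocation transparently and exposes the usual Vec-like API, so the change to IntMap should be mostly mechanical.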

I refactored IntMap and ran the benchmarks included in this repo:

Vec<Vec<(u64, V)>>

test tests::u64_get_built_in                     ... bench:      75,667.25 ns/iter (+/- 990.66)
test tests::u64_get_indexmap                     ... bench:     100,443.82 ns/iter (+/- 558.04)
test tests::u64_get_intmap                       ... bench:      12,725.52 ns/iter (+/- 15.93)
test tests::u64_insert_built_in                  ... bench:     104,197.52 ns/iter (+/- 293.96)
test tests::u64_insert_built_in_without_capacity ... bench:     263,766.42 ns/iter (+/- 612.20)
test tests::u64_insert_indexmap                  ... bench:     119,844.19 ns/iter (+/- 478.59)
test tests::u64_insert_intmap                    ... bench:      48,210.07 ns/iter (+/- 1,035.84)
test tests::u64_insert_intmap_checked            ... bench:      43,977.34 ns/iter (+/- 344.81)
test tests::u64_insert_intmap_entry              ... bench:      56,815.19 ns/iter (+/- 151.72)
test tests::u64_insert_intmap_without_capacity   ... bench:     534,254.25 ns/iter (+/- 4,754.61)
test tests::u64_resize_intmap                    ... bench:      21,808.31 ns/iter (+/- 157.90)

Vec<SmallVec<[(u64, V); 1]>>

test tests::u64_get_built_in                     ... bench:      78,840.49 ns/iter (+/- 1,060.83)
test tests::u64_get_indexmap                     ... bench:     104,876.86 ns/iter (+/- 601.68)
test tests::u64_get_intmap                       ... bench:      13,770.94 ns/iter (+/- 31.64)
test tests::u64_insert_built_in                  ... bench:     109,268.14 ns/iter (+/- 405.67)
test tests::u64_insert_built_in_without_capacity ... bench:     262,595.65 ns/iter (+/- 2,398.98)
test tests::u64_insert_indexmap                  ... bench:     121,934.15 ns/iter (+/- 362.42)
test tests::u64_insert_intmap                    ... bench:      44,227.69 ns/iter (+/- 104.81)
test tests::u64_insert_intmap_checked            ... bench:      44,575.48 ns/iter (+/- 76.96)
test tests::u64_insert_intmap_entry              ... bench:      57,808.73 ns/iter (+/- 137.58)
test tests::u64_insert_intmap_without_capacity   ... bench:     535,978.00 ns/iter (+/- 3,435.93)
test tests::u64_resize_intmap                    ... bench:      21,471.73 ns/iter (+/- 152.39)

It doesn't seem to make a difference. But then I ran the benchmarks mentioned in #46:

Vec<Vec<(u64, V)>>

test bench_hash_map ... bench:   1,134,233.35 ns/iter (+/- 57,827.94)
test bench_int_map  ... bench:   1,927,631.20 ns/iter (+/- 22,938.10)

Vec<SmallVec<[(u64, V); 1]>>

test bench_hash_map ... bench:   1,145,045.10 ns/iter (+/- 157,035.95)
test bench_int_map  ... bench:   1,019,778.05 ns/iter (+/- 11,999.06)

This looks good, doesn't it? My changes are on this branch.

What do you think? Is it worth investigating this further?

@JesperAxelsson
Owner

JesperAxelsson commented Aug 9, 2024

Could definitely be worth looking into. There is a collisions function that can be used to keep track of collisions; the load rate is a factor here as well. There will almost always be some collisions though.
One thing to keep in mind: I found that the benchmarks sometimes change between runs, so it is wise to run them a couple of times to get a baseline. Increasing the element count makes the benchmark more stable but takes more time. Preferably there would be benchmarks for different element counts, but I'm too lazy ^^

@jakoschiko
Contributor Author

Could definitely be worth looking into.

I will do it. Maybe it's worth adding more benchmarks before that. How do you feel about introducing the dependency smallvec?

One thing to keep in mind: I found that the benchmarks sometimes change between runs, so it is wise to run them a couple of times to get a baseline. Increasing the element count makes the benchmark more stable but takes more time.

I ran it multiple times and it was consistent.

Preferably there would be benchmarks for different element counts, but I'm too lazy ^^

Would you mind switching to another bench lib like criterion or divan (my preferred choice)? They support this out of the box. And the current benchmark requires a nightly compiler, which is a little bit annoying.

@JesperAxelsson
Owner

I looked a bit at both of the benchmark libs. Neither seems very active, so I suppose divan should be fine.
I would be fine with adding a dependency in this case.

@jakoschiko
Contributor Author

I'm confused because my optimization improved the benchmark from issue #46, but not the existing benchmark u64_insert_intmap_without_capacity.

So I compared the collisions in both benchmarks:

use intmap::IntMap;

pub fn main() {
    // Based on the benchmark from issue #46
    let mut map = IntMap::new();
    for i in 0..10_000 {
        map.insert(i, i);
    }
    println!("A: {:?}", map.collisions());

    // Based on benchmark u64_insert_intmap_without_capacity
    let data = get_random_range(10_000);
    let mut map = IntMap::new();
    for s in data.iter() {
        map.insert(*s, *s);
    }
    println!("B: {:?}", map.collisions());
}

fn get_random_range(count: usize) -> Vec<u64> {
    use rand::prelude::StdRng;
    use rand::{Rng, SeedableRng};

    let mut vec = Vec::new();
    let mut rng = StdRng::seed_from_u64(10);

    for _ in 0..count {
        vec.push(rng.gen::<u64>());
    }

    vec.sort();
    vec.dedup();

    vec
}

Here is the result:

A: {}
B: {4: 43, 5: 7, 2: 1675, 3: 333}

When using random values, there are 1675 collisions for slot 2! What's going on? Does the random number generator use the same prime number as our hashing function?

@jakoschiko
Contributor Author

Ah, I misinterpreted the output:

  • There are 7 slots with 5 collisions
  • There are 43 slots with 4 collisions
  • There are 333 slots with 3 collisions
  • There are 1675 slots with 2 collisions

@JesperAxelsson
Owner

If smallvec could fit at least 2 items, it should be a decent boon to performance.
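Concretely, the suggested bucket type (assuming the smallvec crate) would just bump the inline capacity in the array parameter, so each slot can hold two entries before spilling to the heap:

```rust
cache: Vec<SmallVec<[(u64, V); 2]>>
```

Note that this doubles the inline size of every slot even when it is empty, so it trades memory for fewer allocations.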

@jakoschiko
Contributor Author

jakoschiko commented Nov 2, 2024

Unfortunately it did not improve the performance according to my benchmarks. I need more time for investigation.
