
Use SmallVec for better performance #57

Open
jakoschiko opened this issue Aug 4, 2024 · 7 comments

@jakoschiko
Contributor

Currently, IntMap stores its values in this data structure:

cache: Vec<Vec<(u64, V)>>

If I understand correctly, the inner Vec is used for entries with a hash collision. I would assume that hash collisions are rare, so allocating a Vec just to store a single entry might be too expensive.

A simple solution to this problem could be SmallVec:

cache: Vec<SmallVec<[(u64, V); 1]>>
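To illustrate the idea, here is a hand-rolled sketch in plain std Rust (not the real smallvec API, and the type `SmallBucket` is hypothetical): keep the first entry inline and only heap-allocate a Vec once a second entry lands in the same slot, which is essentially what `SmallVec<[(u64, V); 1]>` does with its inline capacity of 1.

```rust
// Hypothetical sketch of SmallVec's behavior for inline capacity 1:
// the common no-collision case needs no heap allocation at all.
#[derive(Debug)]
enum SmallBucket<V> {
    Empty,
    Inline((u64, V)),       // common case: single entry, stored inline
    Spilled(Vec<(u64, V)>), // rare case: two or more entries share a slot
}

impl<V> SmallBucket<V> {
    fn push(&mut self, key: u64, value: V) {
        match std::mem::replace(self, SmallBucket::Empty) {
            SmallBucket::Empty => *self = SmallBucket::Inline((key, value)),
            SmallBucket::Inline(first) => {
                // First collision: only now do we pay for a heap allocation.
                *self = SmallBucket::Spilled(vec![first, (key, value)]);
            }
            SmallBucket::Spilled(mut entries) => {
                entries.push((key, value));
                *self = SmallBucket::Spilled(entries);
            }
        }
    }

    fn get(&self, key: u64) -> Option<&V> {
        match self {
            SmallBucket::Empty => None,
            SmallBucket::Inline((k, v)) => (*k == key).then_some(v),
            SmallBucket::Spilled(entries) => {
                entries.iter().find(|(k, _)| *k == key).map(|(_, v)| v)
            }
        }
    }
}

fn main() {
    let mut bucket = SmallBucket::Empty;
    bucket.push(1, "a");
    assert_eq!(bucket.get(1), Some(&"a")); // no allocation happened yet
    bucket.push(2, "b"); // first collision triggers the Vec allocation
    assert_eq!(bucket.get(2), Some(&"b"));
}
```

The real SmallVec additionally moves the inline element into the heap allocation transparently and exposes the usual Vec-like API, so the change to IntMap should be mostly mechanical.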

I refactored IntMap and ran the benchmarks included in this repo:

Vec<Vec<(u64, V)>>

test tests::u64_get_built_in                     ... bench:      75,667.25 ns/iter (+/- 990.66)
test tests::u64_get_indexmap                     ... bench:     100,443.82 ns/iter (+/- 558.04)
test tests::u64_get_intmap                       ... bench:      12,725.52 ns/iter (+/- 15.93)
test tests::u64_insert_built_in                  ... bench:     104,197.52 ns/iter (+/- 293.96)
test tests::u64_insert_built_in_without_capacity ... bench:     263,766.42 ns/iter (+/- 612.20)
test tests::u64_insert_indexmap                  ... bench:     119,844.19 ns/iter (+/- 478.59)
test tests::u64_insert_intmap                    ... bench:      48,210.07 ns/iter (+/- 1,035.84)
test tests::u64_insert_intmap_checked            ... bench:      43,977.34 ns/iter (+/- 344.81)
test tests::u64_insert_intmap_entry              ... bench:      56,815.19 ns/iter (+/- 151.72)
test tests::u64_insert_intmap_without_capacity   ... bench:     534,254.25 ns/iter (+/- 4,754.61)
test tests::u64_resize_intmap                    ... bench:      21,808.31 ns/iter (+/- 157.90)

Vec<SmallVec<[(u64, V); 1]>>

test tests::u64_get_built_in                     ... bench:      78,840.49 ns/iter (+/- 1,060.83)
test tests::u64_get_indexmap                     ... bench:     104,876.86 ns/iter (+/- 601.68)
test tests::u64_get_intmap                       ... bench:      13,770.94 ns/iter (+/- 31.64)
test tests::u64_insert_built_in                  ... bench:     109,268.14 ns/iter (+/- 405.67)
test tests::u64_insert_built_in_without_capacity ... bench:     262,595.65 ns/iter (+/- 2,398.98)
test tests::u64_insert_indexmap                  ... bench:     121,934.15 ns/iter (+/- 362.42)
test tests::u64_insert_intmap                    ... bench:      44,227.69 ns/iter (+/- 104.81)
test tests::u64_insert_intmap_checked            ... bench:      44,575.48 ns/iter (+/- 76.96)
test tests::u64_insert_intmap_entry              ... bench:      57,808.73 ns/iter (+/- 137.58)
test tests::u64_insert_intmap_without_capacity   ... bench:     535,978.00 ns/iter (+/- 3,435.93)
test tests::u64_resize_intmap                    ... bench:      21,471.73 ns/iter (+/- 152.39)

It doesn't seem to make a difference. But then I ran the benchmarks mentioned in #46:

Vec<Vec<(u64, V)>>

test bench_hash_map ... bench:   1,134,233.35 ns/iter (+/- 57,827.94)
test bench_int_map  ... bench:   1,927,631.20 ns/iter (+/- 22,938.10)

Vec<SmallVec<[(u64, V); 1]>>

test bench_hash_map ... bench:   1,145,045.10 ns/iter (+/- 157,035.95)
test bench_int_map  ... bench:   1,019,778.05 ns/iter (+/- 11,999.06)

This looks good, doesn't it? My changes are on this branch.

What do you think? Is it worth investigating this further?

@JesperAxelsson
Owner

JesperAxelsson commented Aug 9, 2024

Could definitely be worth looking into. There is a collisions function that can be used to keep track of collisions; the load rate is a factor here as well. There will almost always be some collisions though.
One thing to keep in mind: I found that the benchmarks sometimes change between runs, so it is wise to run them a couple of times to get a baseline. Increasing the element count makes the benchmark more stable but takes more time. Preferably there would be benchmarks for different element counts, but I'm too lazy ^^

@jakoschiko
Contributor Author

Could definitely be worth looking into.

I will do it. Maybe it's worth adding more benchmarks before that. How do you feel about introducing the dependency smallvec?

One thing to keep in mind: I found that the benchmarks sometimes change between runs, so it is wise to run them a couple of times to get a baseline. Increasing the element count makes the benchmark more stable but takes more time.

I ran it multiple times and it was consistent.

Preferably there would be benchmarks for different element counts, but I'm too lazy ^^

Would you mind switching to another bench lib like criterion or divan (my preferred choice)? They support this out of the box. And the current benchmark requires a nightly compiler, which is a little bit annoying.

@JesperAxelsson
Owner

I looked a bit at both of the benchmark libs. Neither seems very active, so I suppose divan should be fine.
I would be fine with adding a dependency in this case.

@jakoschiko
Contributor Author

I'm confused because my optimization improved the benchmark from issue #46, but not the existing benchmark u64_insert_intmap_without_capacity.

So I compared the collisions in both benchmarks:

use intmap::IntMap;

pub fn main() {
    // Based on the benchmark from issue #46
    let mut map = IntMap::new();
    for i in 0..10_000 {
        map.insert(i, i);
    }
    println!("A: {:?}", map.collisions());

    // Based on benchmark u64_insert_intmap_without_capacity
    let data = get_random_range(10_000);
    let mut map = IntMap::new();
    for s in data.iter() {
        map.insert(*s, *s);
    }
    println!("B: {:?}", map.collisions());
}

fn get_random_range(count: usize) -> Vec<u64> {
    use rand::prelude::StdRng;
    use rand::{Rng, SeedableRng};

    let mut vec = Vec::new();
    let mut rng = StdRng::seed_from_u64(10);

    for _ in 0..count {
        vec.push(rng.gen::<u64>());
    }

    vec.sort();
    vec.dedup();

    vec
}

Here is the result:

A: {}
B: {4: 43, 5: 7, 2: 1675, 3: 333}

When using random values, there are 1675 collisions for slot 2! What's going on? Does the random number generator use the same prime number as our hashing function?

@jakoschiko
Contributor Author

Ah, I misinterpreted the output:

  • There are 7 slots with 5 collisions
  • There are 43 slots with 4 collisions
  • There are 333 slots with 3 collisions
  • There are 1675 slots with 2 collisions

@JesperAxelsson
Owner

If smallvec could fit at least 2 items, it should be a decent boon to performance.
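Concretely, the suggested bucket type (assuming the smallvec crate) would just bump the inline capacity in the array parameter, so each slot can hold two entries before spilling to the heap:

```rust
cache: Vec<SmallVec<[(u64, V); 2]>>
```

Note that this doubles the inline size of every slot even when it is empty, so it trades memory for fewer allocations.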

@jakoschiko
Contributor Author

jakoschiko commented Nov 2, 2024

Unfortunately it did not improve the performance according to my benchmarks. I need more time for investigation.
