Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not the largest prime? #68

Closed
jakoschiko opened this issue Oct 31, 2024 · 4 comments · Fixed by #69
Closed

Not the largest prime? #68

jakoschiko opened this issue Oct 31, 2024 · 4 comments · Fixed by #69

Comments

@jakoschiko
Copy link
Contributor

According to the readme, 11400714819323198549u64 is the largest prime for u64. It's also used in the implementation. But 18446744073709551557u64 is larger and also a prime.

Source: https://t5k.org/lists/2small/0bit.html

@JesperAxelsson
Copy link
Owner

I might have accidentally used the biggest largest prime for i64 🤦

@JesperAxelsson JesperAxelsson changed the title No the largest prime? Not the largest prime? Nov 2, 2024
@jakoschiko
Copy link
Contributor Author

No, that would be i64::MAX - 24 or 9223372036854775783.

According to some online search, 11400714819323198549 is a common prime for some algorithms. E.g. see this:

// Return the hash code for the accumulated hash state.
size_t _GetCode() const {
    // This is based on Knuth's multiplicative hash for integers.  The
    // constant is the closest prime to the binary expansion of the inverse
    // golden ratio.  The best way to produce a hash table bucket index from
    // the result is to shift the result right, since the higher order bits
    // have the most entropy.  But since we can't know the number of buckets
    // in a table that's using this, we just reverse the byte order instead,
    // to get the highest entropy bits into the low-order bytes.
    return _SwapByteOrder(_state * 11400714819323198549ULL);
}

Why do you think that using the largest prime is optimal? Is there any source for that?

@JesperAxelsson
Copy link
Owner

I read about it on stackoverflow a long time ago. Unfortunately I forgot to save the link and couldn't find it again. Either way, testing and making sure the max collisions doesn't grow to big should be sufficient.

It should be possible to track the number of active buckets pretty easily. Not sure if it would be worth it though.

@jakoschiko
Copy link
Contributor Author

The PR #69 already changed the prime number for u64. The tests passed and the benchmarks had no obvious regression. I would say this issue here can be closed by #69.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants