murmur2 vs murmur3
I integrated murmur3 into vw as a compile-time option (-DMURMUR3).
murmur3 brings a slight (~3%) speed advantage in training time over murmur2.
I used the data-sets in the test suite. Most of them have no collisions with either hash.
I used --exact_adaptive_norm and varied the -b (bits) parameter, starting as low as 12, to stress the hash and induce collisions.
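The idea behind varying -b can be sketched in a few lines of Python. This is only an illustration, not vw's code: it assumes a feature's slot is its hash masked to the low `bits` bits, it uses `zlib.crc32` as a stand-in because neither murmur2 nor murmur3 is in the Python standard library, and the function names `count_collisions` and `min_bits_without_collisions` are hypothetical.

```python
import zlib


def count_collisions(features, bits, hash_fn=zlib.crc32):
    """Count distinct features that share a slot in a table of 2**bits entries.

    Assumption: the slot is the hash masked to its low `bits` bits.
    `hash_fn` takes bytes and returns a non-negative int; crc32 is only
    a stand-in for murmur2/murmur3 here.
    """
    mask = (1 << bits) - 1
    distinct = set(features)
    slots = {hash_fn(name.encode("utf-8")) & mask for name in distinct}
    # Every distinct feature beyond the number of occupied slots collided.
    return len(distinct) - len(slots)


def min_bits_without_collisions(features, lo=12, hi=32, hash_fn=zlib.crc32):
    """Return the smallest bit count in [lo, hi] with zero collisions, or None."""
    for bits in range(lo, hi + 1):
        if count_collisions(features, bits, hash_fn) == 0:
            return bits
    return None
```

Passing a different `hash_fn` (for example a murmur3 implementation from a third-party package) would let the two hashes be compared on the same feature list, one -b value at a time.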
I found that on most data-sets I tried, murmur3 had a slight advantage in hash collision avoidance.
One exception was 0002.dat, where murmur2 reaches zero collisions at -b 18 (the default) while murmur3 reaches zero collisions only at -b 21.

| Data-set | # features | Winner (dominates over most of the -b range) |
|---|---|---|
| 0001.dat | 4290 | Murmur3 |
| 0002.dat | 289 | Murmur2 |
| rcv1_small.dat | 23530 | Murmur3 |
| wsj_small.dat | 13762 | Murmur3 |
| ner.train | 292497 | Murmur3 |