arielf edited this page Aug 12, 2012 · 19 revisions

I integrated murmur3 into vw as a compile-time option (-DMURMUR3).

Results:

Speed:

murmur3 trains slightly faster than murmur2, by about 3%.

Collision avoidance / dispersion

I used the data-sets in the test suite. Most data-sets don't have collisions with either hash.

I used --exact_adaptive_norm and varied the -b bits parameter, going as low as 12, to stress the hash and induce collisions.
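To make the idea of stressing the hash with fewer bits concrete, here is a minimal sketch that counts collisions for a set of feature names at different bit widths. It is not how vw computes or reports collisions: `crc32_hash`, `count_collisions` and the made-up feature names are all assumptions for illustration, and `zlib.crc32` is only a stand-in 32-bit hash, not murmur2 or murmur3.

```python
import zlib


def crc32_hash(feature: str) -> int:
    # Stand-in 32-bit hash (assumption): this is NOT murmur2 or murmur3.
    return zlib.crc32(feature.encode("utf-8"))


def count_collisions(features, bits, hash_fn=crc32_hash):
    """Count distinct features that land in an already occupied slot.

    Only the low `bits` bits of the hash are kept, so the table has
    2**bits slots.
    """
    mask = (1 << bits) - 1
    distinct = set(features)
    slots = {hash_fn(f) & mask for f in distinct}
    return len(distinct) - len(slots)


if __name__ == "__main__":
    # Hypothetical feature names; a real run would read them from a data-set.
    features = [f"feature_{i}" for i in range(5000)]
    for bits in range(12, 23):
        print(bits, count_collisions(features, bits))
```

Swapping in a different `hash_fn` and comparing the counts per bit width gives the kind of comparison described here, under the assumption that feature names are hashed directly as strings.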

I found that on most data-sets I tried, murmur3 had a slight advantage in hash collision avoidance.

One exception was 0002.dat, where murmur2 has no collisions at -b 18 (the default), while murmur3 only becomes collision-free at -b 21.
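The "collision-free at -b N" figure can be thought of as the smallest bit width at which every distinct feature gets its own slot. A rough sketch of that search follows; the function name `first_collision_free_bits` and its `lo`/`hi` bounds are hypothetical, and `hash_fn` is whatever hash you want to test (the usage line uses `zlib.crc32` purely as a stand-in, not murmur2 or murmur3).

```python
import zlib


def first_collision_free_bits(features, hash_fn, lo=12, hi=24):
    """Return the smallest bit width in [lo, hi] with no collisions.

    Returns None if collisions remain at every width in the range.
    """
    distinct = set(features)
    for bits in range(lo, hi + 1):
        mask = (1 << bits) - 1
        # Collision-free means every distinct feature has its own slot.
        if len({hash_fn(f) & mask for f in distinct}) == len(distinct):
            return bits
    return None


if __name__ == "__main__":
    # Hypothetical feature names and a stand-in hash (assumptions).
    names = [f"feature_{i}" for i in range(289)]
    print(first_collision_free_bits(names, lambda s: zlib.crc32(s.encode("utf-8"))))
```

With a small feature set, the width at which a given hash becomes collision-free depends heavily on the particular names, so a single data-set result like this one says little about the hash in general.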

Data-sets tested with number of features

| DataSet | #-features | Winner (dominates on most of the -b bits range) |
|---|---|---|
| 0001.dat | 4290 | Murmur3 |
| 0002.dat | 289 | Murmur2 |
| rcv1_small.dat | 23530 | Murmur3 |
| wsj_small.dat | 13762 | Murmur3 |
| ner.train | 292497 | Murmur3 |

Charts