murmur2 vs murmur3
I integrated murmur3 as a compile-time (-DMURMUR3) option to vw.
murmur3 brings a slight ~3% speed advantage in training time vs murmur2.
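Below is a minimal sketch of what the compile-time selection could look like, assuming the canonical MurmurHash signatures from Austin Appleby's reference implementations. The feature_hash wrapper and its layout are illustrative, not vw's actual code.

```cpp
#include <cstdint>

// Reference signatures (Austin Appleby's implementations):
uint32_t MurmurHash2(const void* key, int len, uint32_t seed);
void MurmurHash3_x86_32(const void* key, int len, uint32_t seed, void* out);

static const uint32_t hash_base = 0;  // hash seed; see the seed note below

// Hypothetical wrapper: pick the hash family at compile time via -DMURMUR3.
inline uint32_t feature_hash(const char* s, int len)
{
#ifdef MURMUR3
  uint32_t h;
  MurmurHash3_x86_32(s, len, hash_base, &h);
  return h;
#else
  return MurmurHash2(s, len, hash_base);
#endif
}
```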
I used the data-sets in the test suite. Most data-sets have no collisions with either hash.
To stress the hash and induce collisions, I used --exact_adaptive_norm and varied the -b bits parameter, going as low as 12 (the collision counting is sketched further below).
On most data-sets I tried, murmur3 had a slight advantage in hash-collision avoidance.
One exception was 0002.dat, where murmur2 achieves no collisions at -b 18 (the default) while murmur3 achieves no collisions only at -b 21. The two colliding feature names were short:
Feature   Hash Value
T^BKF     10281
T^IWR     10281
T^DXJ     100053
T^GML     100053
Note: changing the seed value (the hash_base constant in hash.h) from 97562527 to 0 avoids these two collisions. In the final committed version I switched to seed 0; most of the results below use seed 97562527.
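The collision counts were obtained by hashing each distinct feature name, masking the hash down to the b-bit weight-vector index that -b controls, and counting names that land on an already-occupied index. A minimal sketch, with hypothetical names (count_collisions, plus the feature_hash wrapper from the sketch above):

```cpp
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

uint32_t feature_hash(const char* s, int len);  // selector sketched above

// Count features whose b-bit index is already taken by an earlier feature.
size_t count_collisions(const std::vector<std::string>& features, unsigned b)
{
  const uint32_t mask = (1u << b) - 1;  // -b keeps only the low b bits
  std::unordered_set<uint32_t> seen;
  size_t collisions = 0;
  for (const std::string& f : features)
  {
    const uint32_t idx = feature_hash(f.c_str(), (int)f.size()) & mask;
    if (!seen.insert(idx).second)  // insert fails => index already occupied
      ++collisions;
  }
  return collisions;
}
```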
#-of-collisions @ -b 18 (default)

DataSet          #-features   M2       M3       Winning hash method (dominates most of -b 12..30)
0001.dat         4290         0        0        Murmur3
0002.dat         289          0        2        Murmur2
zero.dat         1032         0        0        same
wiki1K.dat       6330         0        0        same
rcv1_small.dat   23530        1        0        Murmur3
wsj_small.dat    13762        383      334      Murmur3
ner.train        292497       116204   116108   Murmur3
Having fewer collisions doesn't always lead to lower training error. I found some evidence among our test-suite examples that random collisions may sometimes act as a kind of regularization, and the surprising result is a lower average loss at the end.
Note: some charts have clipped X and/or Y ranges, since otherwise the difference is too small to see easily. I did run all tests over the full 12..30 bit range.
Also, not all charts use the same comparison criterion; the criterion was chosen to emphasize the difference.