Support clustering of vectors #31

mountain · 2014-06-11T12:14:09Z

For personalized recommendation, the set of user profile vectors is usually very large: around 10M vectors or more. We take 10M vectors as the baseline because it is still possible to store that much data on one physical machine.

10M vectors * 2048 dimensions * 4 bytes per float ≈ 82 GB of memory
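
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `footprint_bytes` is made up for illustration, and it counts raw float storage only, ignoring any index or per-object overhead.

```python
def footprint_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for dense vectors, excluding any index or object overhead."""
    return num_vectors * dims * bytes_per_value


total = footprint_bytes(10_000_000, 2048)  # 10M vectors, 2048 dims, 4-byte floats
gb = total / 10**9    # decimal gigabytes
gib = total / 2**30   # binary gibibytes

print(f"{total} bytes = {gb:.2f} GB = {gib:.2f} GiB")
```

In decimal units this comes to about 82 GB, or about 76 GiB in binary units, so "80 G" is a fair round figure either way.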

The current solution does not scale to this level: write latency would be ~30s, which is not acceptable.

One idea: instead of recommending for a single user, recommend for a cluster of similar users.

Two candidate approaches: online k-means or SimHash?
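
To make the two options concrete, here is a rough pure-Python sketch of each. Everything in it is an assumption for discussion, not the project's code: `make_hyperplanes`, `simhash`, and `OnlineKMeans` are hypothetical names, SimHash is taken in its random-hyperplane form, and online k-means in its sequential (running-mean) form with seed centroids supplied by the caller.

```python
import random


def make_hyperplanes(num_bits, dims, seed=0):
    """Hypothetical helper: one random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(num_bits)]


def simhash(vector, hyperplanes):
    """Random-hyperplane SimHash: a bit is 1 when the dot product is non-negative.

    Vectors with a small angle between them tend to share most bits, so the
    signature (or a prefix of it) can serve as a cluster/bucket id.
    """
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


class OnlineKMeans:
    """Sequential k-means: each new vector nudges its nearest centroid."""

    def __init__(self, seed_centroids):
        # Assumption: each seed centroid counts as one observation.
        self.centroids = [list(c) for c in seed_centroids]
        self.counts = [1] * len(self.centroids)

    def update(self, vector):
        # Pick the nearest centroid by squared Euclidean distance.
        best = min(
            range(len(self.centroids)),
            key=lambda i: sum((c - v) ** 2 for c, v in zip(self.centroids[i], vector)),
        )
        # Running-mean update: step size shrinks as the cluster grows.
        self.counts[best] += 1
        rate = 1.0 / self.counts[best]
        self.centroids[best] = [
            c + rate * (v - c) for c, v in zip(self.centroids[best], vector)
        ]
        return best


planes = make_hyperplanes(num_bits=16, dims=4, seed=42)
sig = simhash([0.3, -1.2, 0.8, 2.0], planes)
sig_scaled = simhash([0.6, -2.4, 1.6, 4.0], planes)  # same direction, doubled length

km = OnlineKMeans([[0.0, 0.0], [10.0, 10.0]])
first = km.update([1.0, 1.0])
second = km.update([9.0, 9.0])
```

The trade-off this sketch suggests: SimHash assigns a bucket from the vector alone, so a write touches no shared state, whereas online k-means must read and update a centroid on every write but adapts its clusters to the data.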

mountain added the enhancement label Jun 11, 2014

mountain added this to the 0.2.0 milestone Jun 11, 2014

mountain added the discussion label Jun 11, 2014

mountain changed the title ~~Support kmeans clustering~~ Support vector clustering Jun 14, 2014

mountain changed the title ~~Support vector clustering~~ Support clustering of vectors Jun 14, 2014
