A dubt #8

ogreyesp · 2016-03-29T07:13:37Z

Hello,

I'm using the java-LSH code, and I think it is a great project.

LSH is a technique for handling high-dimensional datasets, for instance datasets that have 100000 features, or even more...

When I run the examples SuperBitExample, SuperBitSparseExample or LSHSuperBitExample, they run fine. However, if I increase the number of dimensions, for instance to 1000, the program becomes very slow.

Can I use this project to work with high-dimensional datasets?

Best regards,

Oscar

tdebatty · 2016-04-06T19:50:12Z

Hi,

Sorry for the late answer...

LSH is able to work with high-dimensional datasets, but (for signature size S and D dimensions):

- computing a single signature has a computation cost of O(S·D), so this is slow
- SuperBit has to make the reference vectors orthogonal, which is also slow

On the other hand, computing the similarity between signatures is very fast, which makes LSH suitable for large datasets, even with high dimensions...
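
To illustrate where these costs come from, here is a minimal sketch using plain random-hyperplane signatures. It is not the library's actual SuperBit code (the orthogonalization step is left out), and the class and method names are hypothetical. Building one signature touches every one of the S·D hyperplane components, while comparing two signatures only loops over S bits, whatever D is.

```java
import java.util.Random;

// Hypothetical sketch, not part of java-LSH.
final class SignatureCostSketch {

    // Draw S random hyperplanes with D Gaussian components each.
    static double[][] randomHyperplanes(int s, int d, long seed) {
        Random rnd = new Random(seed);
        double[][] planes = new double[s][d];
        for (int i = 0; i < s; i++) {
            for (int j = 0; j < d; j++) {
                planes[i][j] = rnd.nextGaussian();
            }
        }
        return planes;
    }

    // One signature: S dot products of length D, hence O(S*D) work.
    static boolean[] signature(double[][] planes, double[] v) {
        boolean[] sig = new boolean[planes.length];
        for (int i = 0; i < planes.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < v.length; j++) {
                dot += planes[i][j] * v[j];
            }
            sig[i] = dot >= 0; // which side of hyperplane i the vector lies on
        }
        return sig;
    }

    // Comparing two signatures: O(S) work, independent of D.
    static int agreements(boolean[] a, boolean[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        int s = 64;    // signature size
        int d = 1000;  // number of dimensions
        double[][] planes = randomHyperplanes(s, d, 42L);

        Random rnd = new Random(1L);
        double[] v1 = new double[d];
        double[] v2 = new double[d];
        for (int j = 0; j < d; j++) {
            v1[j] = rnd.nextGaussian();
            v2[j] = v1[j] + 0.1 * rnd.nextGaussian(); // v2 is a noisy copy of v1
        }

        boolean[] s1 = signature(planes, v1);
        boolean[] s2 = signature(planes, v2);
        System.out.println("Agreeing bits: " + agreements(s1, s2) + " / " + s);
    }
}
```

In this sketch, going from 100 to 1000 dimensions multiplies the signature cost by ten, but leaves the comparison cost unchanged.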

I might add a computation time analysis one of these days to make this clear...

Best regards,
