A doubt #8

Open
ogreyesp opened this issue Mar 29, 2016 · 1 comment

Comments

@ogreyesp

Hello,

I'm using the java-LSH code, and I think it is a great project.

LSH is a technique for handling high-dimensional datasets, for instance datasets that have 100,000 features or even more.

When I run the examples SuperBitExample, SuperBitSparseExample or LSHSuperBitExample, they run fine. However, if I increase the number of dimensions, for instance to 1000, the program becomes very slow.

Can I use this project to work with high-dimensional datasets?

Best regards,

Oscar

@tdebatty
Owner

tdebatty commented Apr 6, 2016

Hi,

Sorry for the late answer...

LSH is able to work with high-dimensional datasets, but (for signature size S and D dimensions):

  • computing a single signature has a computational cost of O(S·D), so this is slow
  • SuperBit has to make the reference vectors orthogonal, which is also slow
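
To make the O(S·D) cost concrete, here is a minimal sketch of a plain random-hyperplane signature. It is not the java-LSH code: the class `SignatureCostSketch` and its methods are hypothetical names, and it skips the orthogonalization step that SuperBit adds on top. Each of the S bits needs a dot product over all D dimensions, which is where the S·D multiply-adds come from.

```java
import java.util.Random;

// Hypothetical illustration only; not part of java-LSH.
class SignatureCostSketch {

    // Draws S random hyperplanes of dimension D (S * D Gaussian values).
    static double[][] randomHyperplanes(int s, int d, long seed) {
        Random rnd = new Random(seed);
        double[][] planes = new double[s][d];
        for (int i = 0; i < s; i++) {
            for (int j = 0; j < d; j++) {
                planes[i][j] = rnd.nextGaussian();
            }
        }
        return planes;
    }

    // One bit per hyperplane. Every bit requires a D-term dot product,
    // so a single signature costs S * D multiply-adds.
    static boolean[] signature(double[][] planes, double[] vector) {
        boolean[] sig = new boolean[planes.length];
        for (int i = 0; i < planes.length; i++) {
            double dot = 0.0;
            for (int j = 0; j < vector.length; j++) {
                dot += planes[i][j] * vector[j];
            }
            sig[i] = dot >= 0.0;
        }
        return sig;
    }
}
```

With S fixed, going from D = 10 to D = 1000 multiplies the work per signature by roughly 100 in this sketch, which matches the kind of slowdown described above.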

On the other hand, computing the similarity between signatures is very fast, which makes LSH suitable for large datasets, even with high dimensions...
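
As a rough sketch of why the comparison step is cheap: once signatures exist, comparing two of them only touches S bits and never looks at the D original dimensions. The class `SignatureCompareSketch` below is a hypothetical illustration, not the java-LSH API, and the cosine estimate assumes plain random-hyperplane signatures, where the probability that a bit agrees is 1 - θ/π for an angle θ between the two vectors.

```java
// Hypothetical illustration only; not part of java-LSH.
class SignatureCompareSketch {

    // Fraction of positions where the two signatures agree: O(S), independent of D.
    static double agreement(boolean[] a, boolean[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("signatures must have the same length");
        }
        int agree = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                agree++;
            }
        }
        return (double) agree / a.length;
    }

    // Assumption: random-hyperplane signatures, so P(bit agrees) = 1 - theta / pi.
    // Inverting that gives an estimate of the cosine similarity.
    static double estimatedCosine(double agreement) {
        return Math.cos(Math.PI * (1.0 - agreement));
    }
}
```

So the expensive O(S·D) step is paid once per vector, while each pairwise comparison afterwards costs only O(S).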

I might add a computation time analysis one of these days to make this clear...

Best regards,
