Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example performance numbers? #1

Open
zslayton opened this issue Mar 7, 2018 · 9 comments
Open

Example performance numbers? #1

zslayton opened this issue Mar 7, 2018 · 9 comments

Comments

@zslayton
Copy link

zslayton commented Mar 7, 2018

Hi, I found this project by way of your Stack Exchange post. I was considering trying something like this myself and would be very interested to know what sort of precision a BK tree index was able to offer at different Hamming distances. Do you have any example performance numbers (e.g. number of records scanned vs. number of records returned) that you could share? Thanks!

@fake-name
Copy link
Owner

fake-name commented Mar 7, 2018

I don't have numbers, currently, but it's considerably slower then the in-memory BK-Tree I wrote.

It mostly only wins on the convenience front, but MAN it's way more convenient, and it's fast enough.

I'm not actually sure how to go about getting detailed stats out of postgres for this sort of thing, actually.


I'm not sure what you mean by "Precision". This isn't an approximate search, for each query there is only one correct return set, and in the testing I've done (see all the Test_db_BKTree_xxx.py files), it returns that.

@yjhjstz
Copy link

yjhjstz commented Apr 26, 2019

have you test Precision and recall in SIFT1M dataset ?

@fake-name
Copy link
Owner

@yjhjstz - I'm not sure what those are, and this isn't a generic fuzzy image searching facility. It's a database-backed implementation of one specific data structure that makes building a fuzzy image search system easy, assuming your image-hashing function outputs 64-bit hash values where similarity is a function of bitwise hamming distance between two hashes.

Given that, it doesn't have a "precision", as any query has one (and only one) valid output result, and from all the tests I've done seem to indicate that you get the right result.

Note that I have built a fuzzy image searching system on top of this, but it's:

  1. Not in this repo (it's here)
  2. It uses a DCT-based perceptual hashing approach (mildly tweaked here) ((described here)[http://hackerfactor.com/blog/index.php%3F/archives/432-Looks-Like-It.html])
  3. Not really finished.

@fake-name
Copy link
Owner

fake-name commented Apr 26, 2019

Looking at the SIFT1M webpage (here?), I'm not sure what the feature vector they're using looks like, but assuming it obeys the triangle inequality, the approach used here would probably work, though the index would need to be considerably more complicated if the vector size is larger then 64 bits.

This index exploits the fact that the pHash values it's designed to index are the native data size of the index pointers, and therefore stores the hash value directly in the index itself. Postgresql can support out-of-line index data, but it's more indirection and a additional lookup for each index comparison operation to do that, so you'd at least double the number of RAM accesses if you made those changes. It'd probably also have poor cache-locality effects.

@yjhjstz
Copy link

yjhjstz commented Apr 26, 2019

thanks , maybe you can try mnist dataset, it has features and label . according to label , precision can be done.

@fake-name
Copy link
Owner

fake-name commented Apr 26, 2019

@yjhjstz - That sounds neat. Let me know if you have performance numbers I can add to the repo.

@yjhjstz
Copy link

yjhjstz commented Apr 27, 2019

not only performance, I also care about precision of top-1.

@fake-name
Copy link
Owner

fake-name commented Apr 27, 2019

Again, this is just a database-backed BK-tree-based index implementation. It has no precision.

@yjhjstz
Copy link

yjhjstz commented Apr 27, 2019

I know , thanks anain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants