Duplicate vector handling #39

fjsj · 2020-12-31T00:59:16Z

Hi, thanks for open sourcing this great library!
This is more of a question than an actual issue: does n2 handle duplicate vectors without performance degradation or recall issues?

Other ANN libraries tend to suffer from this. See:

gony-noreply · 2020-12-31T04:39:25Z

does n2 handle duplicate vectors without performance degradation or recall issues?

Yes, since version 0.1.7.

The HNSW algorithm doesn't work efficiently on duplicate vectors. We think this is because the heuristic neighbor selection algorithm focuses only on navigation. With heuristic neighbor selection, duplicate or near-duplicate vectors end up hidden behind the neighbor that was selected first, so they become hard to reach during search, which results in low recall.

To solve this, we modified the heuristic neighbor selection algorithm so that it keeps some of the nearest neighbors without degrading navigation performance.
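
As a rough illustration of the effect described above (this is not N2's actual code), the sketch below implements the neighbor selection heuristic as it is commonly described for HNSW, next to a hypothetical variant that always keeps a few of the closest candidates. The function names, the `keep_nearest` parameter, and the toy data are assumptions made for this example only.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_heuristic(query, candidates, m):
    """HNSW-style heuristic: keep a candidate only if it is at least as
    close to the query as to every neighbor already selected."""
    selected = []
    for c in sorted(candidates, key=lambda v: dist(query, v)):
        if len(selected) >= m:
            break
        d_q = dist(query, c)
        # A duplicate of an already selected neighbor has distance 0 to it,
        # so it fails this check and is dropped.
        if all(dist(c, s) >= d_q for s in selected):
            selected.append(c)
    return selected

def select_with_nearest(query, candidates, m, keep_nearest=2):
    """Hypothetical variant (an assumption, not N2's method): always keep
    the `keep_nearest` closest candidates, then fill the remaining slots
    with the heuristic above."""
    ordered = sorted(candidates, key=lambda v: dist(query, v))
    k = min(keep_nearest, m)
    selected = ordered[:k]
    for c in ordered[k:]:
        if len(selected) >= m:
            break
        d_q = dist(query, c)
        if all(dist(c, s) >= d_q for s in selected):
            selected.append(c)
    return selected

# Toy data: the two nearest candidates are exact duplicates of each other.
query = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)]

print(select_heuristic(query, candidates, m=3))
print(select_with_nearest(query, candidates, m=3))
```

In this toy case the plain heuristic links the query to only one copy of `(1.0, 0.0)`, while the variant keeps both copies at the cost of one long-range link. That is the same tradeoff between keeping near neighbors and preserving navigation.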

Below is one of the benchmarks measured for the 0.1.7 release. The GIST dataset contains duplicate vectors (about 2% of the train vectors are duplicated).

You can see higher recall compared to N2 version 0.1.6.
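
If you want to check how many duplicates your own dataset contains before indexing it, a minimal sketch like the following may help. It assumes that "duplicated" means a vector whose exact value appears more than once, and the helper name `duplicate_fraction` is made up for this example. It is not necessarily how the GIST figure above was measured.

```python
from collections import Counter

def duplicate_fraction(vectors):
    """Fraction of vectors whose exact value occurs more than once."""
    if not vectors:
        return 0.0
    counts = Counter(tuple(v) for v in vectors)
    # Count every member of a group of identical vectors, not just the extras.
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(vectors)

train = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(duplicate_fraction(train))
```

Near-duplicates (vectors that differ only slightly) are not counted here. Detecting those would need a distance threshold.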

Handling duplicate vectors trades off against navigation performance, and the way we handled it may not be optimal, so we are continuing to look for a better approach.

fjsj · 2020-12-31T12:57:25Z

It's awesome that you're tackling this problem, thank you very much for the detailed response. Please feel free to close the issue if you wish.

gony-noreply · 2021-01-15T02:51:11Z

If we make further progress on this problem, I'll comment here.

gony-noreply closed this as completed Jan 15, 2021
