Deletes #117

jbellis · 2023-10-09T13:37:14Z

Adds markNodeDeleted to GraphIndexBuilder, and adds functionality to cleanup() that removes deleted nodes and repairs the connections of their neighbors.

Fixes #115

jbellis · 2023-10-09T13:39:47Z

Not yet done:

- Tests that exercise RenumberingGraphIndex
- A way to communicate to the caller how their vectors got renumbered
- (Optional) integrate this with other renumbering the caller may want to perform (e.g. Cassandra prefers to have vector ids match row ids when there is a 1:1 correspondence)

jbellis · 2023-10-09T13:41:07Z

PR is a bit of a mess, largely because I started writing it on my desktop, wrote a different part on my MacBook, and then had to merge both parts with the big threadlocal refactor. Apologies in advance for that.

jbellis · 2023-10-12T14:09:30Z

> A way to communicate to the caller how their vectors got renumbered

I think there are actually two different use cases.

The first is Cassandra-like, where the caller wants to make the vector ordinals match some other data source (C* rowIds). For that we'll have the caller pass a mapping function.

But if you're building, say, vector embeddings of the files in an IntelliJ project, you don't care about that; you just need to map vectors to files or methods. And there isn't really a state where the index is "done": you're going to keep modifying it as edits are made to the files. In that case it's a better fit to save the index, "holes" and all, and be able to reload it in the same state to continue modifying it.

The most recent commit here adds save and load methods to handle this.
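
To make the renumbering case above concrete, here is a minimal sketch of a sequential old-to-new ordinal mapping that closes the "holes" left by deletions. The class and method names (`RenumberingSketch`, `sequentialRenumbering`) are hypothetical and are not JVector's API; a Cassandra-like caller would presumably supply its own mapping (for example, ordinal to rowId) instead of this default.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of JVector.
final class RenumberingSketch {
    /**
     * Maps each surviving old ordinal to a new ordinal in [0, liveCount),
     * preserving the relative order of the survivors.
     */
    static Map<Integer, Integer> sequentialRenumbering(int size, BitSet deleted) {
        Map<Integer, Integer> oldToNew = new LinkedHashMap<>();
        int next = 0;
        for (int old = 0; old < size; old++) {
            if (!deleted.get(old)) {
                oldToNew.put(old, next++);
            }
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        // Ordinals 0, 2, 4 survive and are compacted to 0, 1, 2.
        System.out.println(sequentialRenumbering(5, deleted)); // {0=0, 2=1, 4=2}
    }
}
```

In the second use case (an index that keeps being modified), no such map would be needed: the ordinals stay as they are and the holes are saved and reloaded along with the rest of the index.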

jvector-base/src/main/java/io/github/jbellis/jvector/graph/ConcurrentNeighborSet.java

jkni

Sorry, clicked approve on the wrong PR tab. Still reviewing.

jkni

Approach and general implementation make sense to me. I left a few more comments inline about things I noticed on my last review pass. I think resolving these with words or code would conclude my review.

jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphIndexBuilder.java

jvector-base/src/main/java/io/github/jbellis/jvector/disk/OnDiskGraphIndex.java

jkni

one tiny nit and one small question, but LGTM. Thanks!

jvector-base/src/main/java/io/github/jbellis/jvector/disk/OnDiskGraphIndex.java

jkni · 2023-10-19T20:20:26Z

jvector-tests/src/test/java/io/github/jbellis/jvector/disk/TestOnDiskGraphIndex.java

+    }
+
+    @Test
+    public void testReordingReumbering() throws IOException {


rename? Not sure how much of this is a typo but at least Reumbering -> Renumbering

wow, two typos in the same method name. I don't think I was drunk when I wrote this ... :)

siddhsql · 2023-12-16T00:13:19Z

Does this use the same algorithm for deletes as in https://arxiv.org/abs/2105.09613 or something different? Looking at it, it seems to be doing something different. Could you explain?

jbellis · 2023-12-16T00:26:22Z

JVector's approach is essentially "pretend each node that loses connections due to the deletion is newly added to the graph and rebuild its connections that way."

FreshDiskANN's approach is definitely less expensive; if it actually maintains link quality I'd be happy to switch to that.
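
As a rough illustration of the "treat each affected node as newly added" idea described above, here is a toy sketch on 1-D points. Everything in it is an assumption made for illustration: the class `DeleteRepairSketch`, its fields, and the brute-force scan in `nearestLive`, which stands in for whatever graph search and neighbor selection the real builder performs. Only the method names `markNodeDeleted` and `cleanup` echo the PR description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; not JVector's implementation.
final class DeleteRepairSketch {
    final double[] points;          // toy 1-D "vectors", indexed by ordinal
    final int maxDegree;            // maximum neighbors kept per node
    final Map<Integer, Set<Integer>> neighbors = new HashMap<>();
    final Set<Integer> deleted = new TreeSet<>();

    DeleteRepairSketch(double[] points, int maxDegree) {
        this.points = points;
        this.maxDegree = maxDegree;
        for (int i = 0; i < points.length; i++) {
            neighbors.put(i, nearestLive(i));
        }
    }

    void markNodeDeleted(int node) {
        deleted.add(node);
    }

    /** Brute-force stand-in for a search: the closest live nodes, excluding the node itself. */
    Set<Integer> nearestLive(int node) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            if (i != node && !deleted.contains(i)) {
                candidates.add(i);
            }
        }
        candidates.sort(Comparator.comparingDouble(i -> Math.abs(points[i] - points[node])));
        return new TreeSet<>(candidates.subList(0, Math.min(maxDegree, candidates.size())));
    }

    /** Rebuilds the neighbor list of every live node that lost a connection, then drops deleted nodes. */
    void cleanup() {
        for (int node : new ArrayList<>(neighbors.keySet())) {
            if (deleted.contains(node)) {
                continue;
            }
            boolean lostConnection = neighbors.get(node).stream().anyMatch(deleted::contains);
            if (lostConnection) {
                neighbors.put(node, nearestLive(node)); // as if the node were newly inserted
            }
        }
        neighbors.keySet().removeAll(deleted); // deleted ordinals remain as "holes"
    }

    public static void main(String[] args) {
        DeleteRepairSketch g = new DeleteRepairSketch(new double[] {0, 1, 2, 3, 4}, 2);
        g.markNodeDeleted(2);
        g.cleanup();
        System.out.println(g.neighbors);
    }
}
```

The sketch also hints at why this style of cleanup can be costly: every node that pointed at a deleted node pays for a fresh neighbor search, so the work grows with the number of affected nodes rather than the number of deleted ones.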

siddhsql · 2023-12-20T20:43:49Z

I tried deleting nodes from the graph. The cleanup operation takes more time than rebuilding the entire index. With 1M vectors, building the index took 3 minutes. Then I deleted 20% of the vectors and the cleanup took 17 minutes.

jbellis · 2023-12-20T21:28:21Z

That does sound excessive.

jbellis added 21 commits October 6, 2023 07:22

rename

4545399

new node value won't change, pull it out of loop

596d133

TODO

28db65e

wip

430f057

Add markNodeDeleted

4c81c1e

Bits parameter to search() is now required to be non-null
Add Bits.ALL and convenience methods
No cleanup yet

remove View.getSortedNodes

3b22f17

wip

73525a1

merge

a7e258e

merge and get it building

9ec01ff

formatting

b0f21d3

replace validateGraph with assertGraphEquals

38bd370

clean out vestigial document cruft from mock vectorvalues

118c5b1

format

4600ccf

r/m numVectors field (always equal to array length)

18b99db

createRandom[]Vectors no longer leaves null entries that need to be cleaned out

f37bd49

formatting

b7db4a3

first test for deletions

2cec2e2

wiring in the purge. almost passes tests

3abdfa9

fix mergeNeighbors to not add duplicate nodes, and fix test to check for duplicates

35bf868

- fix removeDeletedNeighbors

cb6ac31

- NN path can include self, so exclude it from candidates array

- fix removeDeletedNeighbors

557d9d6

- NN path can include self, so exclude it from candidates array

jbellis added 7 commits October 10, 2023 20:43

merge from main

291aefe

finish implementing renumbering for writes

9437684

rename nsize0 -> maxDegree

e05cd0c

show input vectors when assert fails

5cf1d74

re-use buildSequentially

feec7f7

encapsulate OHGI better

cc33203

instead of renumbering implicitly, let caller provide remapper

31a54e9

jbellis self-assigned this Oct 12, 2023

Merge remote-tracking branch 'origin/main' into deletes

15003b0

jbellis marked this pull request as ready for review October 12, 2023 14:10

jbellis mentioned this pull request Oct 13, 2023

Is there a way to build an index while keeping it on disk (GraphIndexBuilder + OnDiskGraphIndex ?) #125

Closed

jkni approved these changes Oct 13, 2023

jkni requested changes Oct 13, 2023

jbellis added 3 commits October 13, 2023 09:51

r/m unused CNS.insert method with confusing semantics

d01d737

Merge remote-tracking branch 'origin/deletes' into deletes

14ace18

fix insertDiverse ignoring current neighbors

be714fd

jkni requested changes Oct 13, 2023

jbellis added 7 commits October 13, 2023 17:47

ram freed is proportional to nodes removed

bdda8a2

merge ConcurrentNeighborArray into NeighborArray

952fe1a

fix node-present check

fea613b

make getSequentialRenumbering public

bf989c1

add failing testRenumberingOnDelete

046c799

refactor to take Map instead of Function; sort writes by new ordinal instead of old

ff77ff5

fix ci bitching about javadoc

2e0c63f

jkni approved these changes Oct 19, 2023

jbellis added 2 commits October 19, 2023 17:09

fix typos

a0fa7ac

merge

7020849

jbellis merged commit a3edcbe into main Oct 19, 2023
8 checks passed

jbellis deleted the deletes branch October 19, 2023 23:15

vbekiaris mentioned this pull request Nov 14, 2023

Fix usage of null acceptOrds in SiftSmall example #152

Merged

jbellis pushed a commit that referenced this pull request Nov 15, 2023

Fix usage of null acceptOrds in SiftSmall example (#152)

1a73b56

Since #117, acceptOrds should not be null. Instead, Bits.ALL should be used. Also updated GraphSearcher#search javadoc to match implementation.
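
The "pass an accept-everything constant instead of null" convention mentioned above can be sketched with a self-contained stand-in. The interface `AcceptBits` and the helper `countAccepted` below are hypothetical and only mimic the idea behind `Bits.ALL`; they are not JVector's actual `Bits` type or `search()` signature.

```java
import java.util.BitSet;

// Hypothetical stand-in for an ordinal filter; not JVector's Bits interface.
interface AcceptBits {
    boolean get(int ordinal);

    // Sentinel that accepts every ordinal, used in place of null.
    AcceptBits ALL = ordinal -> true;
}

final class AcceptAllSketch {
    /** Counts the ordinals in [0, size) that the filter accepts; null is rejected outright. */
    static int countAccepted(int size, AcceptBits acceptOrds) {
        if (acceptOrds == null) {
            throw new IllegalArgumentException("acceptOrds must not be null; use AcceptBits.ALL");
        }
        int accepted = 0;
        for (int i = 0; i < size; i++) {
            if (acceptOrds.get(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet();
        live.set(0);
        live.set(2);
        System.out.println(countAccepted(4, AcceptBits.ALL)); // 4
        System.out.println(countAccepted(4, live::get));      // 2
    }
}
```

A sentinel like this lets the search loop call the filter unconditionally rather than branching on null at every candidate.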
