[Feat] Add support of logical merge in Cagra #713

rhdong · 2025-02-20T20:14:43Z

[FEA] Add support of logical merge in Cagra #712

rhdong · 2025-02-20T21:43:41Z

cpp/include/cuvs/neighbors/cagra.hpp

+ * @param[in] sample_filter an optional device filter function object that greenlights samples
+ * for a given query. (none_sample_filter for no filtering)
+ */
+void search(raft::resources const& res,


Hi @cjnolet @achirkin , I had to add a new search for composite_index, the root cause is the cuve::neighbors::index is not designed to be an abstract virtual class, which makes many pathways impossible, like inheriting from index, which can avoid declaring a standalone composible_index and a new search API. I was trying to refactor it, but I found it was used widely in different algo codes than give up under a tight timeline; maybe we can make it in the long term.

I think it's okay that we don't require a strict class hierarchy initially, so long as we have proper overloads so that from a user's perspective, they can call auto index = cuvs::neighbors::cagra::merge(...) and get back a proper object from which they can later call cuvs::neighbors::cagra::search(..., index,..)

achirkin · 2025-02-21T09:05:34Z

I may have missed some prior discussion, but the cagra-specific logical merge seems a little bit artificial for me.
If I understand this correctly, you essentially implement a multi-index search. Then, why don't we go one step further and make it independent of a particular index implementation? Moreover, it looks like much of this functionality is implemented already in multi-gpu index. Maybe we can generalize that one a bit to decouple "multi-index" from "multi-gpu" aspect? Optionally, one can also combine different index types and erase the upstream index type like we do in dynamic batching index.

cjnolet · 2025-02-21T16:28:30Z

I may have missed some prior discussion, but the cagra-specific logical merge seems a little bit artificial for me.

@achirkin you and I haven't discussed this yet but this feature is critical for certain database architectures like Lucene, which are variants of the log-structured-merge pattern but build indexes immediately after segments (flat files containing vectors) are created, rather than merging the segments together before building the indexes. This causes Lucene to have to first merge many tiny CAGRA indexes together, but eventually it'll end up merging very large indexes together. It is this latter case thart we care about doing a logical merge. It's more efficient to create a single cagra index with many (potentially hundreds) of tiny cagra indexes but when they reach a certain size, it's actually more efficient to logically merge them; meaning we essentially wrap them as if they were shards and broadcast the query to all the shards during search.

I agree this is similar in theory to the multi-gpu sharding and perhaps there is some code to be reused there. The next step for the merge() API is to be able to implement a SMART merge strategy, whereby we can more efficiently merge two cagra graphs together without having to rebuild from scratch, and that's ultimately the strategy we have discussed offline in more detail. The problem is that Lucene can make use of this today, so it gives us time to work towards the SMART option.

Then, why don't we go one step further and make it independent of a particular index implementation?

I do agree with this- it doesn't have to be (and ideally wouldn't be) specifically for CAGRA, though that just happens to be the index we care about today because it unblocks Lucene.

achirkin · 2025-02-26T16:01:00Z

I totally agree and don't doubt the usefulness of logical merge.
I've just pointed out that I think we can accomplish a more general solution (supporting any index type) with actually less code by re-using/adjusting what we already have in multi-index/dynamic batching.

rhdong · 2025-03-13T00:13:52Z

I may have missed some prior discussion, but the cagra-specific logical merge seems a little bit artificial for me. If I understand this correctly, you essentially implement a multi-index search. Then, why don't we go one step further and make it independent of a particular index implementation? Moreover, it looks like much of this functionality is implemented already in multi-gpu index. Maybe we can generalize that one a bit to decouple "multi-index" from "multi-gpu" aspect? Optionally, one can also combine different index types and erase the upstream index type like we do in dynamic batching index.

Hi @achirkin , sorry for the late response to your comments here. I studied the code of SNMG, which is a super useful feature. When I tried to reuse some code in it, I found the caller has to work with locker, nccl, which is unnecessary in the logical merge of CAGRA. I'd like to keep the simple implementation if you agree. Many thanks! cc: @cjnolet

rhdong · 2025-03-18T23:02:50Z

Hi @cjnolet @achirkin, could you please merge it if there are no more comments? Many thanks!

[Feat] Add support of logical merge in Cagra

cb8f16b

rhdong added feature request New feature or request non-breaking Introduces a non-breaking change labels Feb 20, 2025

rhdong requested review from cjnolet and achirkin February 20, 2025 20:14

rhdong requested a review from a team as a code owner February 20, 2025 20:14

github-actions bot added the cpp label Feb 20, 2025

rhdong commented Feb 20, 2025

View reviewed changes

rhdong added 7 commits February 26, 2025 09:14

Merge branch 'branch-25.04' into rhdong/logical-merge

6c00e52

Merge branch 'branch-25.04' into rhdong/logical-merge

450df2a

Merge branch 'branch-25.04' into rhdong/logical-merge

4debc02

Merge branch 'branch-25.04' into rhdong/logical-merge

c5ecfe8

Merge branch 'branch-25.04' into rhdong/logical-merge

b3a99c6

Merge branch 'branch-25.04' into rhdong/logical-merge

bcfa656

fix ci fail

186be95

rhdong closed this Mar 13, 2025

rhdong reopened this Mar 13, 2025

rhdong added 2 commits March 17, 2025 15:32

Merge branch 'branch-25.04' into rhdong/logical-merge

6138f92

Merge branch 'branch-25.04' into rhdong/logical-merge

b962830

rhdong added 2 commits March 24, 2025 08:46

Merge branch 'branch-25.04' into rhdong/logical-merge

1e61e1b

Merge branch 'branch-25.04' into rhdong/logical-merge

590c3a8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feat] Add support of logical merge in Cagra #713

[Feat] Add support of logical merge in Cagra #713

rhdong commented Feb 20, 2025

rhdong Feb 20, 2025 •

edited

Loading

cjnolet Feb 21, 2025

achirkin commented Feb 21, 2025

cjnolet commented Feb 21, 2025 •

edited

Loading

achirkin commented Feb 26, 2025

rhdong commented Mar 13, 2025

rhdong commented Mar 18, 2025

[Feat] Add support of logical merge in Cagra #713

Are you sure you want to change the base?

[Feat] Add support of logical merge in Cagra #713

Conversation

rhdong commented Feb 20, 2025

rhdong Feb 20, 2025 • edited Loading

Choose a reason for hiding this comment

cjnolet Feb 21, 2025

Choose a reason for hiding this comment

achirkin commented Feb 21, 2025

cjnolet commented Feb 21, 2025 • edited Loading

achirkin commented Feb 26, 2025

rhdong commented Mar 13, 2025

rhdong commented Mar 18, 2025

rhdong Feb 20, 2025 •

edited

Loading

cjnolet commented Feb 21, 2025 •

edited

Loading