Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(Qdrant): allow payload indexing + on disk vectors #553

Merged

Conversation

lambda-science
Copy link
Contributor

@lambda-science lambda-science commented Mar 6, 2024

This is still a draft:

  • I need to modify tests => Index two documents with two differents payload value. Check that user 1 can't access user 2 doc ? Maybe other test ?

Related doc:

This MR aims to allow Qdrant user to create index on payload (metadata in Haystack naming) field.
This allow to filter way more efficiently vectors when searching.
And also it allows to restricti users to their own vectors only.

I'm open to comments, I'm not sure running it at each start-up of the collection is a good way to do it.
But From Qdrant discord:

andrey.vasnetsov —
overall, we recommed creating payload index before you upload data.
AFAIK, if you try to create same index twice, it won't trigger indexing

https://discord.com/channels/907569970500743200/1214958671742902272/1214958671742902272 (if you are on their discord)

Adtionnally this add also the parameter on_disk that was not included in the integration, to store vectors on disk. See https://qdrant.tech/documentation/concepts/storage/#configuring-memmap-storage

@lambda-science lambda-science requested a review from a team as a code owner March 6, 2024 15:57
@lambda-science lambda-science requested review from davidsbatista and removed request for a team March 6, 2024 15:57
@github-actions github-actions bot added type:documentation Improvements or additions to documentation integration:qdrant labels Mar 6, 2024
@lambda-science lambda-science marked this pull request as draft March 6, 2024 16:01
@lambda-science lambda-science marked this pull request as ready for review March 7, 2024 08:20
@lambda-science
Copy link
Contributor Author

I guess the PR is ready.
For adding test specific for the function I'm adding I'm not sure how to do, as other function were not that much tested also ? hm

Should we ping Qdrant people ? Because I read somewhere that they are the one maintaining this ? https://qdrant.tech/documentation/frameworks/haystack/

@lambda-science lambda-science changed the title feat(Qdrant): allow payload indexing feat(Qdrant): allow payload indexing + on disk vectors Mar 7, 2024
@davidsbatista
Copy link
Contributor

Hi @lambda-science

It was someone from Qdrant who originally wrote the code, so let's ask them to review it.

@davidsbatista
Copy link
Contributor

@Anush008 could you review this PR?

Copy link
Contributor

@Anush008 Anush008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you🙏

@davidsbatista davidsbatista merged commit 810ad84 into deepset-ai:main Mar 7, 2024
10 checks passed
@lambda-science lambda-science deleted the feat/qdrant_payload_index_fixed branch March 7, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integration:qdrant type:documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants