partner: Astra DB, add indexing support for Vector Store class #17767


hemidactylus
Contributor

This PR adds support for the "indexing" options when creating an Astra DB collection, giving fine-grained control over which fields are indexed for later use as search filters.

The (sensible) default is to index all, and only, the contents of the "metadata" map (plus the vectors themselves, for the ANN search, of course).
Alternatively, one can exclude certain metadata fields from being indexed (e.g. very long unique strings), specify an allowlist of fields to index, or provide a fully custom indexing prescription that is passed directly to the API.
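The three alternatives above can be sketched as a small helper that turns the user's choice into an indexing dictionary. This is only an illustration, not the code in this PR: the function name `build_indexing_policy` and its parameters (`include`, `exclude`, `custom_policy`) are hypothetical, and the `{"allow": [...]}` / `{"deny": [...]}` shapes and the `metadata.` prefix are assumptions about how the choice might be expressed to the API.

```python
from typing import Any, Dict, Iterable, Optional


def build_indexing_policy(
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
    custom_policy: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return an indexing dictionary (hypothetical helper, names are assumptions).

    At most one of the three arguments may be given; with none, the
    default indexes the whole "metadata" map and nothing else.
    """
    given = [arg for arg in (include, exclude, custom_policy) if arg is not None]
    if len(given) > 1:
        raise ValueError(
            "Specify at most one of include, exclude, custom_policy."
        )
    if include is not None:
        # Allowlist: index only the named metadata fields.
        return {"allow": [f"metadata.{field}" for field in include]}
    if exclude is not None:
        # Denylist: assumed to name only the metadata fields left unindexed.
        return {"deny": [f"metadata.{field}" for field in exclude]}
    if custom_policy is not None:
        # Fully custom: hand a copy of the caller's dictionary through unchanged.
        return dict(custom_policy)
    # Default: the whole "metadata" map.
    return {"allow": ["metadata"]}
```

For example, `build_indexing_policy(exclude=["long_unique_id"])` would leave a hypothetical `long_unique_id` metadata field out of the index, which is the "very long unique strings" case mentioned above.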

If the collection already exists on the database, an error is raised when the requested indexing options are incompatible with it. If instead a legacy collection (one created without any indexing options) is detected, a warning is issued and execution can proceed.
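The error-versus-warning behaviour for pre-existing collections could look roughly like the sketch below. It is not the PR's `_validate_create_collection_indexing`: the function name `check_existing_collection`, the shape of `existing_options`, and the use of plain equality as the compatibility test are all assumptions made for illustration.

```python
import warnings
from typing import Any, Dict, Optional


def check_existing_collection(
    existing_options: Optional[Dict[str, Any]],
    requested_indexing: Dict[str, Any],
) -> None:
    """Compare a pre-existing collection's options with the requested indexing.

    `existing_options` is None when the collection is not on the database yet
    (hypothetical convention for this sketch).
    """
    if existing_options is None:
        # Nothing on the database yet: the collection will be created as requested.
        return
    existing_indexing = existing_options.get("indexing")
    if existing_indexing is None:
        # Legacy collection, created without indexing options: warn and proceed.
        warnings.warn(
            "Collection exists without indexing options; "
            "the requested indexing settings will not be applied.",
            UserWarning,
        )
        return
    # Assumption: "incompatible" is modelled here as "not equal".
    if existing_indexing != requested_indexing:
        raise ValueError(
            "Collection exists with indexing options that differ "
            "from the requested ones."
        )
```

Treating the legacy case as a warning rather than an error keeps collections created before this change usable, at the cost of the requested indexing settings being ignored for them.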

Added unit and integration tests for this behaviour.

@efriis efriis added the partner label Feb 19, 2024
@efriis efriis self-assigned this Feb 19, 2024
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Feb 19, 2024
@dosubot dosubot bot added Ɑ: vector store Related to vector store module 🔌: astradb Primarily related to AstraDB integrations 🤖:improvement Medium size change to existing code to handle new use-cases labels Feb 19, 2024
@hemidactylus
Contributor Author

@cbornet FYI

- removed a raise from _validate_create_collection_indexing
- renamed metadata-indexing init parameters
- reformatted docstring for _validate_create_collection_indexing
- made error/warning messages generic for _validate_create_collection_indexing
- improved check-for-no-warning logic for indexing integration tests
@hemidactylus
Contributor Author

@efriis the checks above seem to show that the integration tests were not run. Is that expected for a code-related PR on a partner package?

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Feb 21, 2024
Collaborator

@cbornet cbornet left a comment


LGTM!

eyurtsev pushed a commit that referenced this pull request Feb 28, 2024
…h LangChain package (#18131)

**Description**

This PR sets the "caller identity" of the Astra DB clients used by the
integration plugins (`AstraDBChatMessageHistory`, `AstraDBStore`,
`AstraDBByteStore` and, pending #17767, `AstraDBVectorStore`). In this
way, the requests to the Astra DB Data API coming from within LangChain
are identified as such (the purpose is anonymous usage stats to best
improve the Astra DB service).
@efriis
Member

efriis commented Mar 1, 2024

Package has moved! Could you reopen here? https://github.com/langchain-ai/langchain-datastax

@efriis efriis closed this Mar 1, 2024
@hemidactylus
Contributor Author

Reopened in the new repo - thank you!

@hemidactylus hemidactylus deleted the SL-astradb-vectorstore-indexing branch March 6, 2024 15:55
gkorland pushed a commit to FalkorDB/langchain that referenced this pull request Mar 30, 2024
…h LangChain package (langchain-ai#18131)
