partner: Astra DB, add indexing support for Vector Store class #17767
@cbornet FYI
libs/partners/astradb/langchain_astradb/vectorstores/astradb.py
- removed a raise from _validate_create_collection_indexing
- rename metadata-indexing init parameters
- reformat docstring for _validate_create_collection_indexing
- generic error/warning messages for _validate_create_collection_indexing
- improved check-for-no-warning logic for indexing integration tests
@efriis the checks above seem to show that the integration tests were not run. Is that expected for a code-related PR on a partner package?
libs/partners/astradb/tests/integration_tests/test_vectorstores.py
LGTM!
…h LangChain package (#18131) **Description** This PR sets the "caller identity" of the Astra DB clients used by the integration plugins (`AstraDBChatMessageHistory`, `AstraDBStore`, `AstraDBByteStore` and, pending #17767, `AstraDBVectorStore`). In this way, the requests to the Astra DB Data API coming from within LangChain are identified as such (the purpose is anonymous usage stats to best improve the Astra DB service).
Package has moved! Could you reopen here? https://github.com/langchain-ai/langchain-datastax
Reopened in the new repo - thank you!
This PR adds support for the "Indexing" options when creating an Astra DB collection, giving fine-grained control over which fields are indexed so that they can later be used as search filters.
The (sensible) default is to index all and only the contents of the "metadata" map (plus the vectors themselves for the ANN search of course).
One can, however, choose to exclude certain metadata fields from being indexed (e.g. very long unique strings), or conversely specify an allowlist of fields to index, or even provide a fully custom indexing prescription that is passed directly to the API.
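To make the three choices concrete, here is a minimal sketch of how they could be resolved into a single indexing policy. It is not the code from this PR: the function name `build_indexing_policy` and its parameters `include`, `exclude` and `custom` are hypothetical, and the `{"allow": [...]}` / `{"deny": [...]}` shape is assumed to match the indexing option accepted by the Data API.

```python
from typing import Any, Dict, Iterable, Optional


def build_indexing_policy(
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
    custom: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Resolve the (hypothetical) init parameters into one indexing policy."""
    # The three ways of specifying the policy are treated as mutually exclusive.
    provided = [arg for arg in (include, exclude, custom) if arg is not None]
    if len(provided) > 1:
        raise ValueError("Specify at most one of include, exclude, custom.")
    if include is not None:
        # Allowlist: index only the named metadata fields.
        return {"allow": [f"metadata.{field}" for field in include]}
    if exclude is not None:
        # Denylist: skip indexing for the named metadata fields.
        return {"deny": [f"metadata.{field}" for field in exclude]}
    if custom is not None:
        # Fully custom prescription, handed over unchanged.
        return custom
    # Assumed default: index the whole "metadata" map and nothing else.
    return {"allow": ["metadata"]}
```

For instance, `build_indexing_policy(exclude=["long_id"])` would keep a hypothetical `long_id` metadata field out of the index while leaving the other fields filterable.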
If the collection already exists on the DB, an error is raised when the requested indexing options are incompatible with it. If instead a legacy collection (one with no indexing options at all) is detected, a warning is issued and execution can proceed.
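The check on a pre-existing collection can be pictured with the following sketch. It is illustrative only: `check_existing_collection` and the layout of `existing_options` are assumptions, and plain equality stands in for whatever compatibility test the actual implementation applies.

```python
import warnings
from typing import Any, Dict


def check_existing_collection(
    requested: Dict[str, Any],
    existing_options: Dict[str, Any],
) -> None:
    """Compare the requested indexing policy with an existing collection's options."""
    # Assumption: the collection's options carry its policy under an "indexing" key.
    existing = existing_options.get("indexing")
    if existing is None:
        # Legacy collection, created without any indexing options: warn and go on.
        warnings.warn(
            "Collection has no indexing options; proceeding with it as-is.",
            UserWarning,
            stacklevel=2,
        )
        return
    if existing != requested:
        # Simplification: any difference is treated as an incompatibility.
        raise ValueError(
            "Requested indexing options are incompatible with the existing collection."
        )
```

A caller would run this before reusing a collection, so that a mismatch surfaces at construction time rather than as a failing filtered search later on.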
Added unit and integration tests covering this behaviour.