How to query CollectionStore and EmbeddingStore models directly in a clean way? #88

Open
darahayes opened this issue Jul 11, 2024 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@darahayes

Hello, thanks for this great project; I've found it very useful. I have a use case where, within one application, I want to create and manage multiple collections and also fetch and return details about them, e.g. the name and the collection metadata. Essentially, my use case is CRUD for collections.

Currently I don't really see any way to do that cleanly other than dropping down to raw SQL queries in my application. Would this be the recommended approach?
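For reference, the raw-SQL route could look roughly like the sketch below. It is only an illustration under stated assumptions: the table name `langchain_pg_collection` and the columns `uuid`, `name` and `cmetadata` are assumed and should be checked against the installed version, the helper functions are hypothetical, and an in-memory SQLite database stands in for Postgres so the example runs on its own. With a Postgres driver the placeholder style would differ (for example `%s` instead of `?`).

```python
import json
import sqlite3
import uuid

# ASSUMPTION: table and column names; verify them against your schema.
COLLECTION_TABLE = "langchain_pg_collection"


def connect_demo_db():
    """Create an in-memory SQLite stand-in for the collection table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        f"CREATE TABLE {COLLECTION_TABLE} ("
        "uuid TEXT PRIMARY KEY, name TEXT NOT NULL UNIQUE, cmetadata TEXT)"
    )
    return conn


def create_collection(conn, name, metadata=None):
    """Insert a collection row and return its generated id."""
    collection_id = str(uuid.uuid4())
    conn.execute(
        f"INSERT INTO {COLLECTION_TABLE} (uuid, name, cmetadata) VALUES (?, ?, ?)",
        (collection_id, name, json.dumps(metadata or {})),
    )
    return collection_id


def get_collection(conn, name):
    """Return one collection as a dict, or None if it does not exist."""
    row = conn.execute(
        f"SELECT uuid, name, cmetadata FROM {COLLECTION_TABLE} WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    return {"uuid": row[0], "name": row[1], "metadata": json.loads(row[2])}


def list_collections(conn):
    """Return all collection names in alphabetical order."""
    rows = conn.execute(
        f"SELECT name FROM {COLLECTION_TABLE} ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def update_collection_metadata(conn, name, metadata):
    """Replace the metadata of a collection; True if a row was changed."""
    cur = conn.execute(
        f"UPDATE {COLLECTION_TABLE} SET cmetadata = ? WHERE name = ?",
        (json.dumps(metadata), name),
    )
    return cur.rowcount == 1


def delete_collection(conn, name):
    """Delete a collection by name; True if a row was removed."""
    cur = conn.execute(
        f"DELETE FROM {COLLECTION_TABLE} WHERE name = ?", (name,)
    )
    return cur.rowcount == 1
```

All values go through bound parameters rather than string formatting; only the table name, which is a constant here, is interpolated.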

I see in the source code in vectorstores.py that there are "private"/unexposed SQLAlchemy models defined for CollectionStore and EmbeddingStore. Exposing them would make querying against the tables a lot easier, at least for my particular use case.

I can understand why you might want to keep them private: they might be subject to change, and any user code that touches those models could break. But even with the models unexposed, any change that altered the database tables would still be a breaking change for a lot of apps anyway.

Is exposing those models something you might consider? Or would you recommend going with raw SQL? Would be more than happy to submit a PR. Thanks!

@eyurtsev
Collaborator

eyurtsev commented Jul 12, 2024

Hi @darahayes, there's currently no way to do this.

This code needs to be refactored to support two things:

  1. Add a control plane (IndexAdmin) that will do exactly what you need it to do.
  2. Create different tables for the actual embeddings (e.g., to support different embedding dimensions).

Here's a stub at the abstraction that's needed: https://github.com/langchain-ai/langchain/pull/23990/files
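The linked PR is the authoritative stub. Purely to illustrate the two points above, here is a hypothetical sketch of what a control plane with CRUD for collections might look like. Every name in it (`CollectionInfo`, the method names, the `embeddings_<dimension>` table naming, the in-memory implementation) is an assumption made for this example and is not taken from the PR or from the library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CollectionInfo:
    """Hypothetical description of one collection."""

    name: str
    dimension: int
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def embedding_table(self) -> str:
        # ASSUMPTION: one embeddings table per vector dimension.
        return f"embeddings_{self.dimension}"


class IndexAdmin(ABC):
    """Hypothetical control-plane interface for managing collections."""

    @abstractmethod
    def create_collection(
        self, name: str, dimension: int, metadata: Optional[Dict[str, Any]] = None
    ) -> CollectionInfo: ...

    @abstractmethod
    def get_collection(self, name: str) -> Optional[CollectionInfo]: ...

    @abstractmethod
    def list_collections(self) -> List[CollectionInfo]: ...

    @abstractmethod
    def delete_collection(self, name: str) -> bool: ...


class InMemoryIndexAdmin(IndexAdmin):
    """Toy implementation backed by a dict, only to exercise the interface."""

    def __init__(self) -> None:
        self._collections: Dict[str, CollectionInfo] = {}

    def create_collection(self, name, dimension, metadata=None):
        if name in self._collections:
            raise ValueError(f"collection {name!r} already exists")
        info = CollectionInfo(name, dimension, dict(metadata or {}))
        self._collections[name] = info
        return info

    def get_collection(self, name):
        return self._collections.get(name)

    def list_collections(self):
        return sorted(self._collections.values(), key=lambda c: c.name)

    def delete_collection(self, name):
        return self._collections.pop(name, None) is not None
```

A Postgres-backed implementation would replace the dict with queries against the collection table, while application code would depend only on the abstract interface.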

This would also open the way to applying specific index types to the collections and doing schema migrations down the road if necessary.

If you're interested in helping out, I can help provide some guidance if needed!

@eyurtsev eyurtsev added the help wanted Extra attention is needed label Jul 12, 2024
@Sachin-Bhat

Hey @eyurtsev,

If you can share more information, I can take this up.

Cheers,
Sachin

3 participants