Ability to add custom ids for documents #177

Open
ismajl-ramadani opened this issue Jun 30, 2024 · 2 comments

I want to add custom IDs for documents so that when I re-index for updates, I can reference them by the ID I already have in the data source.

My specific case is the OpenSearch vector store. Right now, as a workaround, I'm running a local build of this crate to which I have added a function with the following signature:

async fn add_documents_with_ids(
    &self,
    docs: &[Document],
    opt: &VecStoreOptions,
    ids: &Vec<String>,
) -> Result<Vec<String>, Box<dyn Error>> {

and then I'm zipping the ids together with the docs and vectors

for (doc, (vector, doc_id)) in zip(docs.iter(), zip(vectors.iter(), ids.iter())) {

and finally setting the id on the index operation for each doc

let operation = json!({"index": {
    "_id": doc_id,
}});
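
One thing to watch with the zip approach: `zip` stops at the shortest input, so a short `ids` list would silently drop documents. A minimal sketch of a length check before pairing, using only the standard library. `Document` here is a simplified stand-in for the crate's type, and `pair_with_ids` is a hypothetical helper name, not something the crate provides:

```rust
// Hypothetical, simplified stand-in for the crate's Document type.
#[derive(Debug, Clone)]
pub struct Document {
    pub page_content: String,
}

/// Pairs each document with its embedding and its caller-supplied id.
/// Returns an error instead of silently truncating when the lengths differ.
pub fn pair_with_ids<'a>(
    docs: &'a [Document],
    vectors: &'a [Vec<f64>],
    ids: &'a [String],
) -> Result<Vec<(&'a str, &'a Document, &'a [f64])>, String> {
    if docs.len() != vectors.len() || docs.len() != ids.len() {
        return Err(format!(
            "length mismatch: {} docs, {} vectors, {} ids",
            docs.len(),
            vectors.len(),
            ids.len()
        ));
    }
    Ok(ids
        .iter()
        .zip(docs.iter().zip(vectors.iter()))
        .map(|(id, (doc, vector))| (id.as_str(), doc, vector.as_slice()))
        .collect())
}
```

Each resulting tuple carries the id that would go into the `_id` field of the index operation.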

I wanted to ask whether anyone else would be interested in this feature and, if so, whether the maintainers have any suggestions on how to implement it without breaking the core VectorStore trait.
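
One possible non-breaking route, sketched here purely as an illustration, is a new trait method with a default body, so existing implementors keep compiling. Everything below is a simplified, synchronous stand-in: the real trait is async and takes options, and `LegacyStore` and `IdAwareStore` are hypothetical names:

```rust
use std::error::Error;

// Hypothetical, simplified stand-in for the crate's Document type.
pub struct Document {
    pub page_content: String,
}

// Simplified stand-in for the trait (the real one is async and takes options).
pub trait VectorStore {
    /// Existing method: the store picks the ids.
    fn add_documents(&self, docs: &[Document]) -> Result<Vec<String>, Box<dyn Error>>;

    /// New method with a default body, so existing implementors still compile.
    fn add_documents_with_ids(
        &self,
        docs: &[Document],
        ids: &[String],
    ) -> Result<Vec<String>, Box<dyn Error>> {
        let _ = (docs, ids);
        Err("custom ids are not supported by this store".into())
    }
}

/// A store written before the new method existed; it needs no changes.
pub struct LegacyStore;

impl VectorStore for LegacyStore {
    fn add_documents(&self, docs: &[Document]) -> Result<Vec<String>, Box<dyn Error>> {
        Ok((0..docs.len()).map(|i| format!("auto-{}", i)).collect())
    }
}

/// A store that opts in by overriding the default.
pub struct IdAwareStore;

impl VectorStore for IdAwareStore {
    fn add_documents(&self, docs: &[Document]) -> Result<Vec<String>, Box<dyn Error>> {
        Ok((0..docs.len()).map(|i| format!("auto-{}", i)).collect())
    }

    fn add_documents_with_ids(
        &self,
        docs: &[Document],
        ids: &[String],
    ) -> Result<Vec<String>, Box<dyn Error>> {
        if docs.len() != ids.len() {
            return Err("docs and ids must have the same length".into());
        }
        // A real store would index each doc under its id here.
        Ok(ids.to_vec())
    }
}
```

The trade-off is that stores which do not override the method fail at runtime rather than at compile time.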

Also, the Python package has an additional field for add_docs, called ids, which you can see in this file:
https://github.com/langchain-ai/langchain/blob/29aa9d67506ac07b92d37d58c684ce3c6dc290cd/libs/community/langchain_community/vectorstores/opensearch_vector_search.py#L587

Contributor

fgsch commented Jul 7, 2024

I too noticed this while working on something else.

Personally, I'd like to see the ids added as an Option<Vec<String>> or similar to add_documents, but this will break backward compatibility.

@Abraxas-365 @prabirshrestha, any thoughts on this?
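
For illustration, a rough sketch of how an `Option<Vec<String>>` argument could be resolved inside `add_documents`. `resolve_ids` is a hypothetical helper, and the `generated-N` fallback is only a placeholder; a real store would more likely use UUIDs or let the backend assign ids:

```rust
/// Resolves the ids to use for `doc_count` documents:
/// caller-supplied ids if given (and the right length), otherwise placeholders.
pub fn resolve_ids(doc_count: usize, ids: Option<Vec<String>>) -> Result<Vec<String>, String> {
    match ids {
        // Reject a mismatched list rather than dropping or reusing ids.
        Some(ids) if ids.len() != doc_count => Err(format!(
            "expected {} ids, got {}",
            doc_count,
            ids.len()
        )),
        Some(ids) => Ok(ids),
        // Placeholder generation, an assumption made for this sketch only.
        None => Ok((0..doc_count).map(|i| format!("generated-{}", i)).collect()),
    }
}
```

Passing `None` keeps today's behaviour for existing callers, so the break is limited to the changed signature.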

Collaborator

prabirshrestha commented Jul 16, 2024

I'm good with the breaking change. Feel free to send a PR.
