Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chroma[minor]: Add id attribute to Document returned by search results #27366

Merged
merged 35 commits into from
Dec 13, 2024

Conversation

kwei-zhang
Copy link
Contributor

@kwei-zhang kwei-zhang commented Oct 15, 2024

  • Description
    We added the document id when documents are created in function similarity_search_with_score such that enable retrieve documents again from the vector store using the same id.

    Sample code:

Before change

def _results_to_docs_and_scores(results: Any) -> List[Tuple[Document, float]]:
    return [
        # TODO: Chroma can do batch querying,
        # we shouldn't hard code to the 1st result
        (Document(page_content=result[0], metadata=result[1] or {}), result[2])
        for result in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]

After change

def _results_to_docs_and_scores(results: Any) -> List[Tuple[Document, float]]:
    return [
        # TODO: Chroma can do batch querying,
        # we shouldn't hard code to the 1st result
        (Document(id =result[0], page_content=result[1], metadata=result[2] or {}), result[3])
        for result in zip(
            results["ids"][0],
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]

Test added:

test_vectorstores.py

Copy link

vercel bot commented Oct 15, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Dec 13, 2024 9:52pm

@kwei-zhang kwei-zhang marked this pull request as ready for review October 15, 2024 18:16
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: vector store Related to vector store module labels Oct 15, 2024
@eyurtsev eyurtsev changed the title Partners[enhancement] Attach Document ID with similarity_search_with_score chroma[minor]: Add id to Document returned by search results Oct 15, 2024
@eyurtsev eyurtsev changed the title chroma[minor]: Add id to Document returned by search results chroma[minor]: Add id attribute to Document returned by search results Oct 15, 2024
@eyurtsev eyurtsev self-assigned this Oct 15, 2024
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Oct 19, 2024
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Oct 23, 2024
@@ -19,7 +19,8 @@ disallow_untyped_defs = true
"Release Notes" = "https://github.com/langchain-ai/langchain/releases?q=tag%3A%22langchain-chroma%3D%3D0%22&expanded=true"

[tool.poetry.dependencies]
python = ">=3.8.1,<4"
python = ">=3.9,<4"
langchain-standard-tests = {git = "https://github.com/langchain-ai/langchain.git", subdirectory = "libs/standard-tests"}
[[tool.poetry.dependencies.langchain-core]]
version = ">=0.1.40,<0.4"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to bump up compatibility

id on document field was only added more recently

    id: Optional[str] = None
    """An optional identifier for the document.

    Ideally this should be unique across the document collection and formatted
    as a UUID, but this will not be enforced.

    .. versionadded:: 0.2.11
    """

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @eyurtsev, we just bumped up the version for langchain-core to 0.2.11

@kwei-zhang kwei-zhang requested a review from eyurtsev October 24, 2024 19:49
@eyurtsev eyurtsev enabled auto-merge (squash) October 29, 2024 14:28
auto-merge was automatically disabled October 29, 2024 16:56

Head branch was pushed to by a user without write access

@chaunguyenm
Copy link

@eyurtsev Hey there, sorry for the ping. Seems like one of the workflows failed because of poetry.lock so we reverted it to when it was last changed. Please approve again, thank you!

@eyurtsev eyurtsev enabled auto-merge (squash) October 29, 2024 20:21
@kwei-zhang
Copy link
Contributor Author

This looks like an issue importing Standard Tests, will try to fix it tonight

auto-merge was automatically disabled October 30, 2024 00:25

Head branch was pushed to by a user without write access

@kwei-zhang
Copy link
Contributor Author

Hey @eyurtsev , when I run the commands poetry install --with lint,typing, poetry install --with test in master branch and it fails with the same error. It doesn’t seem to be related to our code changes. Could you please look into it or do you have any suggestion on how to fix it?
Thanks for your help!

@kwei-zhang
Copy link
Contributor Author

Hi @eyurtsev , I believe in recent fix they downgrade the test python version to 3.12 and fix some dependencies which solves the issue. The current PR is facing some depencies issue that is hard to fix so I rebase the change in another PR. Would you please take a look? Thank you.

@chaunguyenm
Copy link

Hi @eyurtsev, just an update on this PR, we have passed all the tests. Please review it when you have some time and let us know if there is anything else to be done. Thank you for your work!

@chaunguyenm
Copy link

Hi @efriis, we noticed that one of our checks failed exactly as this push by you. We tried to recreate this error in our local machine with python3.9.20 and poetry1.7.1 but was not able to. If you know what the issue is and how we can resolve it, please let us know. Thank you!

Copy link
Contributor

@mspronesti mspronesti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaunguyenm I think it's solved now. I faced the same failure in one of my PR to the chroma package but when they merged it the workflow passed.

@eyurtsev
Copy link
Collaborator

Going to try and merge

@kwei-zhang
Copy link
Contributor Author

Going to try and merge

Hi, it seems the issue with the id function has been addressed in another PR. Feel free to proceed with merging the testing components. Thank you!

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Dec 11, 2024
@dosubot dosubot bot added size:S This PR changes 10-29 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Dec 13, 2024
@eyurtsev eyurtsev enabled auto-merge (squash) December 13, 2024 21:53
@eyurtsev eyurtsev merged commit b909d54 into langchain-ai:master Dec 13, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm PR looks good. Use to confirm that a PR is ready for merging. size:S This PR changes 10-29 lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants