Trouble Accessing Document ID in WeaviateHybridSearchRetriever Results #13238
Comments
🤖 Hi @blockfer-rp, great to see you again! I appreciate your detailed issue report. I'm currently looking into it and will get back to you with a thorough response shortly.
🤖 The issue you're experiencing seems to be related to how the retriever's attributes list is built. To resolve this issue, you need to add the _id attribute to that list in the validate_client validator:

@root_validator(pre=True)
def validate_client(
    cls,
    values: Dict[str, Any],
) -> Dict[str, Any]:
    # existing code...
    # add _id to the attributes list
    cast(List, values["attributes"]).extend([values["text_key"], "_id"])
    # existing code...

This modification should ensure that _id is included in the queried attributes. Please try this solution and let me know if it resolves your issue or if you need further assistance.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
I appreciate your earlier guidance on including the _id attribute in the WeaviateHybridSearchRetriever. Following your suggestion, I implemented a subclass, CustomWeaviateHybridSearchRetriever, to append _id to the attributes list. However, I encountered an error indicating that _id is not a recognized attribute in the query.
Error
I've attached the updated implementation of my subclass for your reference. Could you please advise on how to resolve this issue or suggest an alternative approach to include the _id attribute in the search results?
I'm quite surprised that there isn't a more straightforward way to retrieve the Weaviate document_id from search results, akin to how the document type is always returned. I'm wondering if I might be overlooking a simpler solution or an essential step in the process.
I successfully implemented a solution for retrieving the document ID from Weaviate search results by creating a subclass of WeaviateHybridSearchRetriever. This subclass overrides the _get_relevant_documents method. The key modification in this method is the inclusion of "id" in the query_obj.with_additional call whenever the score parameter is set to True.
While this implementation works as intended and solves the problem at hand, I am still looking for a more straightforward or less invasive way to achieve this. Subclassing the retriever just to change one method seems excessive for what appears to be a relatively simple requirement. Ideally, there would be a direct way to include the document ID in the search results without subclassing. I am open to suggestions for a simpler solution that follows best practices and keeps the code simple.
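For readers following along, here is a minimal sketch of the idea behind that workaround, not the subclass itself. It assumes the query was built with "id" added to with_additional, in which case Weaviate returns the object UUID under the `_additional` key of each result rather than as a regular property (which would explain why `_id` in the attributes list was rejected). The `Doc` class is a hypothetical stand-in for LangChain's Document, `docs_from_weaviate_result` is an illustrative helper name, and the sample response is made up to show the assumed shape.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (text plus metadata)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def docs_from_weaviate_result(
    result: Dict[str, Any], index_name: str, text_key: str
) -> List[Doc]:
    """Convert a Weaviate GraphQL `Get` response into Docs.

    The object UUID is lifted out of `_additional` into metadata["id"];
    any other additional fields stay under metadata["_additional"].
    """
    if "errors" in result:
        raise ValueError(f"Error during query: {result['errors']}")

    docs: List[Doc] = []
    for obj in result["data"]["Get"][index_name]:
        props = dict(obj)  # shallow copy so the caller's response is not mutated
        text = props.pop(text_key)
        additional = dict(props.pop("_additional", None) or {})
        doc_id = additional.pop("id", None)
        if doc_id is not None:
            props["id"] = doc_id
        if additional:
            props["_additional"] = additional
        docs.append(Doc(page_content=text, metadata=props))
    return docs


# Made-up response in the shape assumed for a query that requested
# the additional fields "id" and "score".
sample_result = {
    "data": {
        "Get": {
            "Article": [
                {
                    "content": "Hybrid search combines BM25 and vector search.",
                    "title": "Hybrid search",
                    "_additional": {
                        "id": "00000000-0000-0000-0000-000000000001",
                        "score": "0.5",
                    },
                }
            ]
        }
    }
}

docs = docs_from_weaviate_result(sample_result, "Article", "content")
print(docs[0].metadata["id"])
```

In the subclass described above, the ID would presumably surface nested under the document's `_additional` metadata instead; flattening it to a top-level "id" key here is only a convenience choice for the sketch.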
@blockfer-rp FYI, this is solved in the new integration. See langchain-ai/langchain-weaviate#87
System Info
langchain: 0.0.334
python: 3.11.6
weaviate-client: 3.25.3
Who can help?
No response
Information
Related Components
Reproduction
I am trying to implement the WeaviateHybridSearchRetriever to retrieve documents from Weaviate. My schema indicates the document ID is stored in the _id field based on the shardingConfig.
When setting up the retriever, I included _id in the attributes list:
However, when I try to access _id on the returned Document objects, I get an error that _id is not found.
For example:
I have tried variations like id, document_id instead of _id but still cannot seem to access the document ID field.
Any suggestions on what I am missing or doing wrong when trying to retrieve the document ID from the Weaviate results using the _id field specified in the schema?
Let me know if any other details would be helpful in troubleshooting this issue!
Schema Details
Example Document
Example Search Result
App Code
Expected behavior
When using the WeaviateHybridSearchRetriever for document retrieval, I expect that including the _id attribute in the attributes list will allow me to access the document ID of each retrieved document without any issues. Specifically, after setting up the WeaviateHybridSearchRetriever like so:
I anticipate that executing a query and attempting to print the _id of the first result should successfully return the unique identifier of the document, as per the below code snippet:
In this scenario, my expectation is that the _id field, being specified in the attributes parameter, should be readily accessible in each Document object returned by the get_relevant_documents method. This behavior is crucial for my application as it relies on the unique document IDs for further processing and analysis of the retrieved data.