
Metadata field not properly deserialized when using async_mode=True with PGVector #124

Open
shamspias opened this issue Oct 8, 2024 · 2 comments


@shamspias

When using the PGVector class with async_mode=True, the metadata field of the Document objects returned from query methods (e.g., asimilarity_search_with_score_by_vector) is not deserialized into a Python dict. Instead, it remains as a Fragment object or another non-dict type. This causes a ValidationError when the Document class expects metadata to be a dictionary.

To Reproduce

Steps to reproduce the behavior:

  1. Initialize PGVector with async_mode=True and use_jsonb=True.
  2. Add documents to the vector store with metadata.
  3. Perform an asynchronous similarity search, e.g., asimilarity_search or asimilarity_search_with_score_by_vector.
  4. Observe that the returned Document objects have metadata fields that are not dictionaries.

Expected behavior

The metadata field of the returned Document objects should be properly deserialized into Python dictionaries, matching the behavior when async_mode=False.

Actual behavior

When async_mode=True, the metadata field is a Fragment object (from asyncpg), leading to errors when the code expects a dict.

Error message

ValidationError: 1 validation error for Document
metadata
  Input should be a valid dictionary [type=dict_type, input_value=Fragment(buf=b'{"user_id": "ahmed"}'), input_type=Fragment]

Environment:

  • langchain_postgres version: 0.0.12
  • Python version: 3.10, 3.11, 3.12
  • Database: PostgreSQL with pgvector extension
  • Async driver: asyncpg

Additional context

This issue arises because asyncpg returns JSONB fields as Record or Fragment objects, which are not automatically deserialized into Python dictionaries by SQLAlchemy when using asynchronous sessions.
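For illustration, here is a minimal asyncpg-only sketch of that behavior (the connection URL is a placeholder and this does not go through langchain_postgres at all): by default asyncpg hands jsonb values back as raw JSON text, and only returns Python dicts once a type codec is registered.

import asyncio
import json

import asyncpg


async def demo():
    # Placeholder DSN; point it at any PostgreSQL database
    conn = await asyncpg.connect("postgresql://user:pass@localhost:5432/dbname")

    # Without a codec, jsonb comes back as raw JSON text rather than a dict
    raw = await conn.fetchval("""SELECT '{"user_id": "ahmed"}'::jsonb""")
    print(type(raw), raw)  # <class 'str'> by default

    # Registering a codec makes asyncpg decode jsonb into dicts itself
    await conn.set_type_codec(
        "jsonb", encoder=json.dumps, decoder=json.loads, schema="pg_catalog"
    )
    decoded = await conn.fetchval("""SELECT '{"user_id": "ahmed"}'::jsonb""")
    print(type(decoded), decoded)  # <class 'dict'> after the codec

    await conn.close()


asyncio.run(demo())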

Code to Reproduce

Replace the placeholder connection_string, collection_name, and embedding_model values with real connection details (and an actual Embeddings instance) before running the code.

import asyncio

from langchain_core.documents import Document
from langchain_postgres.vectorstores import PGVector

# Setup the connection to PGVector (replace the placeholders with real values)
connection_string = 'your_connection_string_here'
collection_name = 'your_collection_name_here'
embedding_model = 'your_embedding_model_here'  # must be an Embeddings instance in practice

# Initialize PGVector with the necessary parameters
vstore = PGVector(
    connection=connection_string,
    collection_name=collection_name,
    embeddings=embedding_model,
    use_jsonb=True,
    pre_delete_collection=False,
    async_mode=True,  # Set to True to reproduce the issue
)


async def main():
    # Add a document with metadata
    await vstore.aadd_documents(
        [Document(page_content="example content", metadata={"user_id": "ahmed"})]
    )

    # Perform an asynchronous similarity search.
    # With async_mode=True this triggers the ValidationError above,
    # because the metadata read from JSONB is not deserialized into a dict.
    results = await vstore.asimilarity_search("example content", k=1)
    for doc in results:
        print(doc.metadata)  # expected: {'user_id': 'ahmed'}


asyncio.run(main())

Proposed Solution

Modify the _results_to_docs_and_scores method in the PGVector class to ensure that the metadata field is correctly converted into a dictionary before creating the Document objects.
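For illustration only, a minimal sketch of the kind of conversion that method could apply before constructing each Document. The _coerce_metadata name is hypothetical, and the handling assumes the driver value exposes its JSON payload as bytes, as the Fragment(buf=b'...') in the error above does:

import json


def _coerce_metadata(raw):
    """Hypothetical helper: normalize a driver-level JSONB value into a plain dict."""
    if isinstance(raw, dict):
        return raw
    # e.g. Fragment(buf=b'{"user_id": "ahmed"}') exposes the raw JSON bytes on .buf
    buf = getattr(raw, "buf", None)
    if buf is not None:
        return json.loads(buf)
    # Fallbacks: raw JSON text/bytes, or any mapping-like value
    if isinstance(raw, (bytes, bytearray, str)):
        return json.loads(raw)
    return dict(raw)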

Related Issues:
#118

@simadimonyan

Did you fix it? I have a related issue: langchain-ai/langchain#28029

@shamspias
Author

Did you fix it? I have a related issue: langchain-ai/langchain#28029

Already fixed in #125.

@shamspias shamspias reopened this Nov 28, 2024