You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I want to be able to filter vector search results by records that include a specific item in an array type field. The current filtering operators do not support doing the element-wise inclusion test I need. The supported operators are listed here:
Operator
Meaning/Category
$eq
Equality (==)
$ne
Inequality (!=)
$lt
Less than (<)
$lte
Less than or equal (<=)
$gt
Greater than (>)
$gte
Greater than or equal (>=)
$in
Special Cased (in)
$nin
Special Cased (not in)
$between
Special Cased (between)
$like
Text (like)
$ilike
Text (case-insensitive like)
$and
Logical (and)
$or
Logical (or)
None of the operators from that list does exactly the inclusion test for a list of text elements. The closest is the $like operator which only works if the field is text. It doesn't work for array type because it tries to do an exact match for the entire array rather than looking into the elements inside.
To reproduce this issue you can use this sample code.
First let's define the schema and push some sample data in:
import os
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_postgres.vectorstores import PGVector
from datetime import datetime, timezone
# Create a sample text file for testing
sample_text = """
This is a sample document to test our Postgres setup.
We'll use this to verify our vector search functionality with filtering.
The text can contain any information you want to search through later.
"""
with open("sample_test.txt", "w") as f:
f.write(sample_text)
# Load and process the document
loader = TextLoader("sample_test.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
# Initialize Azure OpenAI embeddings
embeddings = AzureOpenAIEmbeddings(
azure_deployment="myAzureDeployment", # Replace with your deployment name
model="text-embedding-ada-002", # This is typically the model name
azure_endpoint="https://yourEndpoint.openai.azure.com",
api_key="yourApiKey",
chunk_size=1,
)
connection_string = (
"postgresql+psycopg://postgres:postgres@your_url:5432/test"
)
collection_name = "test_filtering"
vectorstore = PGVector(
embeddings=embeddings,
collection_name=collection_name,
connection=connection_string,
use_jsonb=True,
)
def prepare_metadata(doc):
return {
"source": doc.metadata["source"],
"category": ["A"],
"timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
}
content_list = []
metadata_list = []
for doc in docs:
content_list.append(doc.page_content)
metadata_list.append(prepare_metadata(doc))
ids_added = vectorstore.add_texts(content_list, metadata_list)
print(f"Added {len(ids_added)} documents to the vector store.")
Now I use the following code to retrive data from the collection we just ingested:
filter = {"category": {"$like": ["A"]}} # Exact match of the array instead of looking to the elements inside
retriever = vectorstore.as_retriever(search_kwargs={"filter": filter})
retrieved_docs = retriever .invoke("Opensearch", k=5)
print(retrieved_docs) # No results are returned
I have implemented another operator for array element-wise inclusion test which will fix this issue. But, I want to report this issue first for your review before creating a PR.
Thanks for reading.
The text was updated successfully, but these errors were encountered:
I want to be able to filter vector search results by records that include a specific item in an array type field. The current filtering operators do not support doing the element-wise inclusion test I need. The supported operators are listed here:
None of the operators from that list does exactly the inclusion test for a list of text elements. The closest is the
$like
operator which only works if the field is text. It doesn't work for array type because it tries to do an exact match for the entire array rather than looking into the elements inside.To reproduce this issue you can use this sample code.
First let's define the schema and push some sample data in:
Now I use the following code to retrive data from the collection we just ingested:
I have implemented another operator for array element-wise inclusion test which will fix this issue. But, I want to report this issue first for your review before creating a PR.
Thanks for reading.
The text was updated successfully, but these errors were encountered: