pgvector vectorstore produces double-quoting for filters, breaking them #268

Closed
dredozubov opened this issue Nov 27, 2024 · 4 comments

@dredozubov
Contributor

Describe the bug
The pgvector VectorStore implementation renders metadata filters incorrectly, so filtered searches return no matches. The current implementation wraps JSON string values in double quotes inside the single-quoted SQL literal instead of emitting the bare value in single quotes.
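
To illustrate the mechanism, here is a minimal sketch in plain Rust (standard library only). It assumes the filter value is JSON-encoded before being placed into the SQL string literal, which is consistent with the rendered query shown in the steps below but is not taken from the langchain-rust source. `json_encode_str` and `buggy_literal` are hypothetical names used only for this illustration.

```rust
/// Hypothetical helper: JSON-encode a string the way a JSON serializer
/// would, i.e. wrap it in double quotes. Only the two escapes needed for
/// this illustration are handled.
fn json_encode_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical reconstruction of the buggy rendering: the JSON text,
/// double quotes included, ends up inside the single-quoted SQL literal.
fn buggy_literal(value: &str) -> String {
    format!("'{}'", json_encode_str(value))
}
```

Since `->>` returns the JSON value as text without surrounding quotes, the stored value `earnings_transcript` can never equal a literal whose content starts and ends with a double-quote character.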

To Reproduce
Steps to reproduce the behavior:

  1. Enable query logging in postgres:
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_duration = on;
ALTER SYSTEM SET log_min_duration_statement = 0;

-- Reload configuration
SELECT pg_reload_conf();
  2. Run a similarity search with a metadata filter via the pgvector vectorstore
  3. See that queries are rendered as WHERE (data.cmetadata ->> 'doc_type') = '"earnings_transcript"' instead of WHERE (data.cmetadata ->> 'doc_type') = 'earnings_transcript'
  4. Filtering won't return any results:
db=# WITH filtered_embedding_dims AS MATERIALIZED (
    SELECT
        *
    FROM
        vs_embeddings
    WHERE
        vector_dims(embedding) = '1536'
)
SELECT COUNT(*)
FROM
    filtered_embedding_dims
    JOIN vs_collections ON filtered_embedding_dims.collection_id = vs_collections.uuid
WHERE
    vs_collections.name = 'langchain'
    AND (filtered_embedding_dims.cmetadata ->> 'doc_type') = '"earnings_transcript"' ;
 count
-------
     0
(1 row)

Expected behavior

db=# WITH filtered_embedding_dims AS MATERIALIZED (
    SELECT
        *
    FROM
        vs_embeddings
    WHERE
        vector_dims(embedding) = '1536'
)
SELECT COUNT(*)
FROM
    filtered_embedding_dims
    JOIN vs_collections ON filtered_embedding_dims.collection_id = vs_collections.uuid
WHERE
    vs_collections.name = 'langchain'
    AND (filtered_embedding_dims.cmetadata ->> 'doc_type') = 'earnings_transcript' ;
 count
-------
    26
(1 row)
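
A sketch of how the expected predicate could be rendered, again in plain Rust. This is not the change made in #269; `quote_literal` and `metadata_eq` are hypothetical helpers, and the column name is assumed to be a trusted identifier. Binding the value as a query parameter would generally be preferable to building the literal by hand.

```rust
/// Wrap text in a single-quoted SQL string literal, doubling any embedded
/// single quotes (the standard SQL escape).
fn quote_literal(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}

/// Hypothetical helper: render an equality filter on a JSON metadata key.
/// `column` is assumed to be a trusted identifier, not user input.
fn metadata_eq(column: &str, key: &str, value: &str) -> String {
    format!(
        "({} ->> {}) = {}",
        column,
        quote_literal(key),
        quote_literal(value)
    )
}
```

Called as `metadata_eq("filtered_embedding_dims.cmetadata", "doc_type", "earnings_transcript")`, this yields the same predicate as the query above that returns 26 rows.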

Desktop (please complete the following information):

  • OS: OS X Ventura 13.6.9
  • Version: langchain-rust = { version = "4.6.0", features = ["postgres"] }
@dredozubov
Contributor Author

I made a quick fix in #269.

@dredozubov
Contributor Author

@Abraxas-365 Hi! Is this repo still maintained?

@Abraxas-365
Owner

@dredozubov Yes, sorry, I was on vacation. I'm running the checks now.

@dredozubov
Contributor Author

@Abraxas-365 Sure thing, I didn't mean to pressure you. Just genuinely interested in the state of the repo.
