Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support vector index filtering #2

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mtrinell
Copy link

@mtrinell mtrinell commented Nov 5, 2024

Extends/Improves GraphCypherQAChain to perform queries based on semantics via vector indexes (listed from auto-generated schema), instead of generating queries with strict/exact matching filters which fail on some use cases.

This means having the following user's query:

chain.invoke("What are the issues that relate to the issue about the 'vote' web app?")

resulting to:

> Entering new GraphCypherQAChain chain...
Generated Cypher:

CALL db.index.vector.queryNodes('issue_description_idx', 5, $search_emb) YIELD node, score
MATCH (node:Issue)-[:DEPENDS_ON]->(related:Issue)
RETURN related.description AS text, score

Generated Filter:
'vote web app'

Full Context:
...

Embeddings are then computed for generated filter and passed as search_emb parameter in queryNodes.

Without this improvement, the same input resulted to the query below, which fails to respond to user's input.

> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (i:Issue)-[:DEPENDS_ON]->(related:Issue)
WHERE i.description CONTAINS 'vote web app'
RETURN related

Full Context:
[]

Change is backward compatible (selection of prompt and further logic is based on passing an embeddings model (optional)

chain = GraphCypherQAChain.from_llm(
    llm, graph=graph, verbose=True, allow_dangerous_requests=True, embeddings=embeddings
)

Comment on lines +76 to +77
# The pattern to find Cypher code enclosed in triple backticks
pattern = r"###(.*?)###"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the text get enclosed in ###?

Also, I think the comment can be corrected here.

@@ -62,6 +64,23 @@ def extract_cypher(text: str) -> str:
return matches[0] if matches else text


def extract_filter(text: str) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a unit test for this under libs/neo4j/tests/unit_tests/chains/test_graph_qa.py?

@@ -88,6 +107,7 @@ def filter_func(x: str) -> bool:
for r in structured_schema.get("relationships", [])
if all(filter_func(r[t]) for t in ["start", "end", "type"])
],
"vector_indexes": structured_schema.get("vector_indexes", []),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add some test to ensure this field is included in libs/neo4j/tests/unit_tests/chains/test_graph_qa.py ?

Comment on lines +149 to +150
"The vector indexes are the following:",
",".join(formatted_vector_indexes),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, please write some tests to test this under libs/neo4j/tests/unit_tests/chains/test_graph_qa.py

@willtai
Copy link
Collaborator

willtai commented Jan 2, 2025

Sorry for the delay, I've added some comments to clarify the extract_filter function and recommend adding/modifiying unit tests to test the new behaviour

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants