INTPYTHON-330 GraphRAG #66

caseyclements · 2025-01-23T18:11:59Z

Adds MongoDBGraphStore
"""
GraphRAG is a ChatModel that provides responses to semantic queries.
As in Vector RAG, we augment the Chat Model's training data
with relevant information that we collect from documents.

In Vector RAG, one uses an "Embedding" model that converts both
the query and the potentially relevant documents into vectors.
These can then be compared, and the most similar documents are
supplied to the Chat Model as context to the query.

In Graph RAG, one uses an Entity-Extraction model that converts both
the query and the potentially relevant documents into graphs. These are
composed of nodes that are entities and edges that are relationships.
Traversing the graph finds connections between entities, and so can
answer questions that require following more than one connection.

Graph RAG also finds entities that are common to several documents
and combines the attributes found for them, which can provide richer
context than Vector RAG in some cases.

When a document is extracted, each entity is represented by a single
MongoDB Document, and relationships are defined in a nested field named
'relationships' that contains lists of targets, types, and attributes.
This schema allows MongoDB's $graphLookup to traverse all edges from an
arbitrary number of starting nodes.

When a query is made, the model extracts the entities and relationships from it,
then traverses the graph starting from each of the entities found.
The connected entities and relationships form the context
that is included with the query to the Chat Model.

"""

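The schema and traversal described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the example entity, the top-level `type` and `attributes` fields, the helper `build_traversal_pipeline`, and the `connections` / `depth` output names are all hypothetical, and `relationships.target_ids` follows the later rename of `targets` to `target_ids`. Only the `$match` and `$graphLookup` stage options are standard MongoDB.

```python
from typing import Any, Dict, List

# Hypothetical entity document: one document per entity, with its
# outgoing relationships stored as parallel lists in a nested field.
EXAMPLE_ENTITY: Dict[str, Any] = {
    "_id": "Jane Smith",
    "type": "Person",  # assumed field name
    "attributes": {"role": ["engineer"]},  # assumed field name
    "relationships": {
        "target_ids": ["ACME Corporation"],
        "types": ["works at"],
        "attributes": [{}],
    },
}


def build_traversal_pipeline(
    starting_entities: List[str],
    collection_name: str,
    max_depth: int = 2,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that follows outgoing relationships."""
    return [
        # Select all starting nodes in a single pass.
        {"$match": {"_id": {"$in": starting_entities}}},
        {
            "$graphLookup": {
                "from": collection_name,
                # Begin with the targets of each starting entity ...
                "startWith": "$relationships.target_ids",
                # ... then keep following targets of every entity found.
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",
                "maxDepth": max_depth,
                "depthField": "depth",
            }
        },
    ]
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`; each result would then carry the starting entity plus the entities reached from it under `connections`.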
blink1073

I made a first pass, reviewing everything but the implementation details of the MongoDBGraphStore. I mostly have cosmetic feedback. Excellent work!

libs/langchain-mongodb/langchain_mongodb/graphrag/prompts.py

blink1073 · 2025-02-07T15:54:25Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+from importlib.metadata import version
+from typing import Any, Dict, List, Optional, Union
+
+try:


We need to put this behind an if TYPE_CHECKING to avoid a runtime dependency on typing_extensions.

Ok no problem. What's the best practice? How does one tell what must be placed behind TYPE_CHECKING and what doesn't?

Anything that is only needed for typing could be put under TYPE_CHECKING. Anything that would result in a new runtime dep must be put under TYPE_CHECKING.

Is this what's required?

    if TYPE_CHECKING:
        Entity: TypeAlias = Dict[str, Any]
        """Represents an Entity in the knowledge graph with specific schema. See .schema"""
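One thing to watch with the snippet above: a name defined only under `if TYPE_CHECKING:` does not exist at runtime, so any signature that evaluates `Entity` would fail unless its annotations are quoted or deferred. A possible alternative, offered as a sketch rather than as what the PR finally merged, is to guard only the `typing_extensions` import and quote the `TypeAlias` annotation. The `lookup_entity` helper below is a hypothetical stand-in used only to show the alias in a signature.

```python
from typing import TYPE_CHECKING, Any, Dict, Optional

if TYPE_CHECKING:
    # Seen only by static type checkers. The import is skipped at runtime,
    # so typing_extensions does not become a runtime dependency.
    from typing_extensions import TypeAlias

# The quoted annotation is stored as a string and never evaluated at
# runtime, so TypeAlias does not need to exist when the module is imported.
Entity: "TypeAlias" = Dict[str, Any]
"""Represents an Entity in the knowledge graph."""


def lookup_entity(entities: Dict[str, Entity], name: str) -> Optional[Entity]:
    """Hypothetical helper: return the entity whose _id is `name`, if any."""
    return entities.get(name)
```

Because `Entity` is an ordinary module-level name here, it stays importable and usable in runtime-evaluated annotations.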

blink1073 · 2025-02-07T15:55:06Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+    GraphRAG is a ChatModel that provides responses to semantic queries
+    based on a Knowledge Graph that an LLM is used to create.
+    As in Vector RAG, we augment the Chat Model's training data
+    with relevant information that we collect from  documents.


Suggested change

with relevant information that we collect from  documents.

with relevant information that we collect from documents.

blink1073 · 2025-02-07T15:55:30Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+    In Graph RAG, one uses an "Entity-Extraction" model that converts
+    text into Entities and their relationships, a Knowledge Graph.
+    Comparison is done by Graph traversal, finding entities connected
+    to the query prompts. These are then supplied to the Chat Model  as context.


Suggested change

to the query prompts. These are then supplied to the Chat Model  as context.

to the query prompts. These are then supplied to the Chat Model as context.

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

blink1073 · 2025-02-07T15:59:01Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+        Args:
+            documents: list of textual documents and associated metadata.
+        Returns:
+            List containing metadata on entities inserted and updated, one value for each input document


Suggested change

List containing metadata on entities inserted and updated, one value for each input document

List containing metadata on entities inserted and updated, one value for each input document.

blink1073 · 2025-02-07T15:59:40Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+    def find_entity_by_name(self, name: str) -> Optional[Entity]:
+        """Utility to get Entity dict from Knowledge Graph / Collection.
+        Args:
+            name: _id string to look for


Suggested change

name: _id string to look for

name: _id string to look for.

blink1073 · 2025-02-07T15:59:46Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+        Args:
+            name: _id string to look for
+        Returns:
+            List of Entity dicts if any match name


Suggested change

List of Entity dicts if any match name

List of Entity dicts if any match name.

blink1073 · 2025-02-07T16:00:17Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+        Args:
+            starting_entities: Traversal begins with documents whose _id fields match these strings.
+            max_depth: Recursion continues until no more matching documents are found,
+            or until the operation reaches a recursion depth specified by the maxDepth parameter


does this need to be indented to render properly?

I mean the line that continues after max_depth, I think it needs to be indented further

blink1073 · 2025-02-07T16:00:40Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+        Args:
+            query: Prompt before it is augmented by Knowledge Graph.
+            chat_model: ChatBot. Defaults to entity_extraction_model.
+            prompt: Alternative Prompt Template. Defaults to prompts.rag_prompt


Suggested change

prompt: Alternative Prompt Template. Defaults to prompts.rag_prompt

prompt: Alternative Prompt Template. Defaults to prompts.rag_prompt.

Yes. The indentation is necessary.

Co-authored-by: Steven Silvester

blink1073 · 2025-02-10T14:55:21Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+
+from langchain_mongodb.graphrag import example_templates, prompts
+
+from .prompts import rag_prompt


Can we please make all of these constants UPPERCASE? That is the convention used by the sql_toolkit and I find it easier to tell what is a constant and what is a user input.

I changed the string names to UPPERCASE. By the time they are imported here they aren't constants, though; they've been wrapped. It's a good idea all the same. Uppercase draws attention to itself.

blink1073 · 2025-02-10T14:58:10Z

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py

+        Args:
+            starting_entities: Traversal begins with documents whose _id fields match these strings.
+            max_depth: Recursion continues until no more matching documents are found,
+            or until the operation reaches a recursion depth specified by the maxDepth parameter


I mean the line that continues after max_depth, I think it needs to be indented further

caseyclements

You're right.
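For reference, the indentation under discussion looks like this in a Google-style docstring: continuation lines of an argument description are indented one level deeper than the argument name. The function below is a stand-alone sketch, not the method from this PR; `traverse`, the in-memory `graph` mapping, and its hop-counting convention for `max_depth` are assumptions made only so the example runs without a database.

```python
from typing import Dict, List, Optional, Set


def traverse(
    graph: Dict[str, List[str]],
    starting_entities: List[str],
    max_depth: Optional[int] = None,
) -> Set[str]:
    """Collect the _ids of entities reachable from the starting entities.

    Args:
        graph: Hypothetical mapping from an entity _id to its target _ids.
        starting_entities: Traversal begins with entities whose _id
            matches one of these strings.
        max_depth: Recursion continues until no more connected entities
            are found, or until this many hops have been followed.

    Returns:
        Set of visited entity _ids, including the starting ones.
    """
    visited: Set[str] = set(starting_entities)
    frontier: List[str] = list(visited)
    depth = 0
    while frontier and (max_depth is None or depth < max_depth):
        next_frontier: List[str] = []
        for name in frontier:
            for target in graph.get(name, []):
                if target not in visited:  # skip duplicates and cycles
                    visited.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
        depth += 1
    return visited
```

Note that the hop count used here is this sketch's own convention and is not claimed to match how `$graphLookup` interprets `maxDepth`.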

blink1073

LGTM!

caseyclements added 18 commits January 8, 2025 13:41

Initial pieces and test of GraphRAG

061dbec

Defined schema and prompts.

77f94ae

Expand tests

f148cdb

Rebased GraphRAG work onto main, migrating to langchain-mongodb/langchain_mongodb/graphrag

0509d37

Improved performance of graph traversal by following all relationships of all entities in one call to

27181bc

Merge entities extracted from different documents

f677180

Moved test

9cb9511

add_documents now returns List[BulkWriteResult]

7d68288

Added validation schema

eebf545

Remove duplicate file

5e194f7

Tidying

b733478

Adds Retriever, similarity_search, schema validator, and removes duplicates in traversal

2c0c23c

Adds ability to add additional examples to prompts. Fixes Validation schema and test.

72eae24

Renamed properties to attributes to reduce schema ambiguity

2b35229

First cut at adding feature to constrain entity and relationship types.

d065db5

Halfway through graphLookup.connectFromField enhancements

68ebd1f

Major refactor of schema and traversal logic. Renamed ID to _id. Updated tests and prompts.

3b5e058

Removed completed TODOs

ec54437

caseyclements requested a review from blink1073 January 23, 2025 18:11

caseyclements added 11 commits January 23, 2025 15:28

Fix f-string

7b976bd

Fix pytest.skip when OPENAI_API_KEY not given.

16345a7

Another f-string fix

88ea09e

Merge remote-tracking branch 'upstream/main' into INTPYTHON-330-GraphRAG

de3c7ab

Fallback to typing_extensions for TypeAlias in py3.9

05aa76f

Touch ups to docstrings.

485997b

Changed targets to target_ids. Updated test that added example entity

0d01839

Formatting

09e8c55

Updated MongoDBGraphStore docstring

44bab91

Adds from_connection_string method

d1c0daa

Created example_templates.py which will contain stock examples and provide template for user-provided examples

b0c1c55

Moved Entity extraction example to own module, improved and extended it for better UX. Added allowed_entity_types to name extraction from query.

9f30632

blink1073 requested changes Feb 7, 2025

caseyclements and others added 3 commits February 7, 2025 11:21

Fixes typo.

459fc0c

Co-authored-by: Steven Silvester

Apply periods consistently in docstrings.

114218f

Add TYPE_CHECKING block

a4d5e0b

caseyclements marked this pull request as ready for review February 7, 2025 20:18

caseyclements requested a review from blink1073 February 7, 2025 20:19

blink1073 requested changes Feb 10, 2025

caseyclements commented Feb 10, 2025

Fixes indentation typo

a0b5a05

caseyclements requested a review from blink1073 February 10, 2025 20:31

Change prompt constants to UPPERCASE

3bbf258

blink1073 approved these changes Feb 11, 2025

caseyclements merged commit 2df734f into langchain-ai:main Feb 11, 2025
11 checks passed
