INTPYTHON-330 GraphRAG #66

Merged: 35 commits merged into langchain-ai:main on Feb 11, 2025

Conversation

caseyclements
Collaborator

Adds MongoDBGraphStore
"""
GraphRAG is a ChatModel that provides responses to semantic queries.
As in Vector RAG, we augment the Chat Model's training data
with relevant information that we collect from documents.

In Vector RAG, one uses an "Embedding" model that converts both
the query, and the potentially relevant documents, into vectors,
which can then be compared, and the most similar supplied to the
Chat Model as context to the query.

In Graph RAG, one uses an Entity-Extraction model that converts both
the query, and the potentially relevant documents, into graphs. These are
composed of nodes that are entities and edges that are relationships.
The idea is that the graph can find connections between entities and
hence answer questions that require more than one connection.

It is also about finding common entities in documents,
combining the attributes found and hence providing richer context than Vector RAG,
especially in certain cases.

When a document is extracted, each entity is represented by a single
MongoDB Document, and relationships are defined in a nested field named
'relationships' that contains lists of targets, types, and attributes. This schema allows MongoDB's
$graphLookup to traverse all edges from an arbitrary number of starting nodes.

When a query is made, the model extracts the entities and relationships from it,
then traverses the graph starting from each of the entities found.
The connected entities and relationships form the context
that is included with the query to the Chat Model.

"""
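As a rough illustration of the schema and traversal described above, here is a minimal sketch. It is not taken from the PR's implementation: the sub-field names under 'relationships' (`target_ids`, `types`, `attributes`), the helper names, and the sample entities are all assumptions made for the example. The pipeline uses the standard `$graphLookup` stage options, and the in-memory function only approximates what the stage computes.

```python
from typing import Any, Dict, List

# Hypothetical entity documents: one document per entity, with relationships
# nested as parallel lists. The sub-field names are assumptions.
ENTITIES: List[Dict[str, Any]] = [
    {"_id": "Alice", "type": "Person",
     "relationships": {"target_ids": ["ACME"], "types": ["works_at"], "attributes": [{}]}},
    {"_id": "ACME", "type": "Organization",
     "relationships": {"target_ids": ["Widget"], "types": ["builds"], "attributes": [{}]}},
    {"_id": "Widget", "type": "Product",
     "relationships": {"target_ids": [], "types": [], "attributes": []}},
]


def build_graph_lookup_pipeline(
    collection_name: str, starting_entities: List[str], max_depth: int = 2
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that follows edges from several start nodes."""
    return [
        # Select every starting entity in one pass.
        {"$match": {"_id": {"$in": starting_entities}}},
        {
            "$graphLookup": {
                "from": collection_name,
                "startWith": "$relationships.target_ids",
                "connectFromField": "relationships.target_ids",
                "connectToField": "_id",
                "as": "connections",
                "maxDepth": max_depth,
                "depthField": "depth",
            }
        },
    ]


def traverse_in_memory(
    entities: List[Dict[str, Any]], starting_entities: List[str], max_depth: int = 2
) -> Dict[str, int]:
    """Rough in-memory analogue: map each reachable _id to the depth it was found at."""
    by_id = {e["_id"]: e for e in entities}
    frontier = set()
    for name in starting_entities:
        if name in by_id:
            frontier.update(by_id[name]["relationships"]["target_ids"])
    found: Dict[str, int] = {}
    depth = 0
    while frontier and depth <= max_depth:
        next_frontier = set()
        for entity_id in frontier:
            if entity_id in by_id and entity_id not in found:
                found[entity_id] = depth
                next_frontier.update(by_id[entity_id]["relationships"]["target_ids"])
        frontier = next_frontier - set(found)
        depth += 1
    return found


pipeline = build_graph_lookup_pipeline("entities", ["Alice"], max_depth=1)
reachable = traverse_in_memory(ENTITIES, ["Alice"], max_depth=2)
```

With a live deployment, the pipeline would be passed to pymongo's `collection.aggregate(pipeline)`. The in-memory version exists only so the traversal can be followed without a database.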

…it for better UX. Added allowed_entity_types to name extraction from query.
Collaborator

@blink1073 blink1073 left a comment

I made a first pass, reviewing everything but the implementation details of the MongoDBGraphStore. I mostly have cosmetic feedback. Excellent work!

from importlib.metadata import version
from typing import Any, Dict, List, Optional, Union

try:
Collaborator

We need to put this behind an if TYPE_CHECKING to avoid a runtime dependency on typing_extensions.

Collaborator Author

OK, no problem. What's the best practice? How does one tell what must be placed behind TYPE_CHECKING and what need not be?

Collaborator

Anything that is only needed for typing could be put under TYPE_CHECKING. Anything that would result in a new runtime dep must be put under TYPE_CHECKING.

Collaborator Author

Is this what's required?

if TYPE_CHECKING:
    Entity: TypeAlias = Dict[str, Any]
    """Represents an Entity in the knowledge graph with specific schema. See .schema"""
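For reference, one way this pattern can be made to work is sketched below. It assumes the alias is only ever used in annotations, and the `find_entity_by_name` signature here is a simplified, hypothetical stand-in rather than the method from the PR. Names defined only under `TYPE_CHECKING` do not exist at runtime, so any annotation that mentions them has to be quoted (or the module has to use `from __future__ import annotations`).

```python
from typing import TYPE_CHECKING, Any, Dict, Optional

if TYPE_CHECKING:
    # This branch is seen only by static type checkers, so typing_extensions
    # never has to be installed at runtime.
    from typing_extensions import TypeAlias

    Entity: TypeAlias = Dict[str, Any]


def find_entity_by_name(name: str, store: "Dict[str, Entity]") -> "Optional[Entity]":
    """Hypothetical stand-in: look up an entity dict by its _id.

    The annotations are quoted because Entity is undefined at runtime.
    """
    return store.get(name)


store = {"Alice": {"_id": "Alice", "type": "Person"}}
match = find_entity_by_name("Alice", store)
missing = find_entity_by_name("Bob", store)
```

If the alias were needed at runtime, for example when handed to a validation library, it would have to be defined outside the `TYPE_CHECKING` block.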

GraphRAG is a ChatModel that provides responses to semantic queries
based on a Knowledge Graph that an LLM is used to create.
As in Vector RAG, we augment the Chat Model's training data
with relevant information that we collect from documents.
Collaborator

Suggested change
with relevant information that we collect from documents.
with relevant information that we collect from documents.

In Graph RAG, one uses an "Entity-Extraction" model that converts
text into Entities and their relationships, a Knowledge Graph.
Comparison is done by Graph traversal, finding entities connected
to the query prompts. These are then supplied to the Chat Model as context.
Collaborator

Suggested change
to the query prompts. These are then supplied to the Chat Model as context.
to the query prompts. These are then supplied to the Chat Model as context.

libs/langchain-mongodb/langchain_mongodb/graphrag/graph.py (outdated, resolved)
Args:
documents: list of textual documents and associated metadata.
Returns:
List containing metadata on entities inserted and updated, one value for each input document
Collaborator

Suggested change
List containing metadata on entities inserted and updated, one value for each input document
List containing metadata on entities inserted and updated, one value for each input document.

def find_entity_by_name(self, name: str) -> Optional[Entity]:
"""Utility to get Entity dict from Knowledge Graph / Collection.
Args:
name: _id string to look for
Collaborator

Suggested change
name: _id string to look for
name: _id string to look for.

Args:
name: _id string to look for
Returns:
List of Entity dicts if any match name
Collaborator

Suggested change
List of Entity dicts if any match name
List of Entity dicts if any match name.

Args:
starting_entities: Traversal begins with documents whose _id fields match these strings.
max_depth: Recursion continues until no more matching documents are found,
or until the operation reaches a recursion depth specified by the maxDepth parameter
Collaborator

Does this need to be indented to render properly?

Collaborator

I mean the line that continues after max_depth; I think it needs to be indented further.
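To make the indentation point concrete, here is a small sketch of a Google-style docstring in which continuation lines of an argument description are indented one level deeper than the argument name. The function name, signature, and placeholder body are hypothetical and are not the method from the PR.

```python
import inspect
from typing import List


def related_entities(starting_entities: List[str], max_depth: int = 2) -> List[str]:
    """Hypothetical function used only to illustrate docstring layout.

    Args:
        starting_entities: Traversal begins with documents whose _id fields
            match these strings.
        max_depth: Recursion continues until no more matching documents are found,
            or until the operation reaches the recursion depth given by max_depth.

    Returns:
        The input names, unchanged (placeholder behaviour).
    """
    return list(starting_entities)


# inspect.getdoc strips the common leading indentation, which makes the
# relative indentation of the continuation lines easy to see.
doc_lines = inspect.getdoc(related_entities).splitlines()
```

Without the extra indentation, docstring parsers that follow this convention may read the continuation line as the start of a new entry instead of part of the `max_depth` description.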

Args:
query: Prompt before it is augmented by Knowledge Graph.
chat_model: ChatBot. Defaults to entity_extraction_model.
prompt: Alternative Prompt Template. Defaults to prompts.rag_prompt
Collaborator

Suggested change
prompt: Alternative Prompt Template. Defaults to prompts.rag_prompt
prompt: Alternative Prompt Template. Defaults to prompts.rag_prompt.

Collaborator Author

Yes. The indentation is necessary.

@caseyclements caseyclements marked this pull request as ready for review February 7, 2025 20:18

from langchain_mongodb.graphrag import example_templates, prompts

from .prompts import rag_prompt
Collaborator

Can we please make all of these constants UPPERCASE? That is the convention used by the sql_toolkit and I find it easier to tell what is a constant and what is a user input.

Collaborator Author

I changed the string names to UPPERCASE. By the time they are imported here, though, they aren't constants; they've been wrapped. It's a good idea. It draws attention to itself.

Args:
starting_entities: Traversal begins with documents whose _id fields match these strings.
max_depth: Recursion continues until no more matching documents are found,
or until the operation reaches a recursion depth specified by the maxDepth parameter
Collaborator

I mean the line that continues after max_depth; I think it needs to be indented further.

Collaborator Author

@caseyclements caseyclements left a comment

You're right.

Collaborator

@blink1073 blink1073 left a comment

LGTM!

@caseyclements caseyclements merged commit 2df734f into langchain-ai:main Feb 11, 2025
11 checks passed