Knowledge Graph RAG with Local LLM

This is an ongoing personal project to practice building a pipeline that feeds a Neo4J database with unstructured data from PDFs containing (fictional) crime reports, and then uses Graph RAG to query the database in natural language.

The pipeline is based on the Neo4J article Enhancing the Accuracy of RAG Applications With Knowledge Graphs.

The GraphRAG is based on the YouTube tutorial Langchain & Neo4j: Query Your Graph Database in Natural Language.

Both parts of the project were adapted to use a locally hosted Neo4J database (Docker) and a locally hosted LLM (Ollama).

Stack: Python, LangChain, Ollama, Neo4J, Docker

To run this project you'll need:

Docker installed and running on your machine (docker-compose.yml file included in the repository).
Ollama installed and running on your machine, and a model downloaded.
A Python environment with the required packages installed. You can install them with pip install -r requirements.txt.
A .env file with the following variables:

NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=neo4j
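
As a rough sketch of how a script might pick up these settings, the snippet below reads the three variables using only the standard library. The names `Neo4jSettings` and `load_neo4j_settings` are hypothetical and not taken from the repository, which may load the .env file differently (for example through a dotenv package).

```python
import os
from typing import Mapping, NamedTuple, Optional


class Neo4jSettings(NamedTuple):
    uri: str
    username: str
    password: str


def load_neo4j_settings(env: Optional[Mapping[str, str]] = None) -> Neo4jSettings:
    """Read the Neo4J connection settings from a mapping (default: os.environ)."""
    source = os.environ if env is None else env
    required = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    # Fail early with a clear message if any variable is absent or empty.
    missing = [key for key in required if not source.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return Neo4jSettings(
        uri=source["NEO4J_URI"],
        username=source["NEO4J_USERNAME"],
        password=source["NEO4J_PASSWORD"],
    )
```

Passing a plain dictionary instead of relying on `os.environ` makes the function easy to try out without touching the real environment.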

The pipeline

pipeline.py -> main script to run the pipeline.

It extracts text from PDFs in the files folder.
Sends the text to the local LLM to extract entities and relationships.

To use a local LLM, I needed to build a custom chat_prompt, as pointed out in this StackOverflow topic.
I chose to also build my own Pydantic class and examples, instead of using the library's default, to align the model to the crime-related theme.

Inserts the extracted entities and relationships into the Neo4J database.

After running the pipeline script, open the Neo4J browser at http://localhost:7474/browser/ and run:

MATCH (n)-[r]->(m)
RETURN n, r, m

You should see all the entities and relationships extracted from the PDFs.

Results using Llama3-8B model:

The Graph RAG

graph_rag.py -> main script to run the Graph RAG Q&A.

It queries the Neo4J database with a natural language question.
It returns the answer in natural language based on the result of the query.
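
The two steps above amount to a loop from question to Cypher to answer. The sketch below shows that flow with the LLM and database calls passed in as plain callables, so it runs without Ollama or Neo4J. `answer_question`, its parameter names, and the fake helpers are hypothetical; graph_rag.py may well delegate all of this to a LangChain chain instead.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def answer_question(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], Rows],
    summarize: Callable[[str, Rows], str],
) -> str:
    """Translate a question to Cypher, run it, and phrase the rows as an answer."""
    cypher = generate_cypher(question)  # LLM call: question -> Cypher query
    rows = run_query(cypher)            # database call: Cypher -> result rows
    if not rows:
        return "I don't know the answer."
    return summarize(question, rows)    # LLM call: rows -> natural-language answer


# Stand-ins for the LLM and the database, for illustration only.
def fake_generate(question: str) -> str:
    return "MATCH (p:Person)-[:COMMITTED]->(c:Crime) RETURN p.id AS person"


def fake_run(cypher: str) -> Rows:
    return [{"person": "John Doe"}]


def fake_summarize(question: str, rows: Rows) -> str:
    names = ", ".join(row["person"] for row in rows)
    return f"{names} committed the crime."


print(answer_question("Who committed the crime?", fake_generate, fake_run, fake_summarize))
# prints: John Doe committed the crime.
```

Because the generated Cypher has to reference labels and relationship types that exist in the graph, a question worded differently from the stored names can produce a query that matches nothing, in which case this sketch falls back to the "I don't know" answer.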

Right now you need to write the questions using the same words as the entities and relationships in the database. I'm working on a way to make the questions more flexible...

Results using Llama3-8B model:

rathcoding/knowledge-graph-rag
