Restructure examples folder (neo4j#146)
* Structure proposal

* Backup old examples in a specific folder (tmp)

* WIP: example folder structure refactoring

* ruff

* Add result formatter example

* LLM examples

* MistralAILLM example + doc

* Simple KG builder example

* Embeder examples

* Weaviate example

* Fix import for cohere embeddings

* Format

* Update README with links to new files

* Move Pinecone examples

* Can't remove this file yet - but remove link to this specific file from doc - need to keep the file until the next release but then remove

* Pinecone + cleaning

* Cleaning 'old' folder

* Components examples

* Test and harmonize retriever section

* Deal with qdrant examples - add custom component

* Nicer path definition

* Mypy/ruff

* Rename answer -> QA + add links

* Use pre_filters variable for explicitness

* ruff

* ruff

* Missing files for db operations

* Fix openai example

* Fix CI

* :'(
stellasia authored Oct 21, 2024
1 parent 47312c0 commit ca86f26
Showing 87 changed files with 3,299 additions and 888 deletions.
2 changes: 1 addition & 1 deletion docs/source/user_guide_kg_builder.rst
@@ -33,7 +33,7 @@ A Knowledge Graph (KG) construction pipeline requires a few components:
This package contains the interface and implementations for each of these components, which are detailed in the following sections.

To see an end-to-end example of a Knowledge Graph construction pipeline,
refer to `this example <https://github.com/neo4j/neo4j-graphrag-python/blob/main/examples/pipeline/kg_builder.py>`_.
refer to the `example folder <https://github.com/neo4j/neo4j-graphrag-python/blob/main/examples/>`_ in the project GitHub repository.

**********************************
Knowledge Graph Builder Components
26 changes: 26 additions & 0 deletions docs/source/user_guide_rag.rst
@@ -78,6 +78,7 @@ If OpenAI cannot be used directly, there are a few available alternatives:
- Use Azure OpenAI (GPT...).
- Use Google VertexAI (Gemini...).
- Use Anthropic LLM (Claude...).
- Use MistralAI LLM.
- Use Cohere.
- Use a local Ollama model.
- Implement a custom interface.
@@ -164,6 +165,31 @@ To use Anthropic, instantiate the `AnthropicLLM` class:
See :ref:`anthropicllm`.


Using MistralAI LLM
-------------------

To use MistralAI, instantiate the `MistralAILLM` class:

.. code:: python

    from neo4j_graphrag.llm import MistralAILLM
    llm = MistralAILLM(
        model_name="mistral-small-latest",
        api_key=api_key,  # can also set `MISTRAL_API_KEY` in env vars
    )
    llm.invoke("say something")


.. note::

    In order to run this code, the `mistralai` Python package needs to be installed:
    `pip install mistralai`

See :ref:`mistralaillm`.



Using Cohere LLM
----------------

132 changes: 132 additions & 0 deletions examples/README.md
@@ -0,0 +1,132 @@
# Examples Index

This folder contains usage examples for the different features
supported by the `neo4j-graphrag` package:

- [Build Knowledge Graph](#build-knowledge-graph) from PDF or text
- [Retrieve](#retrieve) information from the graph
- [Question Answering](#answer-graphrag) (Q&A)

Each of these steps has many customization options, which
are listed in [the last section of this file](#customize).

## Build Knowledge Graph

- [End to end PDF to graph simple pipeline](build_graph/simple_kg_builder_from_pdf.py)
- [End to end text to graph simple pipeline](build_graph/simple_kg_builder_from_text.py)


## Retrieve

- [Retriever from an embedding vector](retrieve/similarity_search_for_vector.py)
- [Retriever from a text](retrieve/similarity_search_for_text.py)
- [Graph-based retrieval with VectorCypherRetriever](retrieve/vector_cypher_retriever.py)
- [Hybrid retriever](./retrieve/hybrid_retriever.py)
- [Hybrid Cypher retriever](./retrieve/hybrid_cypher_retriever.py)
- [Text2Cypher retriever](./retrieve/text2cypher_search.py)


### External Retrievers

#### Weaviate

- [Vector search](customize/retrievers/external/weaviate/weaviate_vector_search.py)
- [Text search with local embedder](customize/retrievers/external/weaviate/weaviate_text_search_local_embedder.py)
- [Text search with remote embedder](customize/retrievers/external/weaviate/weaviate_text_search_remote_embedder.py)

#### Pinecone

- [Vector search](./customize/retrievers/external/pinecone/pinecone_vector_search.py)
- [Text search](./customize/retrievers/external/pinecone/pinecone_text_search.py)


#### Qdrant

- [Vector search](./customize/retrievers/external/qdrant/qdrant_vector_search.py)
- [Text search](./customize/retrievers/external/qdrant/qdrant_text_search.py)


## Answer: GraphRAG

- [End to end GraphRAG](./answer/graphrag.py)


## Customize

### Retriever

- [Control result format for VectorRetriever](customize/retrievers/result_formatter_vector_retriever.py)
- [Control result format for VectorCypherRetriever](customize/retrievers/result_formatter_vector_cypher_retriever.py)


### LLMs

- [OpenAI (GPT)](./customize/llms/openai_llm.py)
- [Azure OpenAI]()
- [VertexAI (Gemini)](./customize/llms/vertexai_llm.py)
- [MistralAI](./customize/llms/mistalai_llm.py)
- [Cohere](./customize/llms/cohere_llm.py)
- [Anthropic (Claude)](./customize/llms/anthropic_llm.py)
- [Ollama]()
- [Custom LLM](./customize/llms/custom_llm.py)


### Prompts

- [Using a custom prompt](old/graphrag_custom_prompt.py)


### Embedders

- [OpenAI](./customize/embeddings/openai_embeddings.py)
- [Azure OpenAI](./customize/embeddings/azure_openai_embeddings.py)
- [VertexAI](./customize/embeddings/vertexai_embeddings.py)
- [MistralAI](./customize/embeddings/mistalai_embeddings.py)
- [Cohere](./customize/embeddings/cohere_embeddings.py)
- [Ollama](./customize/embeddings/ollama_embeddings.py)
- [Custom embedder](./customize/embeddings/custom_embeddings.py)


### KG Construction - Pipeline

- [End to end example with explicit components and text input](./customize/build_graph/pipeline/kg_builder_from_text.py)
- [End to end example with explicit components and PDF input](./customize/build_graph/pipeline/kg_builder_from_pdf.py)

#### Components

- Loaders:
    - [Load PDF file](./customize/build_graph/components/loaders/pdf_loader.py)
    - [Custom](./customize/build_graph/components/loaders/custom_loader.py)
- Text Splitter:
    - [Fixed size splitter](./customize/build_graph/components/splitters/fixed_size_splitter.py)
    - [Splitter from LangChain](./customize/build_graph/components/splitters/langhchain_splitter.py)
    - [Splitter from LLamaIndex](./customize/build_graph/components/splitters/llamaindex_splitter.py)
    - [Custom](./customize/build_graph/components/splitters/custom_splitter.py)
- [Chunk embedder]()
- Schema Builder:
    - [User-defined](./customize/build_graph/components/schema_builders/schema.py)
- Entity Relation Extractor:
    - [LLM-based](./customize/build_graph/components/extractors/llm_entity_relation_extractor.py)
    - [LLM-based with custom prompt](./customize/build_graph/components/extractors/llm_entity_relation_extractor_with_custom_prompt.py)
    - [Custom](./customize/build_graph/components/extractors/custom_extractor.py)
- Knowledge Graph Writer:
    - [Neo4j writer](./customize/build_graph/components/writers/neo4j_writer.py)
    - [Custom](./customize/build_graph/components/writers/custom_writer.py)
- Entity Resolver:
    - [SinglePropertyExactMatchResolver](./customize/build_graph/components/resolvers/simple_entity_resolver.py)
    - [SinglePropertyExactMatchResolver with pre-filter](./customize/build_graph/components/resolvers/simple_entity_resolver_pre_filter.py)
    - [Custom resolver](./customize/build_graph/components/resolvers/custom_resolver.py)
- [Custom component](./customize/build_graph/components/custom_component.py)


### Answer: GraphRAG

- [LangChain compatibility](./customize/answer/langchain_compatiblity.py)
- [Use a custom prompt](./customize/answer/custom_prompt.py)


## Database Operations

- [Create vector index](database_operations/create_vector_index.py)
- [Create full text index](create_fulltext_index.py)
- [Populate vector index](populate_vector_index.py)
73 changes: 73 additions & 0 deletions examples/build_graph/simple_kg_builder_from_pdf.py
@@ -0,0 +1,73 @@
"""This example illustrates how to get started easily with the SimpleKGPipeline
and ingest PDF into a Neo4j Knowledge Graph.
This example assumes a Neo4j db is up and running. Update the credentials below
if needed.
OPENAI_API_KEY needs to be in the env vars.
"""

import asyncio
from pathlib import Path

import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
from neo4j_graphrag.llm import LLMInterface
from neo4j_graphrag.llm.openai_llm import OpenAILLM

# Neo4j db infos
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"


root_dir = Path(__file__).parents[4]
file_path = root_dir / "data" / "Harry Potter and the Chamber of Secrets Summary.pdf"


# Instantiate Entity and Relation objects. This defines the
# entities and relations the LLM will be looking for in the text.
ENTITIES = ["Person", "Organization", "Location"]
RELATIONS = ["SITUATED_AT", "INTERACTS", "LED_BY"]
POTENTIAL_SCHEMA = [
    ("Person", "SITUATED_AT", "Location"),
    ("Person", "INTERACTS", "Person"),
    ("Organization", "LED_BY", "Person"),
]


async def define_and_run_pipeline(
    neo4j_driver: neo4j.Driver,
    llm: LLMInterface,
) -> PipelineResult:
    # Create an instance of the SimpleKGPipeline
    kg_builder = SimpleKGPipeline(
        llm=llm,
        driver=neo4j_driver,
        embedder=OpenAIEmbeddings(),
        entities=ENTITIES,
        relations=RELATIONS,
        potential_schema=POTENTIAL_SCHEMA,
    )
    return await kg_builder.run_async(file_path=str(file_path))


async def main() -> PipelineResult:
    llm = OpenAILLM(
        model_name="gpt-4o",
        model_params={
            "max_tokens": 2000,
            "response_format": {"type": "json_object"},
        },
    )
    with neo4j.GraphDatabase.driver(URI, auth=AUTH, database=DATABASE) as driver:
        res = await define_and_run_pipeline(driver, llm)
    await llm.async_client.close()
    return res


if __name__ == "__main__":
    res = asyncio.run(main())
    print(res)
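
The `ENTITIES`, `RELATIONS` and `POTENTIAL_SCHEMA` constants in the file above have to agree with each other: every triple should only use names declared in the two lists. A minimal sketch of such a consistency check follows. `find_schema_errors` is a hypothetical helper written for illustration, not part of `neo4j-graphrag`, and whether the library performs a similar validation itself is not shown in this commit.

```python
from typing import List, Sequence, Tuple

# Same values as in the example file above.
ENTITIES = ["Person", "Organization", "Location"]
RELATIONS = ["SITUATED_AT", "INTERACTS", "LED_BY"]
POTENTIAL_SCHEMA = [
    ("Person", "SITUATED_AT", "Location"),
    ("Person", "INTERACTS", "Person"),
    ("Organization", "LED_BY", "Person"),
]


def find_schema_errors(
    entities: Sequence[str],
    relations: Sequence[str],
    potential_schema: Sequence[Tuple[str, str, str]],
) -> List[str]:
    """Hypothetical helper: list every undeclared name used in a triple."""
    known_entities = set(entities)
    known_relations = set(relations)
    errors: List[str] = []
    for source, relation, target in potential_schema:
        if source not in known_entities:
            errors.append(f"unknown source entity: {source}")
        if relation not in known_relations:
            errors.append(f"unknown relation: {relation}")
        if target not in known_entities:
            errors.append(f"unknown target entity: {target}")
    return errors


# An empty list means the three constants are consistent.
print(find_schema_errors(ENTITIES, RELATIONS, POTENTIAL_SCHEMA))
```

Running a check like this before building the pipeline surfaces a misspelled label locally, before any LLM call is made.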
70 changes: 70 additions & 0 deletions examples/build_graph/simple_kg_builder_from_text.py
@@ -0,0 +1,70 @@
"""This example illustrates how to get started easily with the SimpleKGPipeline
and ingest text into a Neo4j Knowledge Graph.
This example assumes a Neo4j db is up and running. Update the credentials below
if needed.
"""

import asyncio

import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
from neo4j_graphrag.llm import LLMInterface
from neo4j_graphrag.llm.openai_llm import OpenAILLM

# Neo4j db infos
URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"

# Text to process
TEXT = """The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House Atreides,
an aristocratic family that rules the planet Caladan."""

# Instantiate Entity and Relation objects. This defines the
# entities and relations the LLM will be looking for in the text.
ENTITIES = ["Person", "House", "Planet"]
RELATIONS = ["PARENT_OF", "HEIR_OF", "RULES"]
POTENTIAL_SCHEMA = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]


async def define_and_run_pipeline(
    neo4j_driver: neo4j.Driver,
    llm: LLMInterface,
) -> PipelineResult:
    # Create an instance of the SimpleKGPipeline
    kg_builder = SimpleKGPipeline(
        llm=llm,
        driver=neo4j_driver,
        embedder=OpenAIEmbeddings(),
        entities=ENTITIES,
        relations=RELATIONS,
        potential_schema=POTENTIAL_SCHEMA,
        from_pdf=False,
    )
    return await kg_builder.run_async(text=TEXT)


async def main() -> PipelineResult:
    llm = OpenAILLM(
        model_name="gpt-4o",
        model_params={
            "max_tokens": 2000,
            "response_format": {"type": "json_object"},
        },
    )
    with neo4j.GraphDatabase.driver(URI, auth=AUTH, database=DATABASE) as driver:
        res = await define_and_run_pipeline(driver, llm)
    await llm.async_client.close()
    return res


if __name__ == "__main__":
    res = asyncio.run(main())
    print(res)
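
The PDF example's docstring states that `OPENAI_API_KEY` needs to be in the env vars. This text example uses the same `OpenAILLM` and `OpenAIEmbeddings` classes, so the same requirement presumably applies. A minimal sketch of a pre-flight check is shown below. `require_env` is a hypothetical helper, not something the package provides.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value or fail with a clear message."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the example")
    return value


# Possible usage at the top of main(), before any client is created:
#     require_env("OPENAI_API_KEY")
```

Failing early with an explicit message is usually easier to diagnose than an authentication error raised from inside the pipeline.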
@@ -8,31 +8,18 @@
- Logging configuration
"""

import logging

import neo4j
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG, RagTemplate
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.types import RetrieverResultItem

URI = "neo4j://localhost:7687"
AUTH = ("neo4j", "password")
DATABASE = "neo4j"
INDEX = "moviePlotsEmbedding"


# setup logger config
logger = logging.getLogger("neo4j_graphrag")
logging.basicConfig(format="%(asctime)s - %(message)s")
logger.setLevel(logging.DEBUG)


def formatter(record: neo4j.Record) -> RetrieverResultItem:
    return RetrieverResultItem(content=f'{record.get("title")}: {record.get("plot")}')


driver = neo4j.GraphDatabase.driver(
    URI,
    auth=AUTH,
@@ -44,8 +31,7 @@ def formatter(record: neo4j.Record) -> RetrieverResultItem:
retriever = VectorCypherRetriever(
    driver,
    index_name=INDEX,
    retrieval_query="with node, score return node.title as title, node.plot as plot",
    result_formatter=formatter,
    retrieval_query="WITH node, score RETURN node.title as title, node.plot as plot",
    embedder=embedder,
)
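
The hunk above touches a `formatter` function that turns each record returned by the retrieval query (columns `title` and `plot`) into a single string. The sketch below isolates that string-building step so it can be run without a database. It assumes a plain `dict` as a stand-in for `neo4j.Record`, relying only on the `.get(key)` call that the diff itself uses. `format_record` and the sample row are made up for illustration.

```python
from typing import Any, Mapping


def format_record(record: Mapping[str, Any]) -> str:
    # Mirrors the f-string in the diff's formatter; a missing key yields "None".
    return f'{record.get("title")}: {record.get("plot")}'


# Illustrative row, shaped like the columns returned by the retrieval query.
row = {"title": "Example Movie", "plot": "Someone does something."}
print(format_record(row))
```

In the example file the resulting string is wrapped in `RetrieverResultItem(content=...)`, which is what the `result_formatter` argument is expected to return.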
