Restructure examples folder (neo4j#146)

* Structure proposal
* Backup old examples in a specific folder (tmp)
* WIP: example folder structure refactoring
* ruff
* Add result formatter example
* LLM examples
* MistralAILLM example + doc
* Simple KG builder example
* Embedder examples
* Weaviate example
* Fix import for cohere embeddings
* Format
* Update README with links to new files
* Move Pinecone examples
* Can't remove this file yet, but remove the link to it from the docs; keep the file until the next release, then remove it
* Pinecone + cleaning
* Cleaning 'old' folder
* Components examples
* Test and harmonize retriever section
* Deal with qdrant examples - add custom component
* Nicer path definition
* Mypy/ruff
* Rename answer -> QA + add links
* Use pre_filters variable for explicitness
* ruff
* ruff
* Missing files for db operations
* Fix openai example
* Fix CI
* :'(
alexthomas93 · Oct 21, 2024 · ca86f26
1 parent 47312c0

Showing 87 changed files with 3,299 additions and 888 deletions.
diff --git a/docs/source/user_guide_kg_builder.rst b/docs/source/user_guide_kg_builder.rst
@@ -33,7 +33,7 @@ A Knowledge Graph (KG) construction pipeline requires a few components:
 This package contains the interface and implementations for each of these components, which are detailed in the following sections.
 
 To see an end-to-end example of a Knowledge Graph construction pipeline,
-refer to `this example <https://github.com/neo4j/neo4j-graphrag-python/blob/main/examples/pipeline/kg_builder.py>`_.
+refer to the `example folder <https://github.com/neo4j/neo4j-graphrag-python/blob/main/examples/>`_ in the project GitHub repository.
 
 **********************************
 Knowledge Graph Builder Components

diff --git a/docs/source/user_guide_rag.rst b/docs/source/user_guide_rag.rst
@@ -78,6 +78,7 @@ If OpenAI cannot be used directly, there are a few available alternatives:
 - Use Azure OpenAI (GPT...).
 - Use Google VertexAI (Gemini...).
 - Use Anthropic LLM (Claude...).
+- Use MistralAI LLM.
 - Use Cohere.
 - Use a local Ollama model.
 - Implement a custom interface.
@@ -164,6 +165,31 @@ To use Anthropic, instantiate the `AnthropicLLM` class:
 See :ref:`anthropicllm`.
 
 
+Using MistralAI LLM
+-------------------
+
+To use MistralAI, instantiate the `MistralAILLM` class:
+
+.. code:: python
+
+    from neo4j_graphrag.llm import MistralAILLM
+
+    llm = MistralAILLM(
+        model_name="mistral-small-latest",
+        api_key=api_key,  # can also set `MISTRAL_API_KEY` in env vars
+    )
+    llm.invoke("say something")
+
+
+.. note::
+
+    In order to run this code, the `mistralai` Python package needs to be installed:
+    `pip install mistralai`
+
+See :ref:`mistralaillm`.
+
+
+
 Using Cohere LLM
 ----------------
 

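The MistralAI snippet added to `user_guide_rag.rst` notes that the key can also come from the `MISTRAL_API_KEY` environment variable. The sketch below illustrates that fallback in plain Python. `resolve_mistral_api_key` is a hypothetical helper, not part of `neo4j_graphrag`, and it assumes an explicitly passed key takes precedence over the environment variable; the `MistralAILLM` reference is the place to confirm the actual lookup order.

```python
import os
from typing import Optional


def resolve_mistral_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: return the explicit key if given, else MISTRAL_API_KEY.

    Assumption: an explicit argument wins over the environment variable.
    """
    if api_key:
        return api_key
    env_key = os.environ.get("MISTRAL_API_KEY")
    if not env_key:
        # Fail early with a clear message instead of a later authentication error.
        raise ValueError("No Mistral API key: pass api_key=... or set MISTRAL_API_KEY")
    return env_key
```

The resolved value can then be passed as `MistralAILLM(model_name="mistral-small-latest", api_key=resolve_mistral_api_key())`, using only the two constructor arguments shown in the documentation snippet.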
diff --git a/examples/README.md b/examples/README.md
@@ -0,0 +1,132 @@
+# Examples Index
+
+This folder contains example usage for the different features
+supported by the `neo4j-graphrag` package:
+
+- [Build Knowledge Graph](#build-knowledge-graph) from PDF or text
+- [Retrieve](#retrieve) information from the graph
+- [Question Answering](#answer-graphrag) (Q&A)
+
+Each of these steps has many customization options, which
+are listed in the [Customize](#customize) section below.
+
+## Build Knowledge Graph
+
+- [End to end PDF to graph simple pipeline](build_graph/simple_kg_builder_from_pdf.py)
+- [End to end text to graph simple pipeline](build_graph/simple_kg_builder_from_text.py)
+
+
+## Retrieve
+
+- [Retriever from an embedding vector](retrieve/similarity_search_for_vector.py)
+- [Retriever from a text](retrieve/similarity_search_for_text.py)
+- [Graph-based retrieval with VectorCypherRetriever](retrieve/vector_cypher_retriever.py)
+- [Hybrid retriever](./retrieve/hybrid_retriever.py)
+- [Hybrid Cypher retriever](./retrieve/hybrid_cypher_retriever.py)
+- [Text2Cypher retriever](./retrieve/text2cypher_search.py)
+
+
+### External Retrievers
+
+#### Weaviate
+
+- [Vector search](customize/retrievers/external/weaviate/weaviate_vector_search.py)
+- [Text search with local embedder](customize/retrievers/external/weaviate/weaviate_text_search_local_embedder.py)
+- [Text search with remote embedder](customize/retrievers/external/weaviate/weaviate_text_search_remote_embedder.py)
+
+#### Pinecone
+
+- [Vector search](./customize/retrievers/external/pinecone/pinecone_vector_search.py)
+- [Text search](./customize/retrievers/external/pinecone/pinecone_text_search.py)
+
+
+#### Qdrant
+
+- [Vector search](./customize/retrievers/external/qdrant/qdrant_vector_search.py)
+- [Text search](./customize/retrievers/external/qdrant/qdrant_text_search.py)
+
+
+## Answer: GraphRAG
+
+- [End to end GraphRAG](./answer/graphrag.py)
+
+
+## Customize
+
+### Retriever
+
+- [Control result format for VectorRetriever](customize/retrievers/result_formatter_vector_retriever.py)
+- [Control result format for VectorCypherRetriever](customize/retrievers/result_formatter_vector_cypher_retriever.py)
+
+
+### LLMs
+
+- [OpenAI (GPT)](./customize/llms/openai_llm.py)
+- [Azure OpenAI]()
+- [VertexAI (Gemini)](./customize/llms/vertexai_llm.py)
+- [MistralAI](./customize/llms/mistalai_llm.py)
+- [Cohere](./customize/llms/cohere_llm.py)
+- [Anthropic (Claude)](./customize/llms/anthropic_llm.py)
+- [Ollama]()
+- [Custom LLM](./customize/llms/custom_llm.py)
+
+
+### Prompts
+
+- [Using a custom prompt](old/graphrag_custom_prompt.py)
+
+
+### Embedders
+
+- [OpenAI](./customize/embeddings/openai_embeddings.py)
+- [Azure OpenAI](./customize/embeddings/azure_openai_embeddings.py)
+- [VertexAI](./customize/embeddings/vertexai_embeddings.py)
+- [MistralAI](./customize/embeddings/mistalai_embeddings.py)
+- [Cohere](./customize/embeddings/cohere_embeddings.py)
+- [Ollama](./customize/embeddings/ollama_embeddings.py)
+- [Custom embedder](./customize/embeddings/custom_embeddings.py)
+
+
+### KG Construction - Pipeline
+
+- [End to end example with explicit components and text input](./customize/build_graph/pipeline/kg_builder_from_text.py)
+- [End to end example with explicit components and PDF input](./customize/build_graph/pipeline/kg_builder_from_pdf.py)
+
+#### Components
+
+- Loaders:
+  - [Load PDF file](./customize/build_graph/components/loaders/pdf_loader.py)
+  - [Custom](./customize/build_graph/components/loaders/custom_loader.py)
+- Text Splitter:
+  - [Fixed size splitter](./customize/build_graph/components/splitters/fixed_size_splitter.py)
+  - [Splitter from LangChain](./customize/build_graph/components/splitters/langhchain_splitter.py)
+  - [Splitter from LlamaIndex](./customize/build_graph/components/splitters/llamaindex_splitter.py)
+  - [Custom](./customize/build_graph/components/splitters/custom_splitter.py)
+- [Chunk embedder]()
+- Schema Builder:
+  - [User-defined](./customize/build_graph/components/schema_builders/schema.py)
+- Entity Relation Extractor:
+  - [LLM-based](./customize/build_graph/components/extractors/llm_entity_relation_extractor.py)
+  - [LLM-based with custom prompt](./customize/build_graph/components/extractors/llm_entity_relation_extractor_with_custom_prompt.py)
+  - [Custom](./customize/build_graph/components/extractors/custom_extractor.py)
+- Knowledge Graph Writer:
+  - [Neo4j writer](./customize/build_graph/components/writers/neo4j_writer.py)
+  - [Custom](./customize/build_graph/components/writers/custom_writer.py)
+- Entity Resolver:
+  - [SinglePropertyExactMatchResolver](./customize/build_graph/components/resolvers/simple_entity_resolver.py)
+  - [SinglePropertyExactMatchResolver with pre-filter](./customize/build_graph/components/resolvers/simple_entity_resolver_pre_filter.py)
+  - [Custom resolver](./customize/build_graph/components/resolvers/custom_resolver.py)
+- [Custom component](./customize/build_graph/components/custom_component.py)
+
+
+### Answer: GraphRAG
+
+- [LangChain compatibility](./customize/answer/langchain_compatiblity.py)
+- [Use a custom prompt](./customize/answer/custom_prompt.py)
+
+
+## Database Operations
+
+- [Create vector index](database_operations/create_vector_index.py)
+- [Create full text index](database_operations/create_fulltext_index.py)
+- [Populate vector index](database_operations/populate_vector_index.py)
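Some entries in the new `examples/README.md` have empty targets (for example `[Azure OpenAI]()`), and relative paths are easy to get wrong while files are being moved around. Below is a minimal link-check sketch. `find_broken_links` is a hypothetical helper: it only understands inline `[text](target)` links, and it deliberately skips external URLs and in-page `#` anchors.

```python
import re
from pathlib import Path
from typing import List

# Inline Markdown links: [text](target). The target may be empty.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def find_broken_links(markdown: str, base_dir: Path) -> List[str]:
    """Return relative link targets that are empty or do not exist under base_dir."""
    broken = []
    for _text, target in LINK_RE.findall(markdown):
        if target.startswith(("http://", "https://", "#")):
            continue  # external URLs and in-page anchors are not checked here
        if not target or not (base_dir / target).exists():
            broken.append(target)
    return broken
```

Running `find_broken_links(readme.read_text(), readme.parent)` with `readme = Path("examples/README.md")` would at least report the empty `()` targets; a step like this could run in CI alongside ruff and mypy.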
diff --git a/examples/build_graph/simple_kg_builder_from_pdf.py b/examples/build_graph/simple_kg_builder_from_pdf.py
@@ -0,0 +1,73 @@
+"""This example illustrates how to get started easily with the SimpleKGPipeline
+and ingest a PDF file into a Neo4j Knowledge Graph.
+
+This example assumes a Neo4j db is up and running. Update the credentials below
+if needed.
+
+OPENAI_API_KEY needs to be in the env vars.
+"""
+
+import asyncio
+from pathlib import Path
+
+import neo4j
+from neo4j_graphrag.embeddings import OpenAIEmbeddings
+from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
+from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
+from neo4j_graphrag.llm import LLMInterface
+from neo4j_graphrag.llm.openai_llm import OpenAILLM
+
+# Neo4j database connection settings
+URI = "neo4j://localhost:7687"
+AUTH = ("neo4j", "password")
+DATABASE = "neo4j"
+
+
+root_dir = Path(__file__).parents[1]  # parents[1] is the examples/ folder
+file_path = root_dir / "data" / "Harry Potter and the Chamber of Secrets Summary.pdf"
+
+
+# Define the entity labels, relation types and potential schema triples
+# (source entity, relation, target entity) the LLM will look for in the text.
+ENTITIES = ["Person", "Organization", "Location"]
+RELATIONS = ["SITUATED_AT", "INTERACTS", "LED_BY"]
+POTENTIAL_SCHEMA = [
+    ("Person", "SITUATED_AT", "Location"),
+    ("Person", "INTERACTS", "Person"),
+    ("Organization", "LED_BY", "Person"),
+]
+
+
+async def define_and_run_pipeline(
+    neo4j_driver: neo4j.Driver,
+    llm: LLMInterface,
+) -> PipelineResult:
+    # Create an instance of the SimpleKGPipeline
+    kg_builder = SimpleKGPipeline(
+        llm=llm,
+        driver=neo4j_driver,
+        embedder=OpenAIEmbeddings(),
+        entities=ENTITIES,
+        relations=RELATIONS,
+        potential_schema=POTENTIAL_SCHEMA,
+    )
+    return await kg_builder.run_async(file_path=str(file_path))
+
+
+async def main() -> PipelineResult:
+    llm = OpenAILLM(
+        model_name="gpt-4o",
+        model_params={
+            "max_tokens": 2000,
+            "response_format": {"type": "json_object"},
+        },
+    )
+    with neo4j.GraphDatabase.driver(URI, auth=AUTH, database=DATABASE) as driver:
+        res = await define_and_run_pipeline(driver, llm)
+    await llm.async_client.close()
+    return res
+
+
+if __name__ == "__main__":
+    res = asyncio.run(main())
+    print(res)
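The PDF example locates its input with `Path(__file__).parents[...]`, so the index decides which ancestor folder `data/` is resolved against. The sketch below lists the ancestors of the script's repository-relative path with `PurePosixPath`, indexed the same way as `Path.parents`. It assumes the PDF lives in a `data/` folder directly under `examples/`; `list_parents` is a hypothetical helper for illustration only.

```python
from pathlib import PurePosixPath
from typing import List


def list_parents(script: str) -> List[str]:
    """Return each ancestor of `script`, in the same order as Path.parents."""
    return [str(parent) for parent in PurePosixPath(script).parents]


if __name__ == "__main__":
    script = "examples/build_graph/simple_kg_builder_from_pdf.py"
    for depth, parent in enumerate(list_parents(script)):
        print(depth, parent)
    # 0 examples/build_graph
    # 1 examples
    # 2 .
```

Relative to the repository root there are only three ancestors, so an index of 2 already reaches the repository root. With an absolute `__file__`, larger indexes silently point at folders outside the checkout instead of raising an error.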
diff --git a/examples/build_graph/simple_kg_builder_from_text.py b/examples/build_graph/simple_kg_builder_from_text.py
@@ -0,0 +1,70 @@
+"""This example illustrates how to get started easily with the SimpleKGPipeline
+and ingest text into a Neo4j Knowledge Graph.
+
+This example assumes a Neo4j db is up and running. Update the credentials below
+if needed.
+
+OPENAI_API_KEY needs to be in the env vars.
+"""
+
+import asyncio
+
+import neo4j
+from neo4j_graphrag.embeddings import OpenAIEmbeddings
+from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
+from neo4j_graphrag.experimental.pipeline.pipeline import PipelineResult
+from neo4j_graphrag.llm import LLMInterface
+from neo4j_graphrag.llm.openai_llm import OpenAILLM
+
+# Neo4j database connection settings
+URI = "neo4j://localhost:7687"
+AUTH = ("neo4j", "password")
+DATABASE = "neo4j"
+
+# Text to process
+TEXT = """The son of Duke Leto Atreides and the Lady Jessica, Paul is the heir of House Atreides,
+an aristocratic family that rules the planet Caladan."""
+
+# Define the entity labels, relation types and potential schema triples
+# (source entity, relation, target entity) the LLM will look for in the text.
+ENTITIES = ["Person", "House", "Planet"]
+RELATIONS = ["PARENT_OF", "HEIR_OF", "RULES"]
+POTENTIAL_SCHEMA = [
+    ("Person", "PARENT_OF", "Person"),
+    ("Person", "HEIR_OF", "House"),
+    ("House", "RULES", "Planet"),
+]
+
+
+async def define_and_run_pipeline(
+    neo4j_driver: neo4j.Driver,
+    llm: LLMInterface,
+) -> PipelineResult:
+    # Create an instance of the SimpleKGPipeline
+    kg_builder = SimpleKGPipeline(
+        llm=llm,
+        driver=neo4j_driver,
+        embedder=OpenAIEmbeddings(),
+        entities=ENTITIES,
+        relations=RELATIONS,
+        potential_schema=POTENTIAL_SCHEMA,
+        from_pdf=False,
+    )
+    return await kg_builder.run_async(text=TEXT)
+
+
+async def main() -> PipelineResult:
+    llm = OpenAILLM(
+        model_name="gpt-4o",
+        model_params={
+            "max_tokens": 2000,
+            "response_format": {"type": "json_object"},
+        },
+    )
+    with neo4j.GraphDatabase.driver(URI, auth=AUTH, database=DATABASE) as driver:
+        res = await define_and_run_pipeline(driver, llm)
+    await llm.async_client.close()
+    return res
+
+
+if __name__ == "__main__":
+    res = asyncio.run(main())
+    print(res)
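Both SimpleKGPipeline examples pass `ENTITIES`, `RELATIONS` and `POTENTIAL_SCHEMA` as three separate lists, so a typo in a single triple is easy to miss. A minimal pre-flight check is sketched below. `invalid_triples` is a hypothetical helper, not part of `neo4j_graphrag`, and whether the pipeline performs a similar validation itself is not covered by this commit.

```python
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def invalid_triples(
    entities: Sequence[str],
    relations: Sequence[str],
    potential_schema: Sequence[Triple],
) -> List[Triple]:
    """Return (source, relation, target) triples that use an undeclared label or type."""
    known_entities = set(entities)
    known_relations = set(relations)
    return [
        (src, rel, tgt)
        for src, rel, tgt in potential_schema
        if src not in known_entities or tgt not in known_entities or rel not in known_relations
    ]


# Values copied from simple_kg_builder_from_text.py.
ENTITIES = ["Person", "House", "Planet"]
RELATIONS = ["PARENT_OF", "HEIR_OF", "RULES"]
POTENTIAL_SCHEMA = [
    ("Person", "PARENT_OF", "Person"),
    ("Person", "HEIR_OF", "House"),
    ("House", "RULES", "Planet"),
]

if __name__ == "__main__":
    print(invalid_triples(ENTITIES, RELATIONS, POTENTIAL_SCHEMA))  # []
```

Using sets keeps each membership check constant-time, and returning the offending triples (rather than a bare boolean) points directly at the entry to fix.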
diff --git a/examples/graphrag_custom_prompt.py b/examples/customize/answer/custom_prompt.py
@@ -8,31 +8,18 @@
 - Logging configuration
 """
 
-import logging
-
 import neo4j
 from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
 from neo4j_graphrag.generation import GraphRAG, RagTemplate
 from neo4j_graphrag.llm import OpenAILLM
 from neo4j_graphrag.retrievers import VectorCypherRetriever
-from neo4j_graphrag.types import RetrieverResultItem
 
 URI = "neo4j://localhost:7687"
 AUTH = ("neo4j", "password")
 DATABASE = "neo4j"
 INDEX = "moviePlotsEmbedding"
 
 
-# setup logger config
-logger = logging.getLogger("neo4j_graphrag")
-logging.basicConfig(format="%(asctime)s - %(message)s")
-logger.setLevel(logging.DEBUG)
-
-
-def formatter(record: neo4j.Record) -> RetrieverResultItem:
-    return RetrieverResultItem(content=f'{record.get("title")}: {record.get("plot")}')
-
-
 driver = neo4j.GraphDatabase.driver(
     URI,
     auth=AUTH,
@@ -44,8 +31,7 @@ def formatter(record: neo4j.Record) -> RetrieverResultItem:
 retriever = VectorCypherRetriever(
     driver,
     index_name=INDEX,
-    retrieval_query="with node, score return node.title as title, node.plot as plot",
-    result_formatter=formatter,
+    retrieval_query="WITH node, score RETURN node.title as title, node.plot as plot",
     embedder=embedder,
 )
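The custom prompt example no longer passes `result_formatter` to `VectorCypherRetriever`, so the retriever's default formatting applies; the README lists dedicated result-formatter examples under `customize/retrievers/`. For reference, the string layout produced by the removed `formatter` is sketched below in plain Python. It uses a dict in place of `neo4j.Record` (both support `.get`) and a hypothetical `FormattedItem` dataclass in place of `RetrieverResultItem`; the sample row is made up.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class FormattedItem:
    """Hypothetical stand-in for RetrieverResultItem: only `content` is modelled."""

    content: str


def title_plot_formatter(record: Mapping[str, Any]) -> FormattedItem:
    # Same layout as the removed `formatter`: "<title>: <plot>".
    return FormattedItem(content=f"{record.get('title')}: {record.get('plot')}")


if __name__ == "__main__":
    row = {"title": "Example Movie", "plot": "An example plot."}  # made-up sample row
    print(title_plot_formatter(row).content)
    # Example Movie: An example plot.
```

Keeping the formatter a pure function of one record makes it easy to test without a database, which is why it can be exercised here with a plain dict.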