Feature/kg builder (neo4j#91)
* Pipeline  (neo4j#81)

* First draft of pipeline/component architecture. Example using the RAG pipeline.

* More complex implementation of pipeline to deal with branching and aggregations - no async yet

* Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default

* Test RAG with new Pipeline implementation

* File refactoring

* Pipeline orchestration with async support

* Import sorting

* Pipeline rerun + exception on cyclic graph (for now)

* Mypy

* Python version compat

* Rename process->run for Components for consistency with Pipeline

* Move components test in the example folder - add some tests

* Race condition fix - documentation - ruff

* Fix import sorting

* mypy on tests

* Mark test as async

* Tests were not testing...

* Ability to create Pipeline templates

* Ruff

* Future + header

* Renaming + update import structure to make it more compatible with rest of the repo

* Check input parameters before starting the pipeline

* Introduce output model for component - Validate pipeline before running - More unit tests

* Import..

* Finally installed pre-commit hooks...

* Finally installed pre-commit hooks...

* Finally installed pre-commit hooks... and struggling with pydantic..

* Mypy on examples

* Add missing header

* Update doc

* Fix import in doc

* Update changelog

* Update docs/source/user_guide_pipeline.rst

Co-authored-by: willtai <[email protected]>

* Refactor tests folder to match src structure

* Move exceptions to separate file and rename them to make it clearer they are related to pipeline

* Mypy

* Rename def => config

* Introduce generic type to remove most of the "type: ignore" comments

* Remove unnecessary comment

* Ruff

* Document and test is_cyclic method

* Remove find_all method from store (simplify data retrieval)

* value is not a list anymore (or, if it is, it's on purpose)

* Remove comments, fix example in doc

* Remove core directory - move files to /pipeline

* Expose stores from pipeline subpackage

* Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input

* Component subclasses can return DataModel

* Add note on async + schema to illustrate parameter propagation

---------

Co-authored-by: willtai <[email protected]>

* Adds a Text Splitter (neo4j#82)

* Added text splitter adapter class

* Added copyright header to new files

* Added __future__ import to text_splitters.py for backwards compatibility of type hints

* Moved text splitter file and tests

* Split text splitter adapter into 2 adapters

* Added optional metadata to text chunks

* Fixed typos

* Moved text splitters inside of the components folder

* Fixed Component import

* Added a TextChunkEmbedder (neo4j#87)

* Added a TextChunkEmbedder

* Added the copyright header to test_embedder.py

* Updated test_text_chunk_embedder_run

* Adds a knowledge graph writer (neo4j#83)

* Added copyright header to new files

* Added copyright header to kg_writer.py

* Added __future__ import to kg_writer.py for backwards compatibility of type hints

* Added E2E test for Neo4jWriter

* Added a copyright header to test_kg_builder_e2e.py

* Added upsert_vector test for relationship embeddings

* Moved KG writer and its tests

* Moved Neo4jGraph and associated objects to a new file

* Renamed KG builder fixture

* Added unit tests for KG writer

* Split upsert_vector into 2 functions

* Fixed broken cypher query strings

* Removed embedding creation from Neo4jWriter

* Fixed setup_neo4j_for_kg_construction fixture

* Added KGWriterModel class

* Fixed minor mistake in test_weaviate_e2e.py

* Renamed kg_construction folder to components

* Updated unit tests with new folder structure

* Fixed broken import

* Fixed copyright headers

* Added missing docstrings

* Fixed typo

* Add documentation for pipeline exceptions (neo4j#90)

* Fixes and refactors the KG writer component (neo4j#92)

* Fixes and refactors the KG writer component

* Fixed mypy error

* Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY

* Add schema for kg builder (neo4j#88)

* Add schema for kg builder and tests

* Fixed mypy checks

* Reverted kg builder example with schema

* Revert to List and Dict due to Python3.8 issue with using get_type_hints

* Added properties to Entity and Relation

* Add test for missing properties

* Fix type annotations in test

* Add property types

* Refactored entity, relation, and property types

* Unused import

* Moved schema to components/ (neo4j#96)

* Add entity / Relation extraction component (neo4j#85)

* Entity / Relation extraction component

* Add tests

* Keep it simple: remove deps to jinja for now

* Update example with existing components

* log config in example

* Fix tests

* Rm unused import

* Add copyright headers

* Rm debug code

* Try and fix tests

* Unused import

* get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions

* Return model is also conditioned to the existence of the run method
=> should raise an error if run is not implemented?

* Log when we do not raise exception to keep track of the failure

* Update prompt to match new KGwriter expected type

* Fix test

* Fix type for `examples`

* Use SchemaConfig as input for the ER Extractor component

* The "base" EntityRelationExtractor is an ABC that must be subclassed

* Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp

* Option to build lexical graph in the ERExtractor component

* Fix one test

* Fix some more tests

* Fix some more tests

* Remove "type: ignore" comments

---------

Co-authored-by: willtai <[email protected]>
Co-authored-by: Alex Thomas <[email protected]>
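
The entry above about making node IDs unique across pipeline runs by prefixing them with a timestamp can be illustrated with a small sketch. This is not the library's implementation: the helper names `make_run_prefix` and `prefixed_node_id`, the microsecond resolution, and the `:` separator are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def make_run_prefix(now: Optional[datetime] = None) -> str:
    """Hypothetical helper: derive a per-run prefix from a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    # Microsecond resolution is an assumption; it only needs to differ between runs.
    return str(int(now.timestamp() * 1_000_000))


def prefixed_node_id(run_prefix: str, local_id: str) -> str:
    """Combine the run prefix with an ID that is only unique within one run."""
    return f"{run_prefix}:{local_id}"
```

With this scheme, an extractor that numbers nodes `0`, `1`, `2` on every run still yields distinct IDs, as long as two runs do not start within the same timestamp tick.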

* Update lock file after merge

* Remove pipeline/components folder (again)

* Updated component docs (neo4j#99)

* Updated component docs

* Removed weaviate test update

* Updated pipeline user guide with link to components in the API section

* Feature/kg builder e2e tests (neo4j#98)

* End to end tests for KG builder pipeline

* Adding chunk embedder to the pipeline and e2e tests

* Fix how the chunk embedding is saved

* Fix e2e tests

* Fix mypy

* mypy stuff :'(

* WIP: update e2e tests

* Check counts also here

* Enable e2e tests on this PR only

* Fix e2e tests (was not mocking the correct method for Embedder)

* Revert CI to normal

* Updated CHANGELOG and set max-parallel: 1 for E2E tests in pr-e2e-tests.yaml

---------

Co-authored-by: willtai <[email protected]>
Co-authored-by: Alex Thomas <[email protected]>
Co-authored-by: willtai <[email protected]>
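
The pipeline behaviour described in the commit message (branching and aggregation, async orchestration, an exception on cyclic graphs, and returning only the leaf components' results) can be sketched in a few lines of plain `asyncio`. This is a toy model under stated assumptions, not the `neo4j_genai` API: `MiniPipeline`, `CyclicPipelineError`, and the demo components are hypothetical names, and components are modelled as async functions that return dicts.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Set

# A component here is any async callable returning a dict of named outputs.
ComponentFn = Callable[..., Awaitable[Dict[str, Any]]]


class CyclicPipelineError(Exception):
    """Hypothetical error type; the library's own exception names may differ."""


class MiniPipeline:
    """Toy DAG runner for illustration, not the neo4j_genai Pipeline class."""

    def __init__(self) -> None:
        self.components: Dict[str, ComponentFn] = {}
        self.parents: Dict[str, Set[str]] = {}
        self.children: Dict[str, Set[str]] = {}

    def add_component(self, name: str, fn: ComponentFn) -> None:
        self.components[name] = fn
        self.parents[name] = set()
        self.children[name] = set()

    def connect(self, start: str, end: str) -> None:
        self.children[start].add(end)
        self.parents[end].add(start)

    def is_cyclic(self) -> bool:
        # Kahn's algorithm: repeatedly remove nodes with no pending parents.
        # Any node left over belongs to (or depends on) a cycle.
        pending = {name: len(p) for name, p in self.parents.items()}
        ready = [name for name, count in pending.items() if count == 0]
        visited = 0
        while ready:
            node = ready.pop()
            visited += 1
            for child in self.children[node]:
                pending[child] -= 1
                if pending[child] == 0:
                    ready.append(child)
        return visited != len(self.components)

    async def run(self, inputs: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
        if self.is_cyclic():
            raise CyclicPipelineError("component graph contains a cycle")
        tasks: Dict[str, Any] = {}

        async def run_one(name: str) -> Dict[str, Any]:
            # Start from user-supplied inputs, then merge every parent's outputs.
            kwargs = dict(inputs.get(name, {}))
            for parent in sorted(self.parents[name]):
                kwargs.update(await tasks[parent])
            return await self.components[name](**kwargs)

        # All tasks are created before any of them starts running, so each
        # component can safely await its parents' tasks.
        for name in self.components:
            tasks[name] = asyncio.create_task(run_one(name))
        await asyncio.gather(*tasks.values())
        # Only leaf components (those without children) are returned.
        return {n: t.result() for n, t in tasks.items() if not self.children[n]}


async def split(text: str) -> Dict[str, Any]:
    return {"chunks": text.split()}


async def count(chunks: list) -> Dict[str, Any]:
    return {"n_chunks": len(chunks)}


async def longest(chunks: list) -> Dict[str, Any]:
    return {"longest": max(chunks, key=len)}


async def report(n_chunks: int, longest: str) -> Dict[str, Any]:
    return {"summary": f"{n_chunks} chunks, longest={longest}"}


def build_demo_pipeline() -> MiniPipeline:
    pipe = MiniPipeline()
    for name, fn in [("split", split), ("count", count),
                     ("longest", longest), ("report", report)]:
        pipe.add_component(name, fn)
    pipe.connect("split", "count")     # branching: one output feeds two components
    pipe.connect("split", "longest")
    pipe.connect("count", "report")    # aggregation: two outputs feed one component
    pipe.connect("longest", "report")
    return pipe
```

Calling `asyncio.run(build_demo_pipeline().run({"split": {"text": "a bb ccc"}}))` returns only the result of the `report` leaf. Intermediate results are discarded in this sketch, whereas the commit message mentions a configurable store for them.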
4 people authored Aug 13, 2024
1 parent 242c77c commit cc48eef
Showing 57 changed files with 6,498 additions and 1,204 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/pr-e2e-tests.yaml
@@ -13,6 +13,8 @@ concurrency:
jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1
    strategy:
      matrix:
        python-version: ['3.8', '3.12']
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,9 @@

### Added
- Add optional custom_prompt arg to the Text2CypherRetriever class.
- Introduced support for a flexible Component/Pipeline architecture.
- Added new components for knowledge graph construction, including text splitters, schema builders, entity-relation extractors, and Neo4j writers.
- Implemented end-to-end tests for the new knowledge graph builder pipeline.

### Changed
- `GraphRAG.search` method first parameter has been renamed `query_text` (was `query`) for consistency with the retrievers interface.
111 changes: 100 additions & 11 deletions docs/source/api.rst
@@ -3,14 +3,74 @@
API Documentation
#################

.. _components-section:

**********
Components
**********

KGWriter
========

.. autoclass:: neo4j_genai.components.kg_writer.KGWriter
    :members: run

Neo4jWriter
===========

.. autoclass:: neo4j_genai.components.kg_writer.Neo4jWriter
    :members: run

TextSplitter
============

.. autoclass:: neo4j_genai.components.text_splitters.base.TextSplitter
    :members: run

LangChainTextSplitterAdapter
============================

.. autoclass:: neo4j_genai.components.text_splitters.langchain.LangChainTextSplitterAdapter
    :members: run

LlamaIndexTextSplitterAdapter
=============================

.. autoclass:: neo4j_genai.components.text_splitters.llamaindex.LlamaIndexTextSplitterAdapter
    :members: run

TextChunkEmbedder
=================

.. autoclass:: neo4j_genai.components.embedder.TextChunkEmbedder
    :members: run

SchemaBuilder
=============

.. autoclass:: neo4j_genai.components.schema.SchemaBuilder
    :members: run

EntityRelationExtractor
=======================

.. autoclass:: neo4j_genai.components.entity_relation_extractor.EntityRelationExtractor
    :members: run

LLMEntityRelationExtractor
==========================

.. autoclass:: neo4j_genai.components.entity_relation_extractor.LLMEntityRelationExtractor
    :members: run

.. _retrievers-section:

**********
Retrievers
**********

RetrieverInterface
===================
==================

.. autoclass:: neo4j_genai.retrievers.base.Retriever
    :members:
@@ -70,39 +130,39 @@ PineconeNeo4jRetriever
    :members: search


**********
********
Embedder
**********
********

.. autoclass:: neo4j_genai.embedder.Embedder
    :members:

SentenceTransformerEmbeddings
================================

.. autoclass:: neo4j_genai.embeddings.SentenceTransformerEmbeddings
.. autoclass:: neo4j_genai.embeddings.sentence_transformers.SentenceTransformerEmbeddings
    :members:

**********
Generation
**********

LLMInterface
======================
============

.. autoclass:: neo4j_genai.llm.LLMInterface
    :members:


OpenAILLM
======================
=========

.. autoclass:: neo4j_genai.llm.OpenAILLM
    :members:


PromptTemplate
======================
==============

.. autoclass:: neo4j_genai.generation.prompts.PromptTemplate
    :members:
@@ -125,6 +185,8 @@ Database Interaction

.. autofunction:: neo4j_genai.indexes.upsert_vector

.. autofunction:: neo4j_genai.indexes.upsert_vector_on_relationship


******
Errors
@@ -157,6 +219,12 @@ Errors

* :class:`neo4j_genai.exceptions.LLMGenerationError`

* :class:`neo4j_genai.pipeline.exceptions.PipelineDefinitionError`

* :class:`neo4j_genai.pipeline.exceptions.PipelineMissingDependencyError`

* :class:`neo4j_genai.pipeline.exceptions.PipelineStatusUpdateError`


Neo4jGenAiError
===============
@@ -222,7 +290,7 @@ Neo4jVersionError


Text2CypherRetrievalError
==========================
=========================

.. autoclass:: neo4j_genai.exceptions.Text2CypherRetrievalError
    :show-inheritance:
@@ -236,21 +304,42 @@ SchemaFetchError


RagInitializationError
==========================
======================

.. autoclass:: neo4j_genai.exceptions.RagInitializationError
    :show-inheritance:


PromptMissingInputError
==========================
=======================

.. autoclass:: neo4j_genai.exceptions.PromptMissingInputError
    :show-inheritance:


LLMGenerationError
==========================
==================

.. autoclass:: neo4j_genai.exceptions.LLMGenerationError
    :show-inheritance:


PipelineDefinitionError
=======================

.. autoclass:: neo4j_genai.pipeline.exceptions.PipelineDefinitionError
    :show-inheritance:


PipelineMissingDependencyError
==============================

.. autoclass:: neo4j_genai.pipeline.exceptions.PipelineMissingDependencyError
    :show-inheritance:


PipelineStatusUpdateError
=========================

.. autoclass:: neo4j_genai.pipeline.exceptions.PipelineStatusUpdateError
    :show-inheritance:
6 changes: 4 additions & 2 deletions docs/source/index.rst
@@ -30,7 +30,8 @@ Python versions supported:
Topics
******

+ :ref:`user-guide`
+ :ref:`user-guide-rag`
+ :ref:`user-guide-pipeline`
+ :ref:`api-documentation`
+ :ref:`types-documentation`

@@ -39,7 +40,8 @@ Topics
    :caption: Contents:
    :hidden:

    user_guide.rst
    user_guide_rag.rst
    user_guide_pipeline.rst
    api.rst
    types.rst

60 changes: 55 additions & 5 deletions docs/source/types.rst
@@ -5,30 +5,80 @@ Types
*****

RawSearchResult
==================
===============

.. autoclass:: neo4j_genai.types.RawSearchResult


RetrieverResult
==================
===============

.. autoclass:: neo4j_genai.types.RetrieverResult


RetrieverResultItem
====================
===================

.. autoclass:: neo4j_genai.types.RetrieverResultItem


LLMResponse
====================
===========

.. autoclass:: neo4j_genai.llm.types.LLMResponse


RagResultModel
====================
==============

.. autoclass:: neo4j_genai.generation.types.RagResultModel

TextChunk
=========

.. autoclass:: neo4j_genai.components.types.TextChunk

TextChunks
==========

.. autoclass:: neo4j_genai.components.types.TextChunks

Neo4jNode
=========

.. autoclass:: neo4j_genai.components.types.Neo4jNode

Neo4jRelationship
=================

.. autoclass:: neo4j_genai.components.types.Neo4jRelationship

Neo4jGraph
==========

.. autoclass:: neo4j_genai.components.types.Neo4jGraph

KGWriterModel
=============

.. autoclass:: neo4j_genai.components.kg_writer.KGWriterModel

SchemaProperty
==============

.. autoclass:: neo4j_genai.components.schema.SchemaProperty

SchemaEntity
============

.. autoclass:: neo4j_genai.components.schema.SchemaEntity

SchemaRelation
==============

.. autoclass:: neo4j_genai.components.schema.SchemaRelation

SchemaConfig
============

.. autoclass:: neo4j_genai.components.schema.SchemaConfig