stellasia committed Oct 3, 2024
2 parents 84bca81 + 7d97932 commit 61b66d0
Showing 37 changed files with 2,058 additions and 530 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pr-e2e-tests.yaml
Original file line number Diff line number Diff line change
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.8', '3.12']
python-version: ['3.9', '3.12']
neo4j-version:
- 5
neo4j-edition:
2 changes: 1 addition & 1 deletion .github/workflows/pr.yaml
@@ -6,7 +6,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ '3.8', '3.9', '3.10', '3.11', '3.12' ]
python-version: [ '3.9', '3.10', '3.11', '3.12' ]
steps:
- name: Install graphviz package
run: sudo apt install graphviz graphviz-dev
46 changes: 46 additions & 0 deletions .github/workflows/pre-release.yaml
@@ -0,0 +1,46 @@
name: Publish a new Alpha release (x.y.0a0 -> x.y.0a1) 🚀

on:
  workflow_dispatch:

jobs:
  bump-version:
    outputs:
      version: ${{ steps.get-version.outputs.version }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          token: ${{ secrets.GIT_PUSH_PAT }}
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install and configure Poetry
        uses: snok/install-poetry@v1
        with:
          version: 1.8.2
          virtualenvs-create: false
          virtualenvs-in-project: false
          installer-parallel: true

      - name: Bump version
        run: poetry version prerelease

      - name: Get version
        id: get-version
        run: echo version=`poetry version -s` >> "$GITHUB_OUTPUT"
      - name: Print version
        run: |
          echo Version: ${{ steps.get-version.outputs.version }}
      - name: Update CHANGELOG.md (cross platform supported)
        run: |
          sed -i.bak -e 's/## Next/## Next\n\n## ${{ steps.get-version.outputs.version }}/' CHANGELOG.md && rm CHANGELOG.md.bak
      - uses: EndBug/add-and-commit@v9
        with:
          author_name: 'Neo4j-GraphRAG GitHub Action'
          author_email: '[email protected]'
          message: 'Bump version to ${{ steps.get-version.outputs.version }}'
          add: "['pyproject.toml', 'CHANGELOG.md']"
          tag: '${{ steps.get-version.outputs.version }}'
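Reviewer note: the "Update CHANGELOG.md" step in this workflow inserts a heading for the new version directly under `## Next`, so entries that were listed under `## Next` end up under the release heading. The Python sketch below illustrates that transformation only; it is not part of the commit. The function name `add_release_heading` and the sample changelog are hypothetical, and it matches a whole line where the `sed` expression matches the substring `## Next` anywhere in a line.

```python
def add_release_heading(changelog: str, version: str) -> str:
    """Insert a '## <version>' heading right below the '## Next' heading."""
    out = []
    for line in changelog.split("\n"):
        out.append(line)
        if line.strip() == "## Next":
            # Blank line, then the new release heading (mirrors '\n\n## <version>').
            out.extend(["", f"## {version}"])
    return "\n".join(out)


# Hypothetical changelog content, for illustration only.
changelog = "# Changelog\n\n## Next\n\n- Added X\n\n## 0.7.0\n"
updated = add_release_heading(changelog, "0.8.0a0")
print(updated)
```

After the call, the `- Added X` entry sits under `## 0.8.0a0`, and `## Next` is left empty for future entries.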
46 changes: 46 additions & 0 deletions .github/workflows/premajor-release.yaml
@@ -0,0 +1,46 @@
name: Publish a new Alpha Major release (0.4.0 -> 1.0.0a0) 🚀

on:
  workflow_dispatch:

jobs:
  bump-version:
    outputs:
      version: ${{ steps.get-version.outputs.version }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          token: ${{ secrets.GIT_PUSH_PAT }}
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install and configure Poetry
        uses: snok/install-poetry@v1
        with:
          version: 1.8.2
          virtualenvs-create: false
          virtualenvs-in-project: false
          installer-parallel: true

      - name: Bump version
        run: poetry version premajor

      - name: Get version
        id: get-version
        run: echo version=`poetry version -s` >> "$GITHUB_OUTPUT"
      - name: Print version
        run: |
          echo Version: ${{ steps.get-version.outputs.version }}
      - name: Update CHANGELOG.md (cross platform supported)
        run: |
          sed -i.bak -e 's/## Next/## Next\n\n## ${{ steps.get-version.outputs.version }}/' CHANGELOG.md && rm CHANGELOG.md.bak
      - uses: EndBug/add-and-commit@v9
        with:
          author_name: 'Neo4j-GraphRAG GitHub Action'
          author_email: '[email protected]'
          message: 'Bump version to ${{ steps.get-version.outputs.version }}'
          add: "['pyproject.toml', 'CHANGELOG.md']"
          tag: '${{ steps.get-version.outputs.version }}'
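Reviewer note: the two release workflows differ only in the bump rule passed to `poetry version` (`prerelease` versus `premajor`). The sketch below restates the bumps named in the workflow titles (x.y.0a0 -> x.y.0a1 and 0.4.0 -> 1.0.0a0) as a small Python function. It is an illustration, not Poetry's implementation: the helper `bump` is hypothetical, it only understands `X.Y.Z` and `X.Y.ZaN` version strings, and it deliberately refuses `premajor` on an alpha version because that case is not described in the commit.

```python
import re

# Matches 'X.Y.Z' optionally followed by an alpha suffix 'aN'.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:a(\d+))?$")


def bump(version: str, rule: str) -> str:
    """Apply a simplified 'prerelease' or 'premajor' bump to a version string."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version format: {version!r}")
    major, minor, patch = (int(part) for part in match.groups()[:3])
    alpha = match.group(4)

    if rule == "prerelease":
        if alpha is not None:
            # Already an alpha: increment the alpha counter only.
            return f"{major}.{minor}.{patch}a{int(alpha) + 1}"
        # Stable version: start the alpha series of the next patch release.
        return f"{major}.{minor}.{patch + 1}a0"
    if rule == "premajor":
        if alpha is not None:
            raise ValueError("this sketch only handles premajor on stable versions")
        return f"{major + 1}.0.0a0"
    raise ValueError(f"unsupported rule: {rule!r}")


print(bump("0.4.0", "premajor"))
print(bump("0.8.0a0", "prerelease"))
```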
2 changes: 1 addition & 1 deletion .github/workflows/publish.yaml
@@ -12,7 +12,7 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.8"
python-version: "3.9"
- name: Install pypa/build
run: >-
python3 -m
2 changes: 1 addition & 1 deletion .github/workflows/scheduled-e2e-tests.yaml
@@ -12,7 +12,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ['3.8', '3.9', '3.10', '3.11', '3.12']
python-version: ['3.9', '3.10', '3.11', '3.12']
neo4j-version:
- 5
neo4j-edition:
14 changes: 10 additions & 4 deletions CHANGELOG.md
@@ -2,13 +2,14 @@

## Next

- Added `SinglePropertyExactMatchResolver` component, which merges entities that share the exact same value for a given property (e.g. name).

## 0.7.0

### Added
- Added AzureOpenAILLM and AzureOpenAIEmbeddings to support Azure served OpenAI models
- Added `template` validation in `PromptTemplate` class upon construction.
- `custom_prompt` arg is now converted to `Text2CypherTemplate` class within the `Text2CypherRetriever.get_search_results` method.
- `Text2CypherTemplate` and `RAGTemplate` prompt templates now require `query_text` arg and will error if it is not present. Previous `query_text` aliases may be used, but will warn of deprecation.
- Examples demonstrating the use of Mistral embeddings and LLM in RAG pipelines.
- Fixed bug in `Text2CypherRetriever` using `custom_prompt` arg where the `search` method would not inject the `query_text` content.
- Added feature to include kwargs in `Text2CypherRetriever.search()` that will be injected into a custom prompt, if provided.
- Added validation to `custom_prompt` parameter of `Text2CypherRetriever` to ensure that `query_text` placeholder exists in prompt.
- Introduced a fixed size text splitter component for splitting text into specified fixed size chunks with overlap. Updated examples and tests to utilize this new component.
@@ -20,14 +21,19 @@

### Fixed
- Resolved import issue with the Vertex AI Embeddings class.
- Fixed bug in `Text2CypherRetriever` using `custom_prompt` arg where the `search` method would not inject the `query_text` content.
- `custom_prompt` arg is now converted to `Text2CypherTemplate` class within the `Text2CypherRetriever.get_search_results` method.
- `Text2CypherTemplate` and `RAGTemplate` prompt templates now require `query_text` arg and will error if it is not present. Previous `query_text` aliases may be used, but will warn of deprecation.
- Resolved issue where Neo4jWriter component would raise an error if the start or end node ID was not defined properly in the input.
- Resolved issue where relationship types were not escaped in the insert Cypher query.
- Improved query performance in Neo4jWriter.
- Improved query performance in Neo4jWriter: created nodes now have a generic `__KGBuilder__` label and an index is created on the `__KGBuilder__.id` property. Moreover, insertion queries are now batched. Batch size can be controlled using the `batch_size` parameter in the `Neo4jWriter` component.

### Changed
- Moved the Embedder class to the neo4j_graphrag.embeddings directory for better organization alongside other custom embedders.
- Removed query argument from the GraphRAG class' `.search` method; users must now use `query_text`.
- Neo4jWriter component now runs a single query to merge node and set its embeddings if any.
- Nodes created by the `Neo4jWriter` now have an extra `__KGBuilder__` label. Nodes from the entity graph also have an `__Entity__` label.
- Dropped support for Python 3.8 (end of life).

## 0.6.3
### Changed
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -54,7 +54,7 @@ Remember that many community members have become regular contributors and some a
## Specifically for this project
Setting up the development environment:

1. Install Python 3.8.1+
1. Install Python 3.9+
2. Install poetry (see https://python-poetry.org/docs/#installation)
3. Install dependencies:

44 changes: 42 additions & 2 deletions README.md
@@ -14,13 +14,12 @@ Python versions supported:
* Python 3.11 supported.
* Python 3.10 supported.
* Python 3.9 supported.
* Python 3.8 supported.

# Usage

## Installation

This package requires Python (>=3.8.1).
This package requires Python (>=3.9).

To install the latest stable version, use:

@@ -37,6 +36,47 @@ Follow installation instructions [here](https://pygraphviz.github.io/documentati

## Examples

### Knowledge graph construction

```python
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline
from neo4j_graphrag.llm.openai_llm import OpenAILLM

# Instantiate Entity and Relation objects
entities = ["PERSON", "ORGANIZATION", "LOCATION"]
relations = ["SITUATED_AT", "INTERACTS", "LED_BY"]
potential_schema = [
    ("PERSON", "SITUATED_AT", "LOCATION"),
    ("PERSON", "INTERACTS", "PERSON"),
    ("ORGANIZATION", "LED_BY", "PERSON"),
]

# Instantiate the LLM
llm = OpenAILLM(
    model_name="gpt-4o",
    model_params={
        "max_tokens": 2000,
        "response_format": {"type": "json_object"},
    },
)

# Create an instance of the SimpleKGPipeline
kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    file_path=file_path,
    entities=entities,
    relations=relations,
)

await kg_builder.run_async(text="""
Albert Einstein was a German physicist born in 1879 who wrote many groundbreaking
papers especially about general relativity and quantum mechanics.
""")
```
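
Reviewer note: the README snippet above calls `await` at module level and uses `driver` and `file_path` without defining them, so it only runs as-is inside an async context (for example a notebook) where those names already exist. A minimal sketch of wrapping the call in `asyncio.run` follows. `FakePipeline` is a hypothetical stand-in for `SimpleKGPipeline` so the example runs without a Neo4j database or an LLM key, and its return value is invented for illustration.

```python
import asyncio


class FakePipeline:
    """Hypothetical stand-in for SimpleKGPipeline so the sketch runs offline."""

    async def run_async(self, text: str) -> dict:
        await asyncio.sleep(0)  # placeholder for the LLM and database calls
        return {"characters_processed": len(text)}


async def main() -> dict:
    kg_builder = FakePipeline()
    # In a script, the awaited call lives inside a coroutine such as this one.
    return await kg_builder.run_async(text="Albert Einstein was a German physicist.")


# asyncio.run starts an event loop, runs main() to completion, then closes the loop.
result = asyncio.run(main())
print(result)
```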



### Creating a vector index

When creating a vector index, make sure you match the number of dimensions in the index with the number of dimensions the embeddings have.
8 changes: 8 additions & 0 deletions docs/source/api.rst
@@ -71,6 +71,14 @@ LLMEntityRelationExtractor
:members: run


SinglePropertyExactMatchResolver
================================

.. autoclass:: neo4j_graphrag.experimental.components.resolver.SinglePropertyExactMatchResolver
    :members: run



.. _pipeline-section:

********
5 changes: 2 additions & 3 deletions docs/source/index.rst
@@ -27,7 +27,6 @@ Python versions supported:
* Python 3.11
* Python 3.10
* Python 3.9
* Python 3.8


******
@@ -60,7 +59,7 @@ Usage
Installation
************

This package requires Python (>=3.8.1).
This package requires Python (>=3.9).

To install the latest stable version, use:

@@ -302,4 +301,4 @@ Indices and tables

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
* :ref:`search`
54 changes: 50 additions & 4 deletions docs/source/user_guide_kg_builder.rst
@@ -11,8 +11,6 @@ unstructured data.

This feature is still experimental. API changes and bug fixes are expected.

It is not recommended to use it in production yet.


******************
Pipeline Structure
@@ -26,6 +24,7 @@ A Knowledge Graph (KG) construction pipeline requires a few components:
- **Schema builder**: provide a schema to ground the LLM extracted entities and relations and obtain an easily navigable KG.
- **Entity and relation extractor**: extract relevant entities and relations from the text.
- **Knowledge Graph writer**: save the identified entities and relations.
- **Entity resolver**: merge similar entities into a single node.

.. image:: images/kg_builder_pipeline.png
    :alt: KG Builder pipeline
@@ -389,7 +388,13 @@ to a Neo4j database:
graph = Neo4jGraph(nodes=[], relationships=[])
await writer.run(graph)
See :ref:`neo4jgraph` for the description of the input type.
To improve insert performance, two parameters can be tuned:

- `batch_size`: the number of nodes/relationships to be processed in each batch (default is 1000).
- `max_concurrency`: the max number of concurrent queries (default is 5).

See :ref:`neo4jgraph`.
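
Reviewer note: the interaction between `batch_size` and `max_concurrency` described above can be illustrated with a small, self-contained sketch. This is not the `Neo4jWriter` implementation: `batched` and `write_all` are hypothetical helpers, and `asyncio.sleep(0)` stands in for one database query. The rows are split into chunks of `batch_size`, and a semaphore caps how many chunks are written at the same time.

```python
import asyncio
from typing import Any, Iterator, List, Sequence


def batched(rows: Sequence[Any], batch_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


async def write_all(rows: List[Any], batch_size: int = 1000, max_concurrency: int = 5) -> int:
    """Write every batch, allowing at most max_concurrency writes in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    written = 0

    async def write_batch(batch: Sequence[Any]) -> None:
        nonlocal written
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for one insert query
            written += len(batch)

    await asyncio.gather(*(write_batch(batch) for batch in batched(rows, batch_size)))
    return written


total = asyncio.run(write_all(list(range(2500)), batch_size=1000, max_concurrency=2))
print(total)
```

With 2,500 rows and a batch size of 1,000, the sketch issues three writes (1,000, 1,000 and 500 rows), at most two of them concurrently.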


It is possible to create a custom writer using the `KGWriter` interface:

@@ -419,4 +424,45 @@ It is possible to create a custom writer using the `KGWriter` interface:
The `validate_call` decorator is required when the input parameters contain a `pydantic` model.


See :ref:`kgwritermodel` and :ref:`kgwriter` in API reference.
See :ref:`kgwritermodel` and :ref:`kgwriter` in API reference.


Entity Resolver
===============

The KG Writer component creates new nodes for each identified entity
without making assumptions about entity similarity. The Entity Resolver
is responsible for refining the created knowledge graph by merging entity
nodes that represent the same real-world object.

In practice, this package implements a single resolver that merges nodes
with the same label and identical "name" property.

.. warning::

    The `SinglePropertyExactMatchResolver` **replaces** the nodes created by the KG writer.


It can be used like this:

.. code:: python

    from neo4j_graphrag.experimental.components.resolver import (
        SinglePropertyExactMatchResolver,
    )

    resolver = SinglePropertyExactMatchResolver(driver)
    res = await resolver.run()

.. warning::

    By default, all nodes with the `__Entity__` label will be resolved.
    To exclude specific nodes, pass a `filter_query` to the resolver.
    For example, if a `:Resolved` label has been applied to already resolved entities
    in the graph, these entities can be excluded with the following approach:

.. code:: python

    from neo4j_graphrag.experimental.components.resolver import (
        SinglePropertyExactMatchResolver,
    )

    resolver = SinglePropertyExactMatchResolver(driver, filter_query="WHERE not entity:Resolved")
    res = await resolver.run()
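
Reviewer note: the merge rule documented above (same label, identical "name" property) can be illustrated in plain Python on in-memory dictionaries. This is a sketch of the idea only and is not how the library resolves entities, which operates on nodes stored in Neo4j. The function `resolve_exact_match`, its `resolve_property` parameter, and the node dictionary layout are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]


def resolve_exact_match(nodes: List[Node], resolve_property: str = "name") -> List[Node]:
    """Merge nodes that share both a label and the same resolve_property value."""
    groups: Dict[Tuple[str, Any], List[Node]] = defaultdict(list)
    untouched: List[Node] = []
    for node in nodes:
        value = node["properties"].get(resolve_property)
        if value is None:
            untouched.append(node)  # nothing to match on, keep as-is
        else:
            groups[(node["label"], value)].append(node)

    merged: List[Node] = []
    for (label, _value), members in groups.items():
        properties: Dict[str, Any] = {}
        for member in members:
            properties.update(member["properties"])  # later members win on conflicts
        merged.append({"label": label, "properties": properties})
    return merged + untouched


# Hypothetical sample data, for illustration only.
nodes = [
    {"label": "PERSON", "properties": {"name": "Albert Einstein", "born": 1879}},
    {"label": "PERSON", "properties": {"name": "Albert Einstein", "field": "physics"}},
    {"label": "LOCATION", "properties": {"name": "Ulm"}},
]
resolved = resolve_exact_match(nodes)
print(len(resolved))
```

The two `PERSON` nodes collapse into one node carrying the union of their properties, while the `LOCATION` node is left alone because no other node shares its label and name.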
Binary file not shown.