Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge remote-tracking branch 'origin/dev-minor' into change-default-b…
…ehaviour
Showing 22 changed files with 757 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
--- | ||
title: 'Advanced GraphRAG' | ||
description: 'Advanced GraphRAG Techniques with R2R' | ||
icon: 'chart-network' | ||
--- | ||
|
||
|
||
## Advanced GraphRAG Techniques | ||
|
||
R2R supports advanced GraphRAG techniques that can be easily configured at runtime. This flexibility allows you to experiment with different state-of-the-art strategies and optimize your RAG pipeline for specific use cases. | ||
|
||
<Note> | ||
|
||
Advanced GraphRAG techniques are still a beta feature in R2R. There may be limitations in observability and analytics when implementing them. | ||
|
||
Are we missing an important technique? If so, then please let us know at [email protected]. | ||
|
||
</Note> | ||
|
||
|
||
### Prompt Tuning | ||
|
||
One way that we can improve upon GraphRAG's already impressive capabilities is by tuning our prompts to a specific domain. When we create a knowledge graph, an LLM extracts the relationships between entities, but for very targeted domains a general approach may fall short. | ||
|
||
To demonstrate this, we can run GraphRAG over the technical papers for the 2024 Nobel Prizes in chemistry, medicine, and physics. By tuning our prompts for GraphRAG, we attempt to understand our documents at a high level, and provide the LLM with a more pointed description. | ||
|
||
The following script, which utilizes the Python SDK, generates the tuned prompts and calls the knowledge graph creation process with these prompts at runtime: | ||
|
||
```python | ||
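# Assumption: an R2R server is reachable at this URL; adjust it for your deployment | ||
from r2r import R2RClient | ||
client = R2RClient("http://localhost:7272") | ||
|
||
# Step 1: Tune the prompts for knowledge graph creation | ||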
# Tune the entity description prompt | ||
entity_prompt_response = client.get_tuned_prompt( | ||
prompt_name="graphrag_entity_description" | ||
) | ||
tuned_entity_prompt = entity_prompt_response['results']['tuned_prompt'] | ||
|
||
# Tune the triples extraction prompt | ||
triples_prompt_response = client.get_tuned_prompt( | ||
prompt_name="graphrag_triples_extraction_few_shot" | ||
) | ||
tuned_triples_prompt = triples_prompt_response['results']['tuned_prompt'] | ||
|
||
# Step 2: Create the knowledge graph | ||
kg_settings = { | ||
"kg_entity_description_prompt": tuned_entity_prompt | ||
} | ||
|
||
# Generate the initial graph | ||
graph_response = client.create_graph( | ||
run_type="run", | ||
kg_creation_settings=kg_settings | ||
) | ||
|
||
# Step 3: Clean up the graph by removing duplicate entities | ||
client.deduplicate_entities( | ||
collection_id='122fdf6a-e116-546b-a8f6-e4cb2e2c0a09' | ||
) | ||
|
||
# Step 4: Tune and apply community reports prompt for graph enrichment | ||
community_prompt_response = client.get_tuned_prompt( | ||
prompt_name="graphrag_community_reports" | ||
) | ||
tuned_community_prompt = community_prompt_response['results']['tuned_prompt'] | ||
|
||
# Configure enrichment settings | ||
kg_enrichment_settings = { | ||
"community_reports_prompt": tuned_community_prompt | ||
} | ||
|
||
# Enrich the graph with additional information | ||
client.enrich_graph( | ||
run_type="run", | ||
kg_enrichment_settings=kg_enrichment_settings | ||
) | ||
``` | ||
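
The script above indexes each response with `['results']['tuned_prompt']`, which raises a bare `KeyError` if the response has a different shape. A minimal sketch of a defensive accessor follows, assuming the response is a plain dictionary shaped as in the script. The helper name `extract_tuned_prompt` and the sample response are hypothetical and not part of the R2R SDK.

```python
from collections.abc import Mapping
from typing import Any


def extract_tuned_prompt(response: Mapping[str, Any]) -> str:
    """Return response['results']['tuned_prompt'], failing loudly if it is missing or empty."""
    results = response.get("results")
    if not isinstance(results, Mapping):
        raise ValueError("response has no 'results' mapping")
    prompt = results.get("tuned_prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("response has no non-empty 'tuned_prompt'")
    return prompt


# Stand-in for a real SDK response, used only for illustration
fake_response = {"results": {"tuned_prompt": "Summarize the entity ..."}}
tuned_entity_prompt = extract_tuned_prompt(fake_response)
print(tuned_entity_prompt)
```

Failing early here keeps an empty or malformed prompt from being passed silently into graph creation.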
|
||
For illustrative purposes, we can look at the `graphrag_entity_description` prompt before and after prompt tuning. With prompt tuning, the prompt captures the intent of the documents, giving us a more targeted prompt overall. | ||
|
||
<Tabs> | ||
<Tab title="Prompt before Prompt Tuning"> | ||
```yaml | ||
Provide a comprehensive yet concise summary of the given entity, incorporating its description and associated triples: | ||
|
||
Entity Info: | ||
{entity_info} | ||
Triples: | ||
{triples_txt} | ||
|
||
Your summary should: | ||
1. Clearly define the entity's core concept or purpose | ||
2. Highlight key relationships or attributes from the triples | ||
3. Integrate any relevant information from the existing description | ||
4. Maintain a neutral, factual tone | ||
5. Be approximately 2-3 sentences long | ||
|
||
Ensure the summary is coherent, informative, and captures the essence of the entity within the context of the provided information. | ||
``` | ||
|
||
</Tab> | ||
|
||
<Tab title="Prompt after Prompt Tuning"> | ||
```yaml | ||
Provide a comprehensive yet concise summary of the given entity, focusing on its significance in the field of scientific research, while incorporating its description and associated triples: | ||
|
||
Entity Info: | ||
{entity_info} | ||
Triples: | ||
{triples_txt} | ||
|
||
Your summary should: | ||
1. Clearly define the entity's core concept or purpose within computational biology, artificial intelligence, and medicine | ||
2. Highlight key relationships or attributes from the triples that illustrate advancements in scientific understanding and reasoning | ||
3. Integrate any relevant information from the existing description, particularly breakthroughs and methodologies | ||
4. Maintain a neutral, factual tone | ||
5. Be approximately 2-3 sentences long | ||
|
||
Ensure the summary is coherent, informative, and captures the essence of the entity within the context of the provided information, emphasizing its impact on the field. | ||
``` | ||
</Tab> | ||
|
||
</Tabs> | ||
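
Both prompt versions carry the same two placeholders, `{entity_info}` and `{triples_txt}`. The sketch below shows one way such a template could be filled in before it is sent to the LLM. It assumes `str.format`-style substitution and a `subject | predicate | object` line per triple; neither is confirmed by this document, and the sample entity and triples are made up for illustration.

```python
# Shortened stand-in for a tuned prompt; only the placeholders match the real template
TEMPLATE = (
    "Provide a comprehensive yet concise summary of the given entity:\n\n"
    "Entity Info:\n{entity_info}\n"
    "Triples:\n{triples_txt}\n"
)


def render_prompt(template: str, entity_info: str, triples: list[tuple[str, str, str]]) -> str:
    """Fill the {entity_info} and {triples_txt} placeholders of a prompt template."""
    # Assumed serialization: one "subject | predicate | object" line per triple
    triples_txt = "\n".join(f"{s} | {p} | {o}" for s, p, o in triples)
    return template.format(entity_info=entity_info, triples_txt=triples_txt)


# Illustrative data only
rendered = render_prompt(
    TEMPLATE,
    entity_info="let-7: a microRNA studied in C. elegans",
    triples=[("let-7", "REGULATES", "lin-41")],
)
print(rendered)
```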
|
||
After prompt tuning, the number of communities increases from 29 to 41, and the communities appear more focused and domain-specific, with clearer thematic boundaries. | ||
|
||
Prompt tuning produces: | ||
- **More precise community separation:** GraphRAG alone produced a single `MicroRNA Research` community, while GraphRAG with prompt tuning produced communities around `C. elegans MicroRNA Research`, `LET-7 MicroRNA`, and `miRNA-184 and EDICT Syndrome`. | ||
- **Enhanced domain focus:** Previously, we had a single community for `AI Researchers`, but with prompt tuning we create specialized communities such as `Hinton, Hopfield, and Deep Learning`, `Hochreiter and Schmidhuber`, and `Minsky and Papert's ANN Critique`. | ||
|
||
| Count | GraphRAG | GraphRAG with Prompt Tuning | | ||
|-------------|----------|-----------------------------| | ||
| Entities | 661 | 636 | | ||
| Triples | 509 | 503 | | ||
| Communities | 29 | 41 | | ||
|
||
Prompt tuning allows us to generate communities that better reflect the natural organization of the domain knowledge while maintaining more precise technical and thematic boundaries between related concepts. |
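
To put the counts in the table above on a common scale, the relative change per row can be computed directly. This is a small arithmetic sketch using only the numbers reported in the table; the variable and function names are illustrative.

```python
# (before, after) counts taken from the comparison table
counts = {
    "Entities": (661, 636),
    "Triples": (509, 503),
    "Communities": (29, 41),
}


def relative_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


changes = {name: relative_change(b, a) for name, (b, a) in counts.items()}
for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
# Entities: -3.8%
# Triples: -1.2%
# Communities: +41.4%
```

Entity and triple counts stay nearly flat while the community count grows by roughly 41%, which is consistent with the finer-grained communities described above.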
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
[alembic] | ||
script_location = migrations | ||
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost/postgres | ||
|
||
[loggers] | ||
keys = root,sqlalchemy,alembic | ||
|
||
[handlers] | ||
keys = console | ||
|
||
[formatters] | ||
keys = generic | ||
|
||
[logger_root] | ||
level = WARN | ||
handlers = console | ||
qualname = | ||
|
||
[logger_sqlalchemy] | ||
level = WARN | ||
handlers = | ||
qualname = sqlalchemy.engine | ||
|
||
[logger_alembic] | ||
level = INFO | ||
handlers = | ||
qualname = alembic | ||
|
||
[handler_console] | ||
class = StreamHandler | ||
args = (sys.stderr,) | ||
level = NOTSET | ||
formatter = generic | ||
|
||
[formatter_generic] | ||
format = %(levelname)-5.5s [%(name)s] %(message)s | ||
datefmt = %H:%M:%S |
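
The logging sections in this file follow the format read by Python's `logging.config.fileConfig`, which Alembic's generated `env.py` typically calls on the ini file. The sketch below loads an in-memory copy of the config to show the resulting logger levels and to split `sqlalchemy.url` into its parts. It is not the project's `env.py`; passing a `ConfigParser` instance instead of a file path is a convenience chosen here so the example needs no file on disk.

```python
import configparser
import logging
import logging.config
from urllib.parse import urlsplit

# In-memory copy of the alembic.ini shown above
INI_TEXT = """
[alembic]
script_location = migrations
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@localhost/postgres

[loggers]
keys = root,sqlalchemy,alembic

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = WARN
handlers = console
qualname =

[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine

[logger_alembic]
level = INFO
handlers =
qualname = alembic

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Split the database URL: the scheme carries both the dialect and the async driver
url = urlsplit(parser.get("alembic", "sqlalchemy.url"))
print(url.scheme, url.hostname, url.path.lstrip("/"))

# fileConfig accepts a RawConfigParser instance as well as a file name
logging.config.fileConfig(parser, disable_existing_loggers=False)

# Alembic's own messages are shown from INFO upward; SQLAlchemy engine output only from WARN
logging.getLogger("alembic").info("visible at INFO")
logging.getLogger("sqlalchemy.engine").info("suppressed below WARN")
```

Only the root logger has a handler attached; the `sqlalchemy` and `alembic` loggers leave `handlers` empty and rely on propagation to the root's console handler, which avoids duplicate output.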