separate test run freq #1412

Merged 9 commits on Oct 16, 2024
4 changes: 0 additions & 4 deletions .github/workflows/r2r-full-integration-deep-dive-tests.yml
Original file line number Diff line number Diff line change
@@ -5,10 +5,6 @@ on:
     branches:
       - dev
       - dev-minor
-  pull_request:
-    branches:
-      - dev
-      - dev-minor
   workflow_dispatch:

 jobs:
@@ -1,14 +1,10 @@
-name: R2R Full Python Integration Test
+name: R2R Full Python Integration Test (macOS / windows)

 on:
   push:
     branches:
       - dev
       - dev-minor
-  pull_request:
-    branches:
-      - dev
-      - dev-minor
   workflow_dispatch:

 jobs:
2 changes: 1 addition & 1 deletion .github/workflows/r2r-full-py-integration-tests.yml
@@ -1,4 +1,4 @@
-name: R2R Full Python Integration Test
+name: R2R Full Python Integration Test (ubuntu)

 on:
   push:
@@ -0,0 +1,85 @@
# yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json

name: R2R Light Python Integration Test (macOS / windows)

on:
  push:
    branches:
      - dev
      - dev-minor
  workflow_dispatch:

jobs:
  test:
    runs-on: ${{ matrix.os }}

    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        test_category:
          - cli-ingestion
          - cli-retrieval
          - cli-graphrag
          - sdk-ingestion
          - sdk-retrieval
          - sdk-auth
          - sdk-collections
          - sdk-graphrag
    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      TELEMETRY_ENABLED: 'false'
      R2R_POSTGRES_HOST: localhost
      R2R_POSTGRES_DBNAME: postgres
      R2R_POSTGRES_PORT: '5432'
      R2R_POSTGRES_PASSWORD: postgres
      R2R_POSTGRES_USER: postgres
      R2R_PROJECT_NAME: r2r_default

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Python and install dependencies
        uses: ./.github/actions/setup-python-light
        with:
          os: ${{ matrix.os }}

      - name: Setup and start PostgreSQL
        uses: ./.github/actions/setup-postgres-ext
        with:
          os: ${{ matrix.os }}

      - name: Start R2R Light server
        uses: ./.github/actions/start-r2r-light

      - name: Run CLI Ingestion Tests
        if: matrix.test_category == 'cli-ingestion'
        uses: ./.github/actions/run-cli-ingestion-tests

      - name: Run CLI Retrieval Tests
        if: matrix.test_category == 'cli-retrieval'
        uses: ./.github/actions/run-cli-retrieval-tests

      - name: Run CLI GraphRAG Tests
        if: matrix.test_category == 'cli-graphrag'
        uses: ./.github/actions/run-cli-graphrag-tests

      - name: Run SDK Ingestion Tests
        if: matrix.test_category == 'sdk-ingestion'
        uses: ./.github/actions/run-sdk-ingestion-tests

      - name: Run SDK Retrieval Tests
        if: matrix.test_category == 'sdk-retrieval'
        uses: ./.github/actions/run-sdk-retrieval-tests

      - name: Run SDK Auth Tests
        if: matrix.test_category == 'sdk-auth'
        uses: ./.github/actions/run-sdk-auth-tests

      - name: Run SDK Collections Tests
        if: matrix.test_category == 'sdk-collections'
        uses: ./.github/actions/run-sdk-collections-tests

      - name: Run SDK GraphRAG Tests
        if: matrix.test_category == 'sdk-graphrag'
        uses: ./.github/actions/run-cli-graphrag-tests
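
For a sense of how many jobs this workflow starts per push, here is a minimal Python sketch of the matrix expansion. It relies on the documented GitHub Actions behavior that a matrix is expanded as a cross product of its keys; the function name `expand_matrix` is hypothetical and used only for illustration. Each resulting job runs the shared setup steps plus the single test step whose `if:` condition matches its `test_category`.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix above.
OS_LIST = ["ubuntu-latest", "windows-latest", "macos-latest"]
TEST_CATEGORIES = [
    "cli-ingestion", "cli-retrieval", "cli-graphrag",
    "sdk-ingestion", "sdk-retrieval", "sdk-auth",
    "sdk-collections", "sdk-graphrag",
]


def expand_matrix(os_list, categories):
    """Return one (os, test_category) pair per job, as a full cross product."""
    return list(product(os_list, categories))


jobs = expand_matrix(OS_LIST, TEST_CATEGORIES)
print(len(jobs))  # 3 operating systems x 8 categories = 24 jobs
```

Dropping the `pull_request` trigger means these 24 jobs run on pushes to `dev` and `dev-minor` or on manual dispatch, not on every pull request update.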
2 changes: 1 addition & 1 deletion .github/workflows/r2r-light-py-integration-tests.yml
@@ -1,6 +1,6 @@
 # yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json

-name: R2R Light Python Integration Test
+name: R2R Light Python Integration Test (ubuntu)

 on:
   push:
103 changes: 103 additions & 0 deletions docs/cookbooks/ingestion.mdx
@@ -0,0 +1,103 @@
---
title: 'Ingestion Cookbook'
description: 'Learn how to ingest, update, and delete documents with R2R'
icon: 'file-arrow-up'
---

## Introduction

R2R's ingestion pipeline processes and manages a variety of document types. This cookbook walks through ingesting files, ingesting pre-processed chunks, updating existing documents, and deleting documents with the R2R Python SDK.

## Ingesting Files

To ingest files into your R2R system, you can use the `ingest_files` method from the Python SDK:

```python
file_paths = ['path/to/file1.txt', 'path/to/file2.txt']
metadatas = [{'key1': 'value1'}, {'key2': 'value2'}]

ingest_response = client.ingest_files(
    file_paths=file_paths,
    metadatas=metadatas,
    ingestion_config={
        "provider": "unstructured_local",
        "strategy": "auto",
        "chunking_strategy": "by_title",
        "new_after_n_chars": 256,
        "max_characters": 512,
        "combine_under_n_chars": 64,
        "overlap": 100,
    }
)
```

The `ingest_files` method accepts the following parameters:

- `file_paths` (required): A list of file paths or directory paths to ingest.
- `metadatas` (optional): A list of metadata dictionaries corresponding to each file.
- `document_ids` (optional): A list of document IDs to assign to the ingested files.
- `ingestion_config` (optional): Custom ingestion settings to override the default configuration, which you can read more about [here](/documentation/configuration/ingestion/overview).
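
Because `metadatas` and `document_ids` are matched to `file_paths` by position, it can help to check that the lists line up before making the call. The sketch below is one way to do that; `build_ingest_kwargs` is a hypothetical helper, not part of the SDK, and it assumes every entry in `file_paths` is a single file rather than a directory.

```python
from typing import Any, Optional


def build_ingest_kwargs(
    file_paths: list[str],
    metadatas: Optional[list[dict[str, Any]]] = None,
    document_ids: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: verify per-file lists match file_paths in length."""
    if not file_paths:
        raise ValueError("file_paths must contain at least one path")
    for name, values in (("metadatas", metadatas), ("document_ids", document_ids)):
        if values is not None and len(values) != len(file_paths):
            raise ValueError(
                f"{name} has {len(values)} entries for {len(file_paths)} files"
            )
    kwargs: dict[str, Any] = {"file_paths": list(file_paths)}
    # Only pass the optional lists that were actually provided.
    if metadatas is not None:
        kwargs["metadatas"] = list(metadatas)
    if document_ids is not None:
        kwargs["document_ids"] = list(document_ids)
    return kwargs


kwargs = build_ingest_kwargs(
    ["path/to/file1.txt", "path/to/file2.txt"],
    metadatas=[{"key1": "value1"}, {"key2": "value2"}],
)
# With a configured client (not created here): client.ingest_files(**kwargs)
```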

## Ingesting Chunks

If you have pre-processed chunks of text, you can directly ingest them using the `ingest_chunks` method:

```python
chunks = [
    {"text": "This is the first chunk.", "metadata": {"source": "document1"}},
    {"text": "This is the second chunk.", "metadata": {"source": "document2"}},
]

ingest_response = client.ingest_chunks(
    chunks=chunks,
    document_id="custom_document_id",
    metadata={"custom_metadata": "value"},
)
```

The `ingest_chunks` method accepts the following parameters:

- `chunks` (required): A list of dictionaries containing the text and metadata for each chunk.
- `document_id` (optional): A custom document ID to assign to the ingested chunks.
- `metadata` (optional): Additional metadata to associate with the ingested chunks.
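
If the chunks come from your own pre-processing, they need to end up in the `{"text": ..., "metadata": ...}` shape shown above. As a rough sketch, the hypothetical helper `make_chunks` below splits a string on blank lines and emits one chunk per non-empty paragraph; the `index` metadata key is an illustrative choice, not something the SDK requires.

```python
from typing import Any


def make_chunks(text: str, source: str) -> list[dict[str, Any]]:
    """Hypothetical helper: one chunk dict per non-empty paragraph of text."""
    paragraphs = [part.strip() for part in text.split("\n\n")]
    non_empty = [part for part in paragraphs if part]
    return [
        {"text": part, "metadata": {"source": source, "index": i}}
        for i, part in enumerate(non_empty)
    ]


chunks = make_chunks("This is the first chunk.\n\n\n\nThis is the second chunk.", "document1")
# With a configured client (not created here):
# client.ingest_chunks(chunks=chunks, document_id="custom_document_id")
```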

## Updating Files

To update existing documents in your R2R system, you can use the `update_files` method:

```python
file_paths = ['path/to/updated_file1.txt', 'path/to/updated_file2.txt']
document_ids = ['document1_id', 'document2_id']

update_response = client.update_files(
    file_paths=file_paths,
    document_ids=document_ids,
    metadatas=[{"version": "2.0"}, {"version": "1.5"}],
)
```

The `update_files` method accepts the following parameters:

- `file_paths` (required): A list of file paths for the updated documents.
- `document_ids` (required): A list of document IDs corresponding to the files being updated.
- `metadatas` (optional): A list of metadata dictionaries to update for each document.
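
Since `update_files` takes parallel lists, keeping each document's ID, path, and metadata together in one mapping and deriving the lists from it avoids positional mismatches. The following is a sketch under that assumption; `build_update_kwargs` and the shape of the `updates` mapping are hypothetical and not part of the SDK.

```python
from typing import Any

# Hypothetical structure: document ID -> (updated file path, metadata for that document).
updates: dict[str, tuple[str, dict[str, Any]]] = {
    "document1_id": ("path/to/updated_file1.txt", {"version": "2.0"}),
    "document2_id": ("path/to/updated_file2.txt", {"version": "1.5"}),
}


def build_update_kwargs(
    updates: dict[str, tuple[str, dict[str, Any]]],
) -> dict[str, list[Any]]:
    """Derive aligned file_paths, document_ids, and metadatas lists from one mapping."""
    document_ids = list(updates)
    return {
        "file_paths": [updates[doc_id][0] for doc_id in document_ids],
        "document_ids": document_ids,
        "metadatas": [updates[doc_id][1] for doc_id in document_ids],
    }


update_kwargs = build_update_kwargs(updates)
# With a configured client (not created here): client.update_files(**update_kwargs)
```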

## Deleting Documents

To delete documents from your R2R system, you can use the `delete` method:

```python
delete_response = client.delete(
    {
        "document_id": {"$eq": "document1_id"}
    }
)
```

The `delete` method accepts a dictionary specifying the filters to identify the documents to delete. In this example, it deletes the document with the ID "document1_id".
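
To delete several documents, one option is to build the same equality filter once per ID and call `delete` in a loop. The sketch below assumes only the `$eq` filter form shown above. `document_filter` is a hypothetical helper, and `RecordingClient` is a stand-in that records filters so the example runs without a server; it is not the real SDK client.

```python
from typing import Any


def document_filter(document_id: str) -> dict[str, Any]:
    """Hypothetical helper: build the equality filter used in the example above."""
    return {"document_id": {"$eq": document_id}}


class RecordingClient:
    """Stand-in for the real client; it only records the filters it receives."""

    def __init__(self) -> None:
        self.deleted: list[dict[str, Any]] = []

    def delete(self, filters: dict[str, Any]) -> None:
        self.deleted.append(filters)


client = RecordingClient()
for doc_id in ["document1_id", "document2_id"]:
    client.delete(document_filter(doc_id))
```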

## Conclusion

R2R's ingestion pipeline lets you process, update, and manage your documents. The `ingest_files`, `ingest_chunks`, `update_files`, and `delete` methods of the Python SDK cover the document-management operations shown in this cookbook.

For more detailed information on the available parameters and response formats, refer to the [Python SDK Ingestion Documentation](/documentation/python-sdk/ingestion).
4 changes: 0 additions & 4 deletions docs/documentation/configuration/prompts.mdx
@@ -54,8 +54,6 @@ Certainly! I'll create an expanded table that explains all the prompts you've li
| Prompt File | Purpose |
|-------------|---------|
| [`default_rag.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/default_rag.yaml) | Default prompt for Retrieval-Augmented Generation (RAG) tasks. It instructs the model to answer queries based on provided context, using line item references. |
-| [`few_shot_ner_kg_extraction.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/few_shot_ner_kg_extraction.yaml) | Used for few-shot Named Entity Recognition (NER) and Knowledge Graph (KG) extraction. It provides examples to guide the model in identifying entities and relationships. |
-| [`few_shot_ner_kg_extraction_with_spec.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/few_shot_ner_kg_extraction_with_spec.yaml) | Similar to the above, but includes a specific schema or specification for the extraction process. |
| [`graphrag_community_reports.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/graphrag_community_reports.yaml) | Used in GraphRAG to generate reports about communities or clusters in the knowledge graph. |
| [`graphrag_entity_description.yaml.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/graphrag_entity_description.yaml) | System prompt for the "map" phase in GraphRAG, used to process individual nodes or edges. |
| [`graphrag_map_system.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/graphrag_map_system.yaml) | System prompt for the "map" phase in GraphRAG, used to process individual nodes or edges. |
@@ -68,8 +66,6 @@ Certainly! I'll create an expanded table that explains all the prompts you've li
| [`rag_context.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/rag_context.yaml) | Used to process or format the context retrieved for RAG tasks. |
| [`rag_fusion.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/rag_fusion.yaml) | Used in RAG fusion techniques, possibly for combining information from multiple retrieved passages. |
| [`system.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/system.yaml) | Contains general system-level prompts or instructions for the R2R system. |
-| [`zero_shot_ner_kg_extraction.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/zero_shot_ner_kg_extraction.yaml) | Used for zero-shot Named Entity Recognition and Knowledge Graph extraction, without providing examples. |
-| [`zero_shot_ner_kg_extraction_with_spec.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/zero_shot_ner_kg_extraction_with_spec.yaml) | Similar to the above, but includes a specific schema or specification for the extraction process. |


You can find the full list of default prompts and their contents in the [defaults directory](https://github.com/SciPhi-AI/R2R/tree/main/py/core/providers/prompts/defaults).
1 change: 1 addition & 0 deletions docs/mint.json
@@ -401,6 +401,7 @@
 "group": "General",
 "pages": [
   "cookbooks/walkthrough",
+  "cookbooks/ingestion",
   "cookbooks/hybrid-search",
   "cookbooks/graphrag",
   "cookbooks/advanced-rag",
129 changes: 0 additions & 129 deletions py/core/providers/prompts/defaults/few_shot_ner_kg_extraction.yaml

This file was deleted.