SciPhi-AI · emrgnt-cmplxty · Oct 16, 2024
diff --git a/.github/workflows/r2r-full-integration-deep-dive-tests.yml b/.github/workflows/r2r-full-integration-deep-dive-tests.yml
@@ -5,10 +5,6 @@ on:
     branches:
       - dev
       - dev-minor
-  pull_request:
-    branches:
-      - dev
-      - dev-minor
   workflow_dispatch:
 
 jobs:

diff --git a/.github/workflows/r2r-full-py-integration-tests-mac-and-windows.yml b/.github/workflows/r2r-full-py-integration-tests-mac-and-windows.yml
@@ -1,14 +1,10 @@
-name: R2R Full Python Integration Test
+name: R2R Full Python Integration Test (macOS / windows)
 
 on:
   push:
     branches:
       - dev
       - dev-minor
-  pull_request:
-    branches:
-      - dev
-      - dev-minor
   workflow_dispatch:
 
 jobs:

diff --git a/.github/workflows/r2r-full-py-integration-tests.yml b/.github/workflows/r2r-full-py-integration-tests.yml
@@ -1,4 +1,4 @@
-name: R2R Full Python Integration Test
+name: R2R Full Python Integration Test (ubuntu)
 
 on:
   push:

diff --git a/.github/workflows/r2r-light-py-integration-tests-mac-and-windows.yml b/.github/workflows/r2r-light-py-integration-tests-mac-and-windows.yml
@@ -0,0 +1,85 @@
+# yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json
+
+name: R2R Light Python Integration Test (macOS / windows)
+
+on:
+  push:
+    branches:
+      - dev
+      - dev-minor
+  workflow_dispatch:
+
+jobs:
+  test:
+    runs-on: ${{ matrix.os }}
+
+    strategy:
+      matrix:
+        os: [ubuntu-latest, windows-latest, macos-latest]
+        test_category:
+          - cli-ingestion
+          - cli-retrieval
+          - cli-graphrag
+          - sdk-ingestion
+          - sdk-retrieval
+          - sdk-auth
+          - sdk-collections
+          - sdk-graphrag
+    env:
+      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      TELEMETRY_ENABLED: 'false'
+      R2R_POSTGRES_HOST: localhost
+      R2R_POSTGRES_DBNAME: postgres
+      R2R_POSTGRES_PORT: '5432'
+      R2R_POSTGRES_PASSWORD: postgres
+      R2R_POSTGRES_USER: postgres
+      R2R_PROJECT_NAME: r2r_default
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python and install dependencies
+        uses: ./.github/actions/setup-python-light
+        with:
+          os: ${{ matrix.os }}
+
+      - name: Setup and start PostgreSQL
+        uses: ./.github/actions/setup-postgres-ext
+        with:
+          os: ${{ matrix.os }}
+
+      - name: Start R2R Light server
+        uses: ./.github/actions/start-r2r-light
+
+      - name: Run CLI Ingestion Tests
+        if: matrix.test_category == 'cli-ingestion'
+        uses: ./.github/actions/run-cli-ingestion-tests
+
+      - name: Run CLI Retrieval Tests
+        if: matrix.test_category == 'cli-retrieval'
+        uses: ./.github/actions/run-cli-retrieval-tests
+
+      - name: Run CLI GraphRAG Tests
+        if: matrix.test_category == 'cli-graphrag'
+        uses: ./.github/actions/run-cli-graphrag-tests
+
+      - name: Run SDK Ingestion Tests
+        if: matrix.test_category == 'sdk-ingestion'
+        uses: ./.github/actions/run-sdk-ingestion-tests
+
+      - name: Run SDK Retrieval Tests
+        if: matrix.test_category == 'sdk-retrieval'
+        uses: ./.github/actions/run-sdk-retrieval-tests
+
+      - name: Run SDK Auth Tests
+        if: matrix.test_category == 'sdk-auth'
+        uses: ./.github/actions/run-sdk-auth-tests
+
+      - name: Run SDK Collections Tests
+        if: matrix.test_category == 'sdk-collections'
+        uses: ./.github/actions/run-sdk-collections-tests
+
+      - name: Run SDK GraphRAG Tests
+        if: matrix.test_category == 'sdk-graphrag'
+        uses: ./.github/actions/run-sdk-graphrag-tests
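Review note: the `strategy.matrix` block in this new workflow produces one job per (`os`, `test_category`) pair, and each job then runs only the test step whose `if:` condition matches its category. The sketch below reproduces that expansion in plain Python to show how many jobs a single push would start. The helper name `expand_matrix` is hypothetical, and the values are copied from the workflow as added here. The matrix still lists `ubuntu-latest` although the workflow is named "macOS / windows", which may or may not be intentional.

```python
from itertools import product

# Values copied from the workflow's strategy.matrix block.
OS_TARGETS = ["ubuntu-latest", "windows-latest", "macos-latest"]
TEST_CATEGORIES = [
    "cli-ingestion",
    "cli-retrieval",
    "cli-graphrag",
    "sdk-ingestion",
    "sdk-retrieval",
    "sdk-auth",
    "sdk-collections",
    "sdk-graphrag",
]


def expand_matrix(os_targets, test_categories):
    """Hypothetical helper: one dict per job, as a two-axis matrix expands."""
    return [
        {"os": os_name, "test_category": category}
        for os_name, category in product(os_targets, test_categories)
    ]


jobs = expand_matrix(OS_TARGETS, TEST_CATEGORIES)
print(len(jobs))  # 3 operating systems x 8 categories = 24 jobs
```

Each of those jobs sets up Python, PostgreSQL, and the R2R Light server before running its one test step, so trimming either axis reduces CI time proportionally.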
diff --git a/.github/workflows/r2r-light-py-integration-tests.yml b/.github/workflows/r2r-light-py-integration-tests.yml
@@ -1,6 +1,6 @@
 # yaml-language-server: $schema=https://json.schemastore.org/github-workflow.json
 
-name: R2R Light Python Integration Test
+name: R2R Light Python Integration Test (ubuntu)
 
 on:
   push:

diff --git a/docs/cookbooks/ingestion.mdx b/docs/cookbooks/ingestion.mdx
@@ -0,0 +1,103 @@
+---
+title: 'Ingestion Cookbook'
+description: 'Learn how to ingest, update, and delete documents with R2R'
+icon: 'file-arrow-up'
+---
+
+## Introduction
+
+R2R's ingestion pipeline processes and manages many types of documents. This cookbook walks through ingesting files, ingesting pre-processed chunks, updating existing documents, and deleting documents with the R2R Python SDK.
+
+## Ingesting Files
+
+To ingest files into your R2R system, you can use the `ingest_files` method from the Python SDK:
+
+```python
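+from r2r import R2RClient
+
+# Assumes an R2R server is reachable at the default local address
+client = R2RClient("http://localhost:7272")
+
+file_paths = ['path/to/file1.txt', 'path/to/file2.txt']
+metadatas = [{'key1': 'value1'}, {'key2': 'value2'}]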
+
+ingest_response = client.ingest_files(
+    file_paths=file_paths,
+    metadatas=metadatas,
+    ingestion_config={
+        "provider": "unstructured_local",
+        "strategy": "auto",
+        "chunking_strategy": "by_title",
+        "new_after_n_chars": 256,
+        "max_characters": 512,
+        "combine_under_n_chars": 64,
+        "overlap": 100,
+    }
+)
+```
+
+The `ingest_files` method accepts the following parameters:
+
+- `file_paths` (required): A list of file paths or directory paths to ingest.
+- `metadatas` (optional): A list of metadata dictionaries corresponding to each file.
+- `document_ids` (optional): A list of document IDs to assign to the ingested files.
+- `ingestion_config` (optional): Custom ingestion settings that override the default configuration. See the [ingestion configuration overview](/documentation/configuration/ingestion/overview) for details.
+
+## Ingesting Chunks
+
+If you have pre-processed chunks of text, you can directly ingest them using the `ingest_chunks` method:
+
+```python
+chunks = [
+    {"text": "This is the first chunk.", "metadata": {"source": "document1"}},
+    {"text": "This is the second chunk.", "metadata": {"source": "document2"}},
+]
+
+ingest_response = client.ingest_chunks(
+    chunks=chunks,
+    document_id="custom_document_id",
+    metadata={"custom_metadata": "value"},
+)
+```
+
+The `ingest_chunks` method accepts the following parameters:
+
+- `chunks` (required): A list of dictionaries containing the text and metadata for each chunk.
+- `document_id` (optional): A custom document ID to assign to the ingested chunks.
+- `metadata` (optional): Additional metadata to associate with the ingested chunks.
+
+## Updating Files
+
+To update existing documents in your R2R system, you can use the `update_files` method:
+
+```python
+file_paths = ['path/to/updated_file1.txt', 'path/to/updated_file2.txt']
+document_ids = ['document1_id', 'document2_id']
+
+update_response = client.update_files(
+    file_paths=file_paths,
+    document_ids=document_ids,
+    metadatas=[{"version": "2.0"}, {"version": "1.5"}],
+)
+```
+
+The `update_files` method accepts the following parameters:
+
+- `file_paths` (required): A list of file paths for the updated documents.
+- `document_ids` (required): A list of document IDs corresponding to the files being updated.
+- `metadatas` (optional): A list of metadata dictionaries to update for each document.
+
+## Deleting Documents
+
+To delete documents from your R2R system, you can use the `delete` method:
+
+```python
+delete_response = client.delete(
+    {
+        "document_id": {"$eq": "document1_id"}
+    }
+)
+```
+
+The `delete` method accepts a dictionary of filters that identify the documents to delete. This example deletes the document whose ID is `document1_id`.
+
+## Conclusion
+
+R2R's ingestion pipeline lets you process, update, and manage documents through four Python SDK methods: `ingest_files`, `ingest_chunks`, `update_files`, and `delete`.
+
+For more detailed information on the available parameters and response formats, refer to the [Python SDK Ingestion Documentation](/documentation/python-sdk/ingestion).
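Review note: the cookbook describes `metadatas` and `document_ids` as lists "corresponding to each file", so a length mismatch is an easy mistake to make before calling `ingest_files` or `update_files`. The sketch below is a hypothetical pre-flight check, not part of the R2R SDK. It assumes every entry in `file_paths` is a single file rather than a directory, and that the optional lists are expected to have exactly one entry per file.

```python
def build_ingest_kwargs(file_paths, metadatas=None, document_ids=None):
    """Hypothetical helper: verify optional lists line up with file_paths.

    Returns a kwargs dict containing only the arguments that were supplied.
    """
    if not file_paths:
        raise ValueError("file_paths must contain at least one path")

    for name, values in (("metadatas", metadatas), ("document_ids", document_ids)):
        if values is not None and len(values) != len(file_paths):
            raise ValueError(
                f"{name} has {len(values)} entries, expected {len(file_paths)}"
            )

    kwargs = {"file_paths": list(file_paths)}
    if metadatas is not None:
        kwargs["metadatas"] = list(metadatas)
    if document_ids is not None:
        kwargs["document_ids"] = list(document_ids)
    return kwargs


kwargs = build_ingest_kwargs(
    ["path/to/file1.txt", "path/to/file2.txt"],
    metadatas=[{"key1": "value1"}, {"key2": "value2"}],
)
# With a configured client, the result could then be passed on as:
# client.ingest_files(**kwargs)
```

Failing early on the client side keeps a mismatched batch from being sent to the server at all.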
diff --git a/docs/documentation/configuration/prompts.mdx b/docs/documentation/configuration/prompts.mdx
@@ -54,8 +54,6 @@ Certainly! I'll create an expanded table that explains all the prompts you've li
 | Prompt File | Purpose |
 |-------------|---------|
 | [`default_rag.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/default_rag.yaml) | Default prompt for Retrieval-Augmented Generation (RAG) tasks. It instructs the model to answer queries based on provided context, using line item references. |
-| [`few_shot_ner_kg_extraction.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/few_shot_ner_kg_extraction.yaml) | Used for few-shot Named Entity Recognition (NER) and Knowledge Graph (KG) extraction. It provides examples to guide the model in identifying entities and relationships. |
-| [`few_shot_ner_kg_extraction_with_spec.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/few_shot_ner_kg_extraction_with_spec.yaml) | Similar to the above, but includes a specific schema or specification for the extraction process. |
 | [`graphrag_community_reports.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/graphrag_community_reports.yaml) | Used in GraphRAG to generate reports about communities or clusters in the knowledge graph. |
 | [`graphrag_entity_description.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/graphrag_entity_description.yaml) | Used in GraphRAG to generate descriptions of the entities extracted into the knowledge graph. |
 | [`graphrag_map_system.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/graphrag_map_system.yaml) | System prompt for the "map" phase in GraphRAG, used to process individual nodes or edges. |
@@ -68,8 +66,6 @@ Certainly! I'll create an expanded table that explains all the prompts you've li
 | [`rag_context.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/rag_context.yaml) | Used to process or format the context retrieved for RAG tasks. |
 | [`rag_fusion.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/rag_fusion.yaml) | Used in RAG fusion techniques, possibly for combining information from multiple retrieved passages. |
 | [`system.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/system.yaml) | Contains general system-level prompts or instructions for the R2R system. |
-| [`zero_shot_ner_kg_extraction.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/zero_shot_ner_kg_extraction.yaml) | Used for zero-shot Named Entity Recognition and Knowledge Graph extraction, without providing examples. |
-| [`zero_shot_ner_kg_extraction_with_spec.yaml`](https://github.com/SciPhi-AI/R2R/blob/main/py/core/providers/prompts/defaults/zero_shot_ner_kg_extraction_with_spec.yaml) | Similar to the above, but includes a specific schema or specification for the extraction process. |
 
 
 You can find the full list of default prompts and their contents in the [defaults directory](https://github.com/SciPhi-AI/R2R/tree/main/py/core/providers/prompts/defaults).

diff --git a/docs/mint.json b/docs/mint.json
@@ -401,6 +401,7 @@
       "group": "General",
       "pages": [
         "cookbooks/walkthrough",
+        "cookbooks/ingestion",
         "cookbooks/hybrid-search",
         "cookbooks/graphrag",
         "cookbooks/advanced-rag",

diff --git a/py/core/providers/prompts/defaults/few_shot_ner_kg_extraction.yaml b/py/core/providers/prompts/defaults/few_shot_ner_kg_extraction.yaml