bugfix/disable-search-service (#7)
* bugfix/disable-search-service

* Apply suggestions from code review

Co-authored-by: Renee Li <[email protected]>

* Update models/rag__unified_document.sql

Co-authored-by: fivetran-catfritz <[email protected]>

* search service query help

* Update models/rag__unified_document.sql

Co-authored-by: fivetran-catfritz <[email protected]>

* changelog update

---------

Co-authored-by: Renee Li <[email protected]>
Co-authored-by: fivetran-catfritz <[email protected]>
3 people authored Oct 28, 2024
1 parent 956b0d6 commit 94d865c
Showing 3 changed files with 29 additions and 13 deletions.
26 changes: 25 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,28 @@
# dbt_unified_rag v0.1.0
# dbt_unified_rag v0.1.0-a2

[PR #7](https://github.com/fivetran/dbt_unified_rag/pull/7) includes the following updates:

## Bug Fixes
- For Snowflake destinations, we have removed the post-hook from the `rag__unified_document` model, which generated the `rag__unified_search` Cortex Search Service.
- While the Search Service worked when deployed locally, there were issues identified when deploying and running via Fivetran Quickstart. In order to ensure Snowflake users are still able to take advantage of the `rag__unified_document` end model, we have removed the Search Service from execution until we are able to verify it works as expected on all supported orchestration methods.
- If you would like, you can generate your own Snowflake Cortex Search Service by following the [Create Cortex Search Service](https://docs.snowflake.com/en/sql-reference/sql/create-cortex-search) guidelines provided by Snowflake. For additional assistance, you can structure your Cortex Search Service off of the below query to effectively leverage the `rag__unified_document` generated from this data model.
```sql
-- Cortex Search Service created using the rag__unified_document model

create cortex search service if not exists <your_schema>.<your_new_search_service_name>
on chunk
attributes unique_id
warehouse = <your_warehouse>
target_lag = '1 days' --You can specify this to your liking
as (
select * from rag__unified_document
)
```
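Once a service like the one above exists, it can be inspected and queried directly in Snowflake. The following is a minimal sketch, assuming the service was created with the placeholder names used above and that the `chunk` and `unique_id` columns are available to return; the database placeholder, search text, and `limit` value are illustrative only.
```sql
-- List the Cortex Search Services visible in the current database and schema
show cortex search services;

-- Preview search results from the service (all <...> names are placeholders)
select parse_json(
    snowflake.cortex.search_preview(
        '<your_database>.<your_schema>.<your_new_search_service_name>',
        '{
            "query": "example search text",
            "columns": ["chunk", "unique_id"],
            "limit": 5
        }'
    )
)['results'] as results;
```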

## Under the Hood
- Adjusted the `cluster_by` configuration within the `rag__unified_document` model to cluster by `update_date` (previously `unique_id`) for improved Snowflake performance.

# dbt_unified_rag v0.1.0-a1

This is the initial release of the Unified RAG dbt package!

11 changes: 2 additions & 9 deletions README.md
@@ -15,7 +15,7 @@
## What does this dbt package do?

<!--section="unified_rag_transformation_model"-->
The main focus of this dbt package is to generate an end model and [Cortex Search Service](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) (for Snowflake destinations only) which contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
The main focus of this dbt package is to generate an end model that contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
- [HubSpot](https://fivetran.com/docs/connectors/applications/hubspot): Deals
- [Jira](https://fivetran.com/docs/connectors/applications/jira): Issues
- [Zendesk](https://fivetran.com/docs/connectors/applications/zendesk): Tickets
@@ -26,12 +26,6 @@ The following table provides a detailed list of all models materialized within t
| **Table** | **Description** |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [rag__unified_document](https://fivetran.github.io/dbt_unified_rag/#!/model/model.unified_rag.rag__unified_document) | Each record represents a chunk of text prepared for semantic-search and additional fields for use in LLM workflows. |

Additionally, for **Snowflake** destinations, a [Cortex Search Service](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) will be generated as a result of this data model. The Cortex Search Service uses the results of the `rag__unified_document` and enables Snowflake users to take advantage of low-latency, high quality "fuzzy" search over their data for use in RAG applications leveraging LLMs. See the below table for details.

| **Snowflake Cortex Search Service** | **Description** |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [rag__unified_search](https://github.com/fivetran/dbt_unified_rag/blob/main/macros/search_generation.sql) | Generates a Snowflake Cortex Search service via the [search_generation](https://github.com/fivetran/dbt_unified_rag/blob/main/macros/search_generation.sql) macro as a post-hook for Snowflake destinations. This Cortex Search Service is currently configured with a target lag of 1 day. **Please be aware that this search service will refresh automatically once a day even outside of this data model execution.** To understand more about the Cortex Search Service, you can run `SHOW CORTEX SEARCH SERVICES` in the respective Snowflake database.schema which the `rag__unified_document` is materialized. See [here](https://docs.snowflake.com/en/sql-reference/commands-cortex-search) for other relevant commands to use for understanding the nature of the Search Service, and [here](https://docs.snowflake.com/en/sql-reference/functions/search_preview-snowflake-cortex) for helpful commands to use when leveraging the results of the Cortex Search Service in your LLM applications. |
<!--section-end-->

## How do I use the dbt package?
@@ -44,7 +38,6 @@ To use this dbt package, you must have the following:
- [Jira](https://fivetran.com/docs/connectors/applications/jira)
- [Zendesk Support](https://fivetran.com/docs/connectors/applications/zendesk)
- A **Snowflake**, **BigQuery**, **Databricks**, or **PostgreSQL** destination.
- Please note, the Cortex Search Service will only be generated for Snowflake destinations.
- Redshift destinations are not currently supported due to the stringent character limitations within string datatypes. If you would like Redshift destinations to be supported, please comment within our logged [Feature Request](https://github.com/fivetran/dbt_unified_rag/issues/3).

### Step 2: Install the package
@@ -53,7 +46,7 @@ Include the following package_display_name package version in your `packages.yml
```yml
packages:
  - package: fivetran/unified_rag
    version: 0.1.0-a1
    version: 0.1.0-a2
```
### Step 3: Define database and schema variables
5 changes: 2 additions & 3 deletions models/rag__unified_document.sql
@@ -3,11 +3,10 @@
materialized='table' if unified_rag.is_databricks_sql_warehouse() else 'incremental',
partition_by = {'field': 'update_date', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['update_date'],
cluster_by = ['unique_id'],
cluster_by = ['update_date'],
unique_key='unique_id',
incremental_strategy = 'insert_overwrite' if target.type in ('bigquery', 'databricks', 'spark') else 'delete+insert',
file_format='delta' if unified_rag.is_databricks_sql_warehouse() else 'parquet',
post_hook=["{{ unified_rag.search_generation(this,'rag__unified_search') }}"] if target.type == 'snowflake' else []
file_format='delta'
)
}}
