bugfix/disable-search-service (#7)

* bugfix/disable-search-service
* Apply suggestions from code review
  Co-authored-by: Renee Li <[email protected]>
* Update models/rag__unified_document.sql
  Co-authored-by: fivetran-catfritz <[email protected]>
* search service query help
* Update models/rag__unified_document.sql
  Co-authored-by: fivetran-catfritz <[email protected]>
* changelog update

---------

Co-authored-by: Renee Li <[email protected]>
Co-authored-by: fivetran-catfritz <[email protected]>
fivetran · Oct 28, 2024 · 94d865c
1 parent 956b0d6
commit 94d865c

Showing 3 changed files with 29 additions and 13 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,4 +1,28 @@
-# dbt_unified_rag v0.1.0
+# dbt_unified_rag v0.1.0-a2
+
+[PR #7](https://github.com/fivetran/dbt_unified_rag/pull/7) includes the following updates: 
+
+## Bug Fixes
+- For Snowflake destinations, we have removed the post-hook from the `rag__unified_document` model that generated the `rag__unified_search` Cortex Search Service.
+    - While the Search Service worked when deployed locally, issues were identified when deploying and running via Fivetran Quickstart. To ensure Snowflake users can still take advantage of the `rag__unified_document` end model, we have removed the Search Service from execution until we can verify it works as expected on all supported orchestration methods.
+    - If you would like, you can generate your own Snowflake Cortex Search Service by following the [Create Cortex Search Service](https://docs.snowflake.com/en/sql-reference/sql/create-cortex-search) guidelines provided by Snowflake. For additional assistance, you can base your Cortex Search Service on the query below to leverage the `rag__unified_document` table generated by this data model.
+    ```sql
+    -- Cortex Search Service created using the rag__unified_document model
+
+    create cortex search service if not exists <your_schema>.<your_new_search_service_name>
+        on chunk
+        attributes unique_id
+        warehouse = <your_warehouse>
+        target_lag = '1 days' --You can specify this to your liking
+        as (
+            select * from rag__unified_document
+        )
+    ```
+
+## Under the Hood
+- Adjusted the `cluster_by` configuration within the `rag__unified_document` model to cluster by `update_date` (previously `unique_id`) for improved Snowflake performance.
+
+# dbt_unified_rag v0.1.0-a1
 
 This is the initial release of the Unified RAG dbt package!
 

diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@
 ## What does this dbt package do?
 
 <!--section="unified_rag_transformation_model"-->
-The main focus of this dbt package is to generate an end model and [Cortex Search Service](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) (for Snowflake destinations only) which contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
+The main focus of this dbt package is to generate an end model that contains the below relevant unstructured document data to be used for Retrieval Augmented Generation (RAG) applications leveraging Large Language Models (LLMs):
 - [HubSpot](https://fivetran.com/docs/connectors/applications/hubspot): Deals
 - [Jira](https://fivetran.com/docs/connectors/applications/jira): Issues
 - [Zendesk](https://fivetran.com/docs/connectors/applications/zendesk): Tickets  
@@ -26,12 +26,6 @@ The following table provides a detailed list of all models materialized within t
 | **Table**                 | **Description**                                                                                                    |
 | ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
 | [rag__unified_document](https://fivetran.github.io/dbt_unified_rag/#!/model/model.unified_rag.rag__unified_document)  | Each record represents a chunk of text prepared for semantic-search and additional fields for use in LLM workflows.   |
-
-Additionally, for **Snowflake** destinations, a [Cortex Search Service](https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-search/cortex-search-overview) will be generated as a result of this data model. The Cortex Search Service uses the results of the `rag__unified_document` and enables Snowflake users to take advantage of low-latency, high quality "fuzzy" search over their data for use in RAG applications leveraging LLMs. See the below table for details.
-
-| **Snowflake Cortex Search Service**     | **Description**                               |
-| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
-| [rag__unified_search](https://github.com/fivetran/dbt_unified_rag/blob/main/macros/search_generation.sql)  |  Generates a Snowflake Cortex Search service via the [search_generation](https://github.com/fivetran/dbt_unified_rag/blob/main/macros/search_generation.sql) macro as a post-hook for Snowflake destinations. This Cortex Search Service is currently configured with a target lag of 1 day. **Please be aware that this search service will refresh automatically once a day even outside of this data model execution.** To understand more about the Cortex Search Service, you can run `SHOW CORTEX SEARCH SERVICES` in the respective Snowflake database.schema which the `rag__unified_document` is materialized. See [here](https://docs.snowflake.com/en/sql-reference/commands-cortex-search) for other relevant commands to use for understanding the nature of the Search Service, and [here](https://docs.snowflake.com/en/sql-reference/functions/search_preview-snowflake-cortex) for helpful commands to use when leveraging the results of the Cortex Search Service in your LLM applications.  |
 <!--section-end-->
 
 ## How do I use the dbt package?
@@ -44,7 +38,6 @@ To use this dbt package, you must have the following:
     - [Jira](https://fivetran.com/docs/connectors/applications/jira)
     - [Zendesk Support](https://fivetran.com/docs/connectors/applications/zendesk)
 - A **Snowflake**, **BigQuery**, **Databricks**, or **PostgreSQL** destination.
-    - Please note, the Cortex Search Service will only be generated for Snowflake destinations.
     - Redshift destinations are not currently supported due to the stringent character limitations within string datatypes. If you would like Redshift destinations to be supported, please comment within our logged [Feature Request](https://github.com/fivetran/dbt_unified_rag/issues/3).
 
 ### Step 2: Install the package
@@ -53,7 +46,7 @@ Include the following package_display_name package version in your `packages.yml
 ```yml
 packages:
   - package: fivetran/unified_rag
-    version: 0.1.0-a1
+    version: 0.1.0-a2
 ```
 
 ### Step 3: Define database and schema variables

diff --git a/models/rag__unified_document.sql b/models/rag__unified_document.sql
@@ -3,11 +3,10 @@
         materialized='table' if unified_rag.is_databricks_sql_warehouse() else 'incremental',
         partition_by = {'field': 'update_date', 'data_type': 'date'}
             if target.type not in ['spark', 'databricks'] else ['update_date'],
-        cluster_by = ['unique_id'],
+        cluster_by = ['update_date'],
         unique_key='unique_id',
         incremental_strategy = 'insert_overwrite' if target.type in ('bigquery', 'databricks', 'spark') else 'delete+insert',
-        file_format='delta' if unified_rag.is_databricks_sql_warehouse() else 'parquet',
-        post_hook=["{{ unified_rag.search_generation(this,'rag__unified_search') }}"] if target.type == 'snowflake' else []
+        file_format='delta'
     )
 }}