docs: doc updates for materialize-azure-fabric-warehouse
Some baseline docs for the new fabric warehouse materialization.

See estuary/connectors#2300
williamhbaker committed Feb 4, 2025
1 parent 1bbb332 commit d0d1c11
Showing 3 changed files with 86 additions and 4 deletions.
@@ -48,8 +48,6 @@ Flow collections to your bucket.
| `/prefix` | Prefix | Optional prefix that will be used to store objects. | string | |
| `/fileSizeLimit` | File Size Limit | Approximate maximum size of materialized files in bytes. Defaults to 10737418240 (10 GiB) if blank. | integer | |
| `/endpoint` | Custom S3 Endpoint | The S3 endpoint URI to connect to. Use if you're materializing to a compatible API that isn't provided by AWS. Should normally be left blank. | string | |
| `/csvConfig/delimiter` | Delimiter | Character to separate columns within a row. Defaults to a comma if blank. Must be a single character with a byte length of 1. | integer | |
| `/csvConfig/nullString` | Null String | String to use to represent NULL values. Defaults to an empty string if blank. | integer | |
| `/csvConfig/skipHeaders` | Skip Headers | Do not write headers to files. | integer | |

#### Bindings
@@ -0,0 +1,86 @@
# Microsoft Azure Fabric Warehouse

This connector materializes Flow collections into tables in Microsoft Azure
Fabric Warehouse.

[`ghcr.io/estuary/azure-fabric-warehouse:dev`](https://ghcr.io/estuary/azure-fabric-warehouse:dev)
provides the latest connector image. You can also follow the link in your
browser to see past image versions.

## Prerequisites

To use this connector, you'll need:
- The connection string for a [Fabric
Warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/create-warehouse).
See
[instructions](https://learn.microsoft.com/en-us/fabric/data-warehouse/connectivity#retrieve-the-sql-connection-string)
for finding the connection string.
- A service principal for connecting to the warehouse. The **Client ID** and
**Client Secret** are needed to configure the connector.
- Follow [this
guide](https://learn.microsoft.com/en-us/entra/identity-platform/howto-create-service-principal-portal)
to register a Microsoft Entra app and create a service principal. Use
**Option 3: Create a new client secret** to create the service principal and
save its client secret.
- Follow [these
instructions](https://learn.microsoft.com/en-us/fabric/data-warehouse/entra-id-authentication#tenant-setting)
for enabling service principal access to Fabric APIs.
- Assign the service principal the **Contributor** role for the workspace as
described
[here](https://learn.microsoft.com/en-us/fabric/data-warehouse/entra-id-authentication#workspace-setting).
- A **Storage Account Key** for a storage account that will be used to store
temporary staging files that will be loaded into your warehouse. Follow [this
guide](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-create)
to create a storage account. You can find your storage account key using
[these
instructions](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal#view-account-access-keys).
- The name of the container within the storage account for storing staging
files.
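For reference, Fabric Warehouse SQL connection strings are host names in the `*.datawarehouse.fabric.microsoft.com` format. The value below is a hypothetical placeholder to illustrate the shape, not a working endpoint:

```
abc123xyz456.datawarehouse.fabric.microsoft.com
```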


## Configuration

Use the properties below to configure the materialization, which will direct one or more of your
Flow collections to your tables.

### Properties

#### Endpoint

| Property | Title | Description | Type | Required/Default |
|---------------------------|--------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|--------|------------------|
| **`/clientID`**           | Client ID                      | Client ID for the service principal used to connect to the Azure Fabric Warehouse.                                                                      | string | Required         |
| **`/clientSecret`**       | Client Secret                  | Client Secret for the service principal used to connect to the Azure Fabric Warehouse.                                                                  | string | Required         |
| **`/warehouse`**          | Warehouse                      | Name of the Azure Fabric Warehouse to connect to.                                                                                                       | string | Required         |
| **`/schema`** | Schema | Schema for bound collection tables (unless overridden within the binding resource configuration) as well as associated materialization metadata tables. | string | Required |
| **`/connectionString`** | Connection String | SQL connection string for the Azure Fabric Warehouse. | string | Required |
| **`/storageAccountName`** | Storage Account Name | Name of the storage account that temporary files will be written to. | string | Required |
| **`/storageAccountKey`** | Storage Account Key | Storage account key for the storage account that temporary files will be written to. | string | Required |
| **`/containerName`** | Storage Account Container Name | Name of the container in the storage account where temporary files will be written. | string | Required |
| `/directory` | Directory | Optional prefix that will be used for temporary files. | string | |

#### Bindings

| Property | Title | Description | Type | Required/Default |
|------------------|--------------------|------------------------------------------------------------|---------|------------------|
| **`/table`** | Table | Table name | string | Required |
| `/schema` | Alternative Schema | Alternative schema for this table | string | |
| `/delta_updates` | Delta updates | Whether to use standard or [delta updates](#delta-updates) | boolean | |
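
Putting the endpoint and binding properties together, a minimal materialization specification might look like the following sketch. All names, the `acmeCo` prefix, and the placeholder credential values are hypothetical:

```yaml
materializations:
  acmeCo/fabric-warehouse:
    endpoint:
      connector:
        image: ghcr.io/estuary/azure-fabric-warehouse:dev
        config:
          clientID: <client-id>
          clientSecret: <client-secret>
          warehouse: my-warehouse
          schema: dbo
          connectionString: <connection-string>
          storageAccountName: mystorageaccount
          storageAccountKey: <storage-account-key>
          containerName: flow-staging
    bindings:
      - resource:
          table: my_table
        source: acmeCo/my-collection
```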

## Sync Schedule

This connector supports configuring a schedule for sync frequency. You can read
about how to configure this [here](../../materialization-sync-schedule.md).

## Delta updates

This connector supports both standard (merge) and [delta
updates](../../../concepts/materialization.md#delta-updates). The default is to
use standard updates.

Enabling delta updates will prevent Flow from querying for documents in your
tables, which can reduce latency and costs for large datasets. If you're certain
that all events will have unique keys, enabling delta updates is a simple way to
improve performance with no effect on the output. However, enabling delta
updates is not suitable for all workflows, as the resulting table won't be fully
reduced.
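
As a sketch, delta updates are enabled per binding in the resource configuration. The table and collection names here are hypothetical:

```yaml
bindings:
  - resource:
      table: events
      delta_updates: true
    source: acmeCo/events
```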
@@ -45,8 +45,6 @@ Flow collections to your bucket.
| **`/uploadInterval`** | Upload Interval | Frequency at which files will be uploaded. | string | 5m |
| `/prefix` | Prefix | Optional prefix that will be used to store objects. | string | |
| `/fileSizeLimit` | File Size Limit | Approximate maximum size of materialized files in bytes. Defaults to 10737418240 (10 GiB) if blank. | integer | |
| `/csvConfig/delimiter` | Delimiter | Character to separate columns within a row. Defaults to a comma if blank. Must be a single character with a byte length of 1. | integer | |
| `/csvConfig/nullString` | Null String | String to use to represent NULL values. Defaults to an empty string if blank. | integer | |
| `/csvConfig/skipHeaders` | Skip Headers | Do not write headers to files. | integer | |

#### Bindings
