diff --git a/docs/data-sources/athena.md b/docs/data-sources/athena.md index b3a29d5cda..05526e267e 100644 --- a/docs/data-sources/athena.md +++ b/docs/data-sources/athena.md @@ -20,7 +20,7 @@ To add Athena data source connection to DQOps you need the following: - Writing objects in the S3 bucket - Updating of the Lake Formation database -## Add Athena connection using the user interface +## Add an Athena connection using the user interface ### **Navigate to the connection settings** @@ -69,7 +69,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -98,7 +98,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Athena connection using DQOps Shell +## Add an Athena connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -160,7 +160,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -170,7 +170,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -256,6 +256,6 @@ To set the credential file in DQOps, follow these steps: ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. 
\ No newline at end of file diff --git a/docs/data-sources/aws.md b/docs/data-sources/aws.md index c38c44a634..f550410948 100644 --- a/docs/data-sources/aws.md +++ b/docs/data-sources/aws.md @@ -4,124 +4,87 @@ title: How to activate data observability for AWS S3 # How to activate data observability for AWS S3 -This guide shows how to activate data observability for AWS by connecting DQOps. -The example will use the S3 for storing data. +This guide shows how to enable data observability for data stored in AWS S3 buckets using DQOps. To seamlessly connect to AWS S3 +buckets, DQOps uses the DuckDB connector.
## Prerequisites -- Data in CSV, JSON or Parquet format (compressed files allowed), located in a Bucket. -- [DQOps installation](../getting-started/installation.md) +- Data in CSV, JSON, or Parquet format (compressed files allowed), stored in an AWS S3 bucket. +- [Installed DQOps](../getting-started/installation.md). +- Access permissions and credentials for AWS S3.
-## Add connection to AWS S3 using the user interface +### **Generate Credentials**
-### **Navigate to the connection settings** - -To navigate to the AWS S3 connection settings: - -1. Go to the Data Sources section and click the **+ Add connection** button in the upper left corner. - - ![Adding connection](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection.png){ loading=lazy; width="1200px" } - -2. Select the DuckDB connection. - - ![Selecting DuckDB connection type](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection-duckdb.png){ loading=lazy; width="1200px" } - - -### **Fill in the connection settings** +To connect DQOps to AWS S3, you need to obtain credentials.
-After navigating to the AWS S3 connection settings, you will need to fill in its details. - -![Adding connection settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-json.png){ loading=lazy; width="1200px" } - -Fill the **Connection name** any name you want. - -Change the **Files location** to **AWS S3**, to work with files located in AWS S3. - -Select the **File Format** suitable to your files located in AWS S3. You can choose from CSV, JSON or Parquet file format. - -To complete the configuration you need to set the: - -- **AWS authentication mode** -- **Path** - - -## Choose the AWS authentication mode - -DQOps requires permissions to establish the connection to the AWS S3 storage. - -You can choose from a variety of authentication methods that will allow to connect to your data: +DQOps supports two primary authentication methods for connecting to AWS S3:
- IAM - Default Credential
-Below you can find how to get credentials for each of the authentication methods. +Below, you can find how to get credentials for each authentication method.
### **IAM**
-This is the recommended authentication method. - -The service account is an impersonalized identity used specifically for a service with a proper permission. - -This method requires creating a service account and generating a secret. +This is the recommended authentication method. Follow these steps to set up IAM authentication:
-Start with creating a service account in AWS. -Open **IAM**, navigate to **Users** and click the **Create user** button. +1. **Create a Service Account**
-![Create service account](https://dqops.com/docs/images/data-sources/aws/aws-create-service-account.png){ loading=lazy; } + Open the **IAM** console, navigate to **Users**, and click the **Create user** button.
-Set the name of the service account. + ![Create service account](https://dqops.com/docs/images/data-sources/aws/aws-create-service-account.png){ loading=lazy; } -![Create service account step 1](https://dqops.com/docs/images/data-sources/aws/aws-create-step-1.png){ loading=lazy; } + Set the name of the service account. -In permission options select **Attach policies directly**. -In search field type **AmazonS3ReadOnlyAccess** and select the policy. + ![Create service account step 1](https://dqops.com/docs/images/data-sources/aws/aws-create-step-1.png){ loading=lazy; } -This policy provides read only access to all available buckets in the project. -If you like to limit access you need to create a custom policy and select it here. + In permission options, select **Attach policies directly**. + In search field, type **AmazonS3ReadOnlyAccess** and select the policy. -This is achievable by modifying the value the Resource field of a permission to specify an S3 path prefix that permission will work with. + This policy provides read-only access to all available buckets in the project. + If you would like to limit access, you need to create a custom policy and select it here. + You can achieve this by modifying the value in the Resource field of a permission to specify an S3 path prefix that permission will work with. + For example, "Resource": "arn:aws:s3:::<bucket_name_here>/*" + This allows access to all objects inside the bucket named <bucket_name_here>. -E.g: "Resource": "arn:aws:s3:::<bucket_name_here>/*" + ![Create service account step 2](https://dqops.com/docs/images/data-sources/aws/aws-create-step-2.png){ loading=lazy; } -Abowe allows access all object inside the bucket named <bucket_name_here>. + To finish creating the service account, click on the **Create user** button. -![Create service account step 2](https://dqops.com/docs/images/data-sources/aws/aws-create-step-2.png){ loading=lazy; } + ![Create service account step 3](https://dqops.com/docs/images/data-sources/aws/aws-create-step-3.png){ loading=lazy; } -Then click the **Create user** button. +2. **Generate Access Key** -![Create service account step 3](https://dqops.com/docs/images/data-sources/aws/aws-create-step-3.png){ loading=lazy; } + To Generate Access Key that will be used by DQOps to access files in your bucket, click on the name of the service account. -The service account has been created. -Now you can generate access key what will be used by DQOps to access files in your bucket. Click on the name of the service account. + ![Created service account](https://dqops.com/docs/images/data-sources/aws/aws-service-account-created.png){ loading=lazy; } -![Created service account](https://dqops.com/docs/images/data-sources/aws/aws-service-account-created.png){ loading=lazy; } + Navigate to the **Security credentials** tab, scroll down to the **Access keys** section, and click on the **Create access key** button. -Navigate to **Security credentials** tab, scroll down to **Access keys** section and click on the **Create access key** button. 
+ ![Create access key](https://dqops.com/docs/images/data-sources/aws/create-access-key.png){ loading=lazy; } -![Create access key](https://dqops.com/docs/images/data-sources/aws/create-access-key.png){ loading=lazy; }
+ Select **Application running outside AWS**, then click **Next**. -Select **Application running outside AWS**, then **Next**
+ ![Create access key step 1](https://dqops.com/docs/images/data-sources/aws/create-access-key-step-1.png){ loading=lazy; } -![Create access key step 1](https://dqops.com/docs/images/data-sources/aws/create-access-key-step-1.png){ loading=lazy; }
+ Add a description of your access key. -Put the description of your access key.
+ Click on the **Create access key** button. -Click on **Create access key**.
+ ![Create access key step 2](https://dqops.com/docs/images/data-sources/aws/create-access-key-step-2.png){ loading=lazy; } -![Create access key step 2](https://dqops.com/docs/images/data-sources/aws/create-access-key-step-2.png){ loading=lazy; }
+ Click on the **Show** link to reveal the secret. -Click on **Show** link to present the secret.
+ ![Create access key step 3](https://dqops.com/docs/images/data-sources/aws/create-access-key-step-3.png){ loading=lazy; } -You have generated Access key for AWS S3. Copy Access key and Secret access key. -![Create access key step 3](https://dqops.com/docs/images/data-sources/aws/create-access-key-step-3.png){ loading=lazy; }
+ You have generated an Access key and Secret for AWS S3, which can be used during the creation of the connection in DQOps.
### **Default Credential**
-With DQOps, you can configure credentials to access AWS S3 directly in the platform. +In DQOps, you have the option to set up credentials to access AWS S3 directly through the platform.
-Please note, that any credentials and secrets shared with the DQOps Cloud or DQOps SaaS instances are stored in the .credentials folder. -This folder also contains the default credentials files for AWS S3 (**AWS_default_config** and **AWS_default_credentials**). +Keep in mind that any credentials and secrets shared with the DQOps Cloud or DQOps SaaS instances are stored in the `.credentials` folder. +This folder also contains the default credentials files for AWS S3, namely **AWS_default_config** and **AWS_default_credentials**.
``` { .asc .annotate hl_lines="4-5" } $DQO_USER_HOME @@ -132,29 +95,27 @@ $DQO_USER_HOME └─... ```
-If you wish to use AWS authentication, the content of the files must be replaced with your aws_access_key_id, aws_secret_access_key and region. -You can find more details on how to [manage access keys for IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) in AWS documentation. +If you want to use AWS authentication, replace the content of the files with your aws_access_key_id, aws_secret_access_key, and region. +To learn more, refer to the AWS documentation on [how to manage access keys for IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html).
!!! warning 'AWS system default credentials' - If you do not replace the content of the files, the default credentials will be loaded from system for AWS only. - + If you do not replace the content of the files, the system will load default credentials for AWS only.
-To set the credential file for AWS in DQOps, follow steps: -1. Open the Configuration in menu. -2. Select Shared credentials from the tree view on the left. -3. Click the edit link on the “AWS_default_credentials” file. 
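+For reference, below is a minimal sketch of what these two files may contain, assuming they follow the standard AWS credentials and config file layout; the values are placeholders that you replace with your own access key, secret, and region.
+
+**AWS_default_credentials**:
+
+```
+[default]
+aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
+aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY
+```
+
+**AWS_default_config**:
+
+```
+[default]
+region = YOUR_AWS_REGION
+```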
+To set the credential file for AWS in DQOps, follow these steps: -![Adding connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/aws-shared-credentials-ui.png) +1. Navigate to the **Configuration** section. +2. Select **Shared credentials** from the tree view on the left. +3. Click the **edit** link on the “AWS_default_credentials” file. -4. In the text area, edit the aws_access_key_id and aws_secret_access_key, replacing the placeholder text. + ![Adding connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/aws-shared-credentials-ui.png) -![Adding connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/edit-aws-shared-credential.png) +4. In the text area, replace the placeholder text with your aws_access_key_id and aws_secret_access_key. -5. Click the **Save** button, to save changes, go back to the main **Shared credentials** view. + ![Adding connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/edit-aws-shared-credential.png) -6. Edit the region in AWS_default_config file and save the file. +5. Click the **Save** button to save changes. !!! tip "Use the AWS system default credentials after filling in the shared credential" @@ -163,10 +124,41 @@ To set the credential file for AWS in DQOps, follow steps: you must manually delete the .credentials/AWS_default_config and .credentials/AWS_default_credentials files from the DQOps credentials. -## Set the Path +## Add a connection to AWS S3 using the user interface + +### **Navigate to the connection settings** + +DQOps uses the DuckDB connector to work with AWS S3 buckets. To navigate to the DuckDB connector: + +1. Go to the Data Sources section and click the **+ Add connection** button in the upper left corner. + + ![Adding connection](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection.png){ loading=lazy; width="1200px" } + +2. Select the DuckDB connection. + + ![Selecting DuckDB connection type](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection-duckdb.png){ loading=lazy; width="1200px" } + + +### **Fill in the connection settings** + +After navigating to the DuckDB connection settings, you will need to fill in its details. + +![Adding connection settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-json.png){ loading=lazy; width="1200px" } + +1. Enter a unique **Connection name**. +2. Change the **Files location** to **AWS S3** to work with files located in AWS S3 bucket. +3. Select the AWS authentication mode (IAM or Default credentials). +4. If you choose Default credentials, DQOps will use the AWS credentials stored within the platform. If you select IAM, provide the **Access Key ID**, **Secret Access Key**, and **Region**. +5. Select the appropriate **File Format** matching your data (CSV, JSON or Parquet). + +### **Set the Path for Import configuration** + +Define the location of your data in AWS S3. Here are some options, illustrated with an example directory structure: + +- **Specific file**: Enter the full path to a folder (e.g., **/my-bucket/clients_data/reports**). A selection of the file is available after saving the new connection. You cannot use a full file path. 
+- **Folder with similar files**: Provide the path to a directory containing files with the same structure (e.g., **/my-bucket/clients_data**). A selection of the folder is available after saving the new connection configuration.
+- **Hive-partitioned data**: Use the path to the data directory containing the directory with partitioned data and select the **Hive partitioning** checkbox under **Additional format options** (e.g., **/my-bucket/clients_data** with partitioning by date and market in the example). A selection of the **sales** directory is available after saving the new connection configuration.
-Let assume you have directories with unstructured files, dataset divided into multiple files with the same structure - e.g. same header or partitioned data. -All mentioned cases are supported but differs in the configuration.
``` { .asc .annotate } my-bucket @@ -194,29 +186,22 @@ my-bucket └───... ```
-1. Connect to a specific file - e.g. annual_report_2022.csv by setting prefix to **/my_container/clients_data/reports**. A selection of the file is available after saving the new connection configuration. -2. Connect to all files in path - e.g. whole market_specification folder by setting prefix to **/my_container/clients_data**. A selection of the folder is available after saving the new connection configuration. -3. Connect to partitioned data - e.g. sales folder with partitioning by date and market - set prefix to **/my_container/clients_data** and select **Hive partitioning** checkbox from Additional format options. A selection of the **sales** folder is available after saving the new connection configuration.
+1. Connect to a specific file - e.g. annual_report_2022.csv by setting prefix to **/my-bucket/clients_data/reports**. A selection of the file is available after saving the new connection configuration. +2. Connect to all files in path - e.g. whole market_specification directory by setting prefix to **/my-bucket/clients_data/**. A selection of the directory is available after saving the new connection configuration. +3. Connect to partitioned data - e.g. sales directory with partitioning by date and market - set prefix to **/my-bucket/clients_data** and select **Hive partitioning** checkbox from **Additional format** options. A selection of the **sales** directory is available after saving the new connection configuration.
-You can connect to a specific file, e.g. annual_report_2022.csv (set prefix to **/usr/share/clients_data/reports**), -all files with the same structure in path, e.g. whole market_specification folder (set prefix to **/usr/share/clients_data**) -or hive style partitioned data, e.g. sales folder with partitioning by date and market - (set prefix to **/usr/share/clients_data** and select **Hive partitioning** checkbox from Additional format options). - -The path is a directory containing files. You cannot use a full file path. -The prefix cannot contain the name of a file. - -A selection of files or directories is available **after Saving the new connection**. +Click **Save** to establish the connection. DQOps will display a list of accessible schemas and files based on your path configuration.
## Import metadata using the user interface
-When you add a new connection, it will appear in the tree view on the left, and you will be redirected to the Import Metadata screen. +After creating the connection, it will appear in the tree view on the left, and DQOps will automatically redirect you to the **Import Metadata** screen. Now we can import files. -1. 
Import the selected virtual schemas by clicking on the **Import Tables** button next to the source schema name from which you want to import tables. +1. Import the selected virtual schemas by clicking on the **Import Tables** button next to the schema name. ![Importing schemas](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-schemas.png){ loading=lazy; width="1200px" } -2. Select the tables (folders with files of previously selected file format or just the files) you want to import or import all tables using the buttons in the upper right corner. +2. Select the specific tables (folders with files or just the files) you want to import or import all tables using the buttons in the top right corner. ![Importing tables](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-tables-csv.png){ loading=lazy; width="1200px" } @@ -228,9 +213,9 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-tables-advisor-csv.png){ loading=lazy; width="1200px" } -## Detailed parameters description of new connection +## Details of new connection - all parameters description -The form of the adding a new connection page provides additional fields not mentioned before. +The connection setup form includes the following fields: | File connection settings | Property name in YAML configuration file | Description | |---------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -246,21 +231,17 @@ The form of the adding a new connection page provides additional fields not ment | JDBC connection property | | Optional setting. DQOps supports using the JDBC driver to access DuckDB. | -The next configuration depends on the file format. You can choose from the three of them: - -- CSV -- JSON -- Parquet +The next configuration depends on the file format. You can choose from three options: **CSV**, **JSON**, or **Parquet**. ### Additional CSV format options -CSV file format properties are detected automatically based on a sample of the file data. -The default sample size is 20480 rows. +The properties of the **CSV** file format are automatically identified using a sample of the file data. The default sample size is 20480 rows. -In **case of invalid import** of the data, expand the **Additional CSV format options** panel with file format options by clicking on it in UI. +If the data import is unsuccessful, you can access additional CSV format options by clicking on the **Additional CSV format options** panel in the user interface. -The following properties can be configured for a very specific CSV format. +You can configure specific properties for a very specific CSV format. 
Here are the CSV format options, along with their +corresponding property names in the YAML configuration file and their descriptions:
| Additional CSV format options | Property name in YAML configuration file | Description | |-------------------------------|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -285,12 +266,12 @@ The following properties can be configured for a very specific CSV format.
### Additional JSON format options
-JSON file format properties are detected automatically based on a sample of the file data. -The default sample size is 20480 rows. +The properties of the **JSON** file format are automatically identified using a sample of the file data. The default sample size is 20480 rows.
-In **case of invalid import** of the data, expand the **Additional JSON format options** panel with file format options by clicking on it in UI. +If the data import is unsuccessful, you can access additional JSON format options by clicking on the **Additional JSON format options** panel in the user interface.
-The following properties can be configured for a very specific JSON format. +You can configure specific properties for a very specific JSON format. Here are the JSON format options, along with their +corresponding property names in the YAML configuration file and their descriptions:
| Additional JSON format options | Property name in YAML configuration file | Description | |--------------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -311,9 +292,9 @@ The following properties can be configured for a very specific JSON format.
### Additional Parquet format options
-Click on the **Additional Parquet format options** panel to configure the file format options. +You can access additional **Parquet** format options by clicking on the **Additional Parquet format options** panel in the user interface.
-The Parquet's format properties can be configured with the following settings. +Here are the Parquet format options, along with their corresponding property names in the YAML configuration file and their descriptions:
| Additional Parquet format options | Property name in YAML configuration file | Description | |-----------------------------------|--------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------| @@ -326,13 +307,13 @@ The Parquet's format properties can be configured with the following settings.
### Working with partitioned files
-To work with partitioned files, you need to set the `hive-partition` parameter in CSV format settings. -The option can be found under the **Additional format options** panel. +To work with partitioned files, you need to set the `hive-partition` parameter in the format settings. +You can find this option under the **Additional format** options panel.
-Hive partitioning divides a table into multiple files based on the catalog structure. -Each catalog level is associated with a column and the catalogs are named in the format of column_name=value. +Hive partitioning involves dividing a table into multiple files based on the catalog structure. +Each catalog level is associated with a column, and the catalogs are named in the format of column_name=value.
-The partitions of the data set and types of columns are discovered automatically. +The partitions of the data set and types of columns are automatically discovered.
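+For example, a **sales** table partitioned by `date` and `market` could be stored in a catalog layout similar to the sketch below (the catalog and file names are illustrative only); the `date` and `market` catalog levels correspond to the partitioning columns that DQOps discovers automatically.
+
+```
+sales
+├───date=2023-01-01
+│   ├───market=DE
+│   │   └───sales_0.parquet
+│   └───market=US
+│       └───sales_0.parquet
+└───date=2023-01-02
+    ├───market=DE
+    │   └───sales_0.parquet
+    └───market=US
+        └───sales_0.parquet
+```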
### Environment variables in parameters @@ -351,29 +332,31 @@ For example:
![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" }
-To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field.
After filling in the connection settings, click the **Test Connection** button to test the connection.
Click the **Save** connection button when the test is successful otherwise, you can check the details of what went wrong.
-## Register single file as table +## Register a single file as a table
After creating a connection, you can register a single table.
-To view the schema, expand the connection in the tree view on the left. +To view the schema, follow these steps:
-Then, click on the three dots icon next to the schema name(1.) and select the **Add table** (2.) option. -This will open the **Add table** popup modal. +1. Expand the connection in the tree view on the left. +2. Click on the three dots icon next to the schema name. +3. Select the **Add table** option. This will open the **Add table** popup modal.
-![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-1.png){ loading=lazy } + ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-1.png){ loading=lazy }
-Enter the table name and the path absolute to the file. Save the new table configuration. +4. Enter the table name and the absolute path to the file. +5. Save the new table configuration.
!!! tip "Use of the relative path" - If the schema specifies the folder path, use only the file name with extension instead of an absolute path. + If the schema specifies the folder path, use only the file name with an extension instead of an absolute path.
!!! tip "Path in table name" @@ -381,23 +364,25 @@ Enter the table name and the path absolute to the file. Save the new table confi
![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-2.png){ loading=lazy }
-After saving the new table configuration, the new table will be present under the schema. -You can view the list of columns by clicking on "Columns" under the table in the three view on the left. +After saving the new table configuration, the table will appear under the specified schema. +To expand the list of columns, click on **Columns** under the table in the tree view on the left.
-You can verify the import tables job in the notification panel on the right corner. +You can check the status of the table import job in the notification panel located in the top right corner.
![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-3.png){ loading=lazy }
-If the job completes successfully, the created table will be imported and ready to use. +If the job is successful, the table will be created, imported, and ready to use. 
![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-4.png){ loading=lazy; width="1200px" }
## Add connection using DQOps Shell
-The following examples use parquet file format. To connect to csv or json, put the expected file format instead of "parquet" in the example commands. +The following examples demonstrate how to import the Parquet file format from AWS S3 buckets. DQOps uses the DuckDB +connector to work with AWS S3 buckets. +To import CSV or JSON files, replace `parquet` with the appropriate file format in the example commands.
-To add a connection run the following command in DQOps Shell. +To add a connection, execute the following command in DQOps Shell.
``` dqo> connection add @@ -454,7 +439,7 @@ After adding connection run `table import -c=connection1` to select schemas and
DQOps will ask you to select the schema from which the tables will be imported.
-You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step.
``` dqo> table import --connection={connection name} @@ -465,7 +450,7 @@ dqo> table import --connection={connection name}
DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name.
## Connections configuration files @@ -497,6 +482,6 @@ YAML file format.
## Next steps
- Learn about more advanced importing when [working with files](../working-with-dqo/working-with-files.md) -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). -- The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. +- The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. 
\ No newline at end of file diff --git a/docs/data-sources/azure.md b/docs/data-sources/azure.md index ba2901776d..f35fd5b7de 100644 --- a/docs/data-sources/azure.md +++ b/docs/data-sources/azure.md @@ -2,160 +2,126 @@ title: How to activate data observability for Azure --- -# How to activate data observability for Azure +# How to activate data observability for Azure storage -This guide shows how to activate data observability for Azure by connecting DQOps. -The example will use the Azure Blob Storage for storing data. +This guide shows how to enable data observability for data stored in Azure Blob Storage using DQOps. To seamlessly connect to Azure Blob Storage, +DQOps uses the DuckDB connector. ## Prerequisites -- Data in CSV, JSON or Parquet format (compressed files allowed), located in a Storage Container in your Storage Account. -- [DQOps installation](../getting-started/installation.md) +- Data in CSV, JSON, or Parquet format (compressed files allowed), stored in Azure Storage. +- [Installed DQOps](../getting-started/installation.md). +- Access permission and authentication to Azure Blob Storage. -## Add connection to Azure using the user interface +### **Choose the Azure authentication mode** -### **Navigate to the connection settings** - -To navigate to the Azure connection settings: - -1. Go to the Data Sources section and click the **+ Add connection** button in the upper left corner. - - ![Adding connection](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection.png){ loading=lazy; width="1200px" } - -2. Select the DuckDB connection. - - ![Selecting DuckDB connection type](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection-duckdb.png){ loading=lazy; width="1200px" } - - -### **Fill in the connection settings** - -After navigating to the Azure connection settings, you will need to fill in its details. - -![Adding connection settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-json.png){ loading=lazy; width="1200px" } - -Fill the **Connection name** any name you want. - -Change the **Files location** to **Azure Blob Storage**, to work with files located in Azure. - -Select the **File Format** suitable to your files located in Azure. You can choose from CSV, JSON or Parquet file format. - -To complete the configuration you need to set the: - -- **Azure authentication mode** -- **Path** - - -## Choose the Azure authentication mode - -DQOps requires permissions to establish the connection to the Azure storage. - -You can choose from a variety of authentication methods that will allow to connect to your data: +To connect DQOps to Azure Blob Storage, you need to set the authentication method. +DQOps supports the following authentication methods for connecting to Azure Blob Storage: - Connection String - Credential Chain - Service Principal - Default Credential -Below you can find how to get credentials for each of the authentication methods. +Below you can find how to get credentials for each authentication methods. ### **Connection String** -The connection string is created on the Storage Account level. -It allows access to all files in each of the Storage Containers created in the Storage Account. +The connection string is generated at the Storage Account level. It grants access to all files in each of the Storage Containers created in the Storage Account. -You can find the connection string in the Storage Account details. -Open the Storage Account menu section in Azure Portal. 
Select the **Security + networking**, then **Access keys**. +You can locate the connection string in the **Storage Account** details. +Open the Storage Account menu section in the Azure Portal, then select **Security + networking** > **Access keys**. ![Connection string](https://dqops.com/docs/images/data-sources/azure/connection-string.png){ loading=lazy; } -### **Credential Chain** +### **Credential Chain** -The credential chain uses environment variables and accounts stored locally used for applications running locally. -That is why it will work on local DQOps instance only. +The credential chain utilizes environment variables and locally stored accounts for applications running on local machines. +Hence, it will only work on a local DQOps instance. -You can sign in interactively to Azure with use of Azure CLI command: **az login** +To sign in interactively to Azure, use the Azure CLI command: **az login**. After successfully running the command, +**restart the DQOps** process to enable it to load the updated account credentials. -After you succeed with the command **restart the DQOps** process allowing it to load the fresh account credentials. +### **Service Principal** -### **Service Principal** +Service Principal is a recommended authentication method. It provides an identity specifically for applications or services to access Azure resources. -This is the recommended authentication method. +To set up Service Principal authentication create a service account, generate a client secret, and add role assignment to the container. -The service principal is an impersonalized identity used specifically for a service with a proper permission. +1. **Create Service account** in Azure. -This method requires creating a service account, generating a secret and adding role assignment to the container. + Open **Enterprise applications** and click the **New application**. -Start with creating a service account in Azure. -Open **Enterprise applications** and click the **New application**. + ![New enterprise application](https://dqops.com/docs/images/data-sources/azure/new-enterprise-application.png){ loading=lazy; } -![New enterprise application](https://dqops.com/docs/images/data-sources/azure/new-enterprise-application.png){ loading=lazy; } + Then **Create your own application**. -Then **Create your own application**. + ![New your own enterprise application](https://dqops.com/docs/images/data-sources/azure/new-enterprise-application-your-own.png){ loading=lazy; } -![New your own enterprise application](https://dqops.com/docs/images/data-sources/azure/new-enterprise-application-your-own.png){ loading=lazy; } + Fill in the name with your service account and create it. -Fill the name with your service account and create it. + ![Create your own application](https://dqops.com/docs/images/data-sources/azure/on-right-create-your-own-application.png){ loading=lazy; } -![Create your own application](https://dqops.com/docs/images/data-sources/azure/on-right-create-your-own-application.png){ loading=lazy; } + Now the service account is ready, but it does not have any credentials available to be used. -Now the service account is ready but it does not have any credentials available to be used. +2. Generate Client Secret + + Open the **App registrations** in Azure Entra ID. + Select **All applications**, then select the name of the service account. -To create credentials open the **App registrations** in Azure Entra ID. -Select **All applications**, then select the name of the service account. 
+ ![App registration](https://dqops.com/docs/images/data-sources/azure/app-registrations.png){ loading=lazy; } -![App registration](https://dqops.com/docs/images/data-sources/azure/app-registrations.png){ loading=lazy; } + Then navigate to **Certificates & secrets** and click the **New client secret** -Then navigate to **Certificates & secrets** and click the **New client secret** + ![App registration](https://dqops.com/docs/images/data-sources/azure/create-new-client-secret.png){ loading=lazy; } -![App registration](https://dqops.com/docs/images/data-sources/azure/create-new-client-secret.png){ loading=lazy; } + Then, fill in the name of a new client secret and create it. -Then fill the name of a new client secret and create it. + Now the secret is ready. Save the **Value** of the key, which is your **Client Secret**. -Now the secret is ready. Save the **Value** of the key, which is your **Client Secret**. + ![App registration](https://dqops.com/docs/images/data-sources/azure/client-secret.png){ loading=lazy; } -![App registration](https://dqops.com/docs/images/data-sources/azure/client-secret.png){ loading=lazy; } +3. Assign Roles. -The last thing to be done is to add the permission of your service account to the storage account. + The last thing to be done is to add the permission of your service account to the storage account. -Open the container you will work with and select the **Access Control (IAM)**. -Click on **Add** and select the **Add role assignment**. + Open the container you will work with and select the **Access Control (IAM)**. + Click on **Add** and select the **Add role assignment**. -![App registration](https://dqops.com/docs/images/data-sources/azure/add-iam.png){ loading=lazy; } + ![App registration](https://dqops.com/docs/images/data-sources/azure/add-iam.png){ loading=lazy; } -In Role tab, search for **Storage Blob Data Reader** and click on the present role below. -The role adds read permissions to the Storage Container. + In the Role tab, search for **Storage Blob Data Reader** and click on the present role below. + The role adds read permissions to the Storage Container. -![App registration](https://dqops.com/docs/images/data-sources/azure/add-iam-role.png){ loading=lazy; } + ![App registration](https://dqops.com/docs/images/data-sources/azure/add-iam-role.png){ loading=lazy; } -In Members tab, click on the **Select members** and type the name of the service account, then click Enter. + In the Members tab, click on the **Select members** and type the name of the service account, then click Enter. -The name of the service account will appear when the full name is typed. + The name of the service account will appear when the full name is typed. -Select it and click Select. + Select it and click Select. -![App registration](https://dqops.com/docs/images/data-sources/azure/add-iam-member.png){ loading=lazy; } + ![App registration](https://dqops.com/docs/images/data-sources/azure/add-iam-member.png){ loading=lazy; } -To add a connection in DQOps with use of Service Principal authentication mode you need the following: + Provide these details when configuring the Azure Blob Storage connection in DQOps. -- Storage Account Name -- Tenant ID -- Client ID -- Client Secret + - **Tenant ID** + - **Client ID**: The application (client) ID of your registered application. + - **Client Secret**: The secret you generated. + - **Storage Account Name**: The name of your Azure Storage account. -The **Client Secret** you saved. 
+ Tenant ID and Client ID are available in the App registrations Overview section of the Azure Entra ID. -Tenant ID and Client ID are available in the App registrations Overview section of the Azure Entra ID. + ![App registration](https://dqops.com/docs/images/data-sources/azure/credentials.png){ loading=lazy; } -![App registration](https://dqops.com/docs/images/data-sources/azure/credentials.png){ loading=lazy; } +### **Default Credential** -### **Default Credential** +In DQOps, you have the option to set up credentials for accessing Azure Blob Storage directly through the platform. -With DQOps, you can configure credentials to access Azure Blob Storage directly in the platform. - -Please note, that any credentials and secrets shared with the DQOps Cloud or DQOps SaaS instances are stored in the .credentials folder. +Keep in mind that any credentials and secrets shared with the DQOps Cloud or DQOps SaaS instances are stored in the .credentials folder. This folder also contains the default credentials files for Azure Blob Storage (**Azure_default_credentials**). ``` { .asc .annotate hl_lines="4" } @@ -166,27 +132,61 @@ $DQO_USER_HOME └─... ``` -If you wish to use Azure authentication, you need service principal credentials that must be replaced in Azure file content. +If you want to use Azure authentication, you need service principal credentials that must be replaced in Azure file content. + +To set the credential file for Azure in DQOps, follow these steps: -To set the credential file for Azure in DQOps, follow steps: +1. Navigate to the **Configuration** section. +2. Select **Shared credentials** from the tree view on the left. +3. Click the **edit** link on the “Azure_default_credentials” file. -1. Open the Configuration in menu. -2. Select Shared credentials from the tree view on the left. -3. Click the edit link on the “Azure_default_credentials” file. + ![Adding connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/azure-shared-credentials-ui2.png) -![Adding connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/azure-shared-credentials-ui2.png) +4. In the text area, replace the placeholder text with your tenant_id, client_id, client_secret and account_name. -4. In the text area, edit the tenant_id, client_id, client_secret and account_name, replacing the placeholder text. + ![Edit connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/edit-azure-shared-credential2.png) -![Edit connection settings - environmental variables](https://dqops.com/docs/images/working-with-dqo/adding-connections/credentials/edit-azure-shared-credential2.png) +5. Click the **Save** button, to save changes. -5. Click the **Save** button, to save changes, go back to the main **Shared credentials** view. +## Add a connection to Azure Blob Storage using the user interface + +### **Navigate to the connection settings** -## Set the Path +DQOps uses the DuckDB connector to work with Azure Blob Storage. To navigate to the DuckDB connector: + +1. Go to the Data Sources section and click the **+ Add connection** button in the upper left corner. + + ![Adding connection](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection.png){ loading=lazy; width="1200px" } + +2. Select the DuckDB connection. 
+ + ![Selecting DuckDB connection type](https://dqops.com/docs/images/working-with-dqo/adding-connections/adding-connection-duckdb.png){ loading=lazy; width="1200px" } + + +### **Fill in the connection settings** + +After navigating to the DuckDB connection settings, you will need to fill in its details. + +![Adding connection settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-json.png){ loading=lazy; width="1200px" } + +1. Enter a unique **Connection name**. +2. Change the **Files location** to **Azure Blob Storage** to work with files located in Azure storage. +3. Select the Azure Blob Storage authentication mode and choose the appropriate method from the available options: + - **Connection String**: Directly input your Azure Storage connection string. + - **Credential Chain**: Provide storage account name and utilize the default Azure credentials chain. + - **Service Principal**: Provide the **Tenant ID**, **Client ID**, **Client Secret**, and **Storage Account name**. + - **Default credentials**: Allow DQOps to use Azure credentials stored within the platform. +4. Select the appropriate **File Format** matching your data (CSV, JSON or Parquet). + +### **Set the Path for Import configuration** + +Define the location of your data in Azure Blob Storage. Here are some options, illustrated with an example directory structure: + +- **Specific file**: Enter the full path to a folder (e.g., **/my-bucket/clients_data/reports**). A selection of the file is available after saving the new connection. You cannot use a full file path. +- **Folder with similar files**: Provide the path to a directory containing folder with files with the same structure (e.g., **/my-bucket/clients_data**). A selection of the folder is available after saving the new connection configuration. +- **Hive-partitioned data**: Use the path to the data directory containing the directory with partitioned data and select the **Hive partitioning** checkbox under **Additional format options** (e.g., **/my-bucket/clients_data** with partitioning by date and market in the example). A selection of the **sales** directory is available after saving the new connection configuration. -Let assume you have directories with unstructured files, dataset divided into multiple files with the same structure - e.g. same header or partitioned data. -All mentioned cases are supported but differs in the configuration. ``` { .asc .annotate } my-container @@ -214,29 +214,22 @@ my-container └───... ``` -1. Connect to a specific file - e.g. annual_report_2022.csv by setting prefix to **/my_container/clients_data/reports**. A selection of the file is available after saving the new connection configuration. -2. Connect to all files in path - e.g. whole market_specification folder by setting prefix to **/my_container/clients_data**. A selection of the folder is available after saving the new connection configuration. -3. Connect to partitioned data - e.g. sales folder with partitioning by date and market - set prefix to **/my_container/clients_data** and select **Hive partitioning** checkbox from Additional format options. A selection of the **sales** folder is available after saving the new connection configuration. +1. Connect to a specific file - e.g. annual_report_2022.csv by setting prefix to **/my-bucket/clients_data/reports**. A selection of the file is available after saving the new connection configuration. +2. Connect to all files in path - e.g. 
whole market_specification directory by setting prefix to **/my-bucket/clients_data/**. A selection of the directory is available after saving the new connection configuration. +3. Connect to partitioned data - e.g. sales directory with partitioning by date and market - set prefix to **/my-bucket/clients_data** and select **Hive partitioning** checkbox from **Additional format** options. A selection of the **sales** directory is available after saving the new connection configuration.
-You can connect to a specific file, e.g. annual_report_2022.csv (set prefix to **/usr/share/clients_data/reports**), -all files with the same structure in path, e.g. whole market_specification folder (set prefix to **/usr/share/clients_data**) -or hive style partitioned data, e.g. sales folder with partitioning by date and market - (set prefix to **/usr/share/clients_data** and select **Hive partitioning** checkbox from Additional format options). +Click **Save** to establish the connection. DQOps will display a list of accessible schemas and files based on your path configuration.
-The path is a directory containing files. You cannot use a full file path. -The prefix cannot contain the name of a file. +## Import metadata using the user interface
-A selection of files or directories is available **after Saving the new connection**. -## Import metadata using the user interface
-When you add a new connection, it will appear in the tree view on the left, and you will be redirected to the Import Metadata screen. +After creating the connection, it will appear in the tree view on the left, and DQOps will automatically redirect you to the **Import Metadata** screen. Now we can import files.
-1. Import the selected virtual schemas by clicking on the **Import Tables** button next to the source schema name from which you want to import tables. +1. Import the selected virtual schemas by clicking on the **Import Tables** button next to the schema name.
![Importing schemas](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-schemas.png){ loading=lazy; width="1200px" }
-2. Select the tables (folders with files of previously selected file format or just the files) you want to import or import all tables using the buttons in the upper right corner. +2. Select the specific tables (folders with files or just the files) you want to import or import all tables using the buttons in the top right corner.
![Importing tables](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-tables-csv.png){ loading=lazy; width="1200px" } @@ -248,9 +241,9 @@ or modify the schedule for newly imported tables.
![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-tables-advisor-csv.png){ loading=lazy; width="1200px" }
-## Detailed parameters description of new connection +## Details of new connection - all parameters description
-The form of the adding a new connection page provides additional fields not mentioned before. 
+The connection setup form includes the following fields:
| File connection settings | Property name in YAML configuration file | Description | |---------------------------|------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | JDBC connection property | | Optional setting. DQOps supports using the JDBC driver to access DuckDB. |
-The next configuration depends on the file format. You can choose from the three of them: - -- CSV -- JSON -- Parquet +The next configuration depends on the file format. You can choose from three options: **CSV**, **JSON**, or **Parquet**.
### Additional CSV format options
-CSV file format properties are detected automatically based on a sample of the file data. -The default sample size is 20480 rows. +The properties of the **CSV** file format are automatically identified using a sample of the file data. The default sample size is 20480 rows.
-In **case of invalid import** of the data, expand the **Additional CSV format options** panel with file format options by clicking on it in UI. +If the data import is unsuccessful, you can access additional CSV format options by clicking on the **Additional CSV format options** panel in the user interface.
-The following properties can be configured for a very specific CSV format. +You can configure specific properties for a very specific CSV format. Here are the CSV format options, along with their +corresponding property names in the YAML configuration file and their descriptions:
| Additional CSV format options | Property name in YAML configuration file | Description | |-------------------------------|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -308,12 +297,12 @@ The following properties can be configured for a very specific CSV format.
### Additional JSON format options
-JSON file format properties are detected automatically based on a sample of the file data. -The default sample size is 20480 rows. +The properties of the **JSON** file format are automatically identified using a sample of the file data. The default sample size is 20480 rows.
-In **case of invalid import** of the data, expand the **Additional JSON format options** panel with file format options by clicking on it in UI. +If the data import is unsuccessful, you can access additional JSON format options by clicking on the **Additional JSON format options** panel in the user interface.
-The following properties can be configured for a very specific JSON format. +You can configure specific properties for a very specific JSON format. 
Here are the JSON format options, along with their +corresponding property names in the YAML configuration file and their descriptions: | Additional JSON format options | Property name in YAML configuration file | Description | |--------------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -334,9 +323,9 @@ The following properties can be configured for a very specific JSON format. ### Additional Parquet format options -Click on the **Additional Parquet format options** panel to configure the file format options. +You can access additional **Parquet** format options by clicking on the **Additional Parquet format options** panel in the user interface. -The Parquet's format properties can be configured with the following settings. +Here are the Parquet format options, along with their corresponding property names in the YAML configuration file and their descriptions: | Additional Parquet format options | Property name in YAML configuration file | Description | |-----------------------------------|--------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -349,13 +338,13 @@ The Parquet's format properties can be configured with the following settings. ### Working with partitioned files -To work with partitioned files, you need to set the `hive-partition` parameter in CSV format settings. -The option can be found under the **Additional format options** panel. +To work with partitioned files, you need to set the `hive-partition` parameter in the format settings. +You can find this option under the **Additional format** options panel. -Hive partitioning divides a table into multiple files based on the catalog structure. -Each catalog level is associated with a column and the catalogs are named in the format of column_name=value. +Hive partitioning involves dividing a table into multiple files based on the catalog structure. +Each catalog level is associated with a column, and the catalogs are named in the format of column_name=value. -The partitions of the data set and types of columns are discovered automatically. +The partitions of the data set and types of columns are automatically discovered. ### Environment variables in parameters @@ -374,29 +363,31 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. Click the **Save** connection button when the test is successful otherwise, you can check the details of what went wrong. -## Register single file as table +## Register a single file as a table After creating a connection, you can register a single table. -To view the schema, expand the connection in the tree view on the left. +To view the schema, follow these steps: -Then, click on the three dots icon next to the schema name(1.) and select the **Add table** (2.) option. 
-This will open the **Add table** popup modal. +1. Expand the connection in the tree view on the left. +2. Click on the three dots icon next to the schema name. +3. Select the **Add table** option. This will open the **Add table** popup modal. -![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-1.png){ loading=lazy } + ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-1.png){ loading=lazy } -Enter the table name and the path absolute to the file. Save the new table configuration. +4. Enter the table name and the absolute path to the file. +5. Save the new table configuration. !!! tip "Use of the relative path" - If the schema specifies the folder path, use only the file name with extension instead of an absolute path. + If the schema specifies the folder path, use only the file name with an extension instead of an absolute path. !!! tip "Path in table name" @@ -404,23 +395,25 @@ Enter the table name and the path absolute to the file. Save the new table confi ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-2.png){ loading=lazy } -After saving the new table configuration, the new table will be present under the schema. -You can view the list of columns by clicking on "Columns" under the table in the three view on the left. +After saving the new table configuration, the table will appear under the specified schema. +To expand the list of columns, click on **Columns** under the table in the tree view on the left. -You can verify the import tables job in the notification panel on the right corner. +You can check the status of the table import job in the notification panel located in the top right corner. ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-3.png){ loading=lazy } -If the job completes successfully, the created table will be imported and ready to use. +If the job is successful, the table will be created, imported, and ready to use. ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-4.png){ loading=lazy; width="1200px" } ## Add connection using DQOps Shell -The following examples use parquet file format. To connect to csv or json, put the expected file format instead of "parquet" in the example commands. +The following examples demonstrate how to import Parquet files stored in AWS S3 buckets. DQOps uses the DuckDB +connector to work with AWS S3. +To import CSV or JSON files, replace `parquet` with the appropriate file format in the example commands. -To add a connection run the following command in DQOps Shell. +To add a connection, execute the following command in DQOps Shell. ``` dqo> connection add @@ -477,7 +470,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -488,7 +481,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters.
For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -520,6 +513,6 @@ YAML file format. ## Next steps - Learn about more advanced importing when [working with files](../working-with-dqo/working-with-files.md) -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). -- The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. +- The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/bigquery.md b/docs/data-sources/bigquery.md index 1d0eeb5733..8fb670c974 100644 --- a/docs/data-sources/bigquery.md +++ b/docs/data-sources/bigquery.md @@ -17,7 +17,7 @@ To add BigQuery data source connection to DQOps you need the following: - A service account key in JSON format for JSON key authentication. For details refer to [Create and delete service account keys](https://cloud.google.com/iam/docs/keys-create-delete) - A working [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) if you want to use [Google Application Credentials](#using-google-application-credentials-authentication) authentication -## Add BigQuery connection using the user interface +## Add a BigQuery connection using the user interface ### **Navigate to the connection settings** @@ -78,7 +78,7 @@ the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png) -## Add BigQuery connection using DQOps Shell +## Add a BigQuery connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -131,7 +131,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. 
``` dqo> table import --connection={connection name} @@ -141,7 +141,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -235,6 +235,6 @@ To set the credential file in DQOps, follow these steps: ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. diff --git a/docs/data-sources/csv.md b/docs/data-sources/csv.md index 2931512df9..4a40fa75b3 100644 --- a/docs/data-sources/csv.md +++ b/docs/data-sources/csv.md @@ -17,7 +17,7 @@ Additional configuration is required **only when using remote storage** (AWS S3, When using remote cloud storage, make sure your account has access to the remote directory containing CSV files. The permissions granted should allow you to list the files and directories, as well as read the contents of the files. -## Add connection to CSV files using the user interface +## Add a connection to CSV files using the user interface ### **Navigate to the connection settings** @@ -147,7 +147,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -190,7 +190,7 @@ Enter the table name and the path absolute to the file. Save the new table confi !!! tip "Use of the relative path" - If the schema specifies the folder path, use only the file name with extension instead of an absolute path. + If the schema specifies the folder path, use only the file name with an extension instead of an absolute path. !!! 
tip "Path in table name" @@ -209,7 +209,7 @@ If the job completes successfully, the created table will be imported and ready ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-4.png){ loading=lazy; width="1200px" } -## Add CSV connection using DQOps Shell +## Add a CSV connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -273,7 +273,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -283,7 +283,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -368,6 +368,6 @@ To set the credential file for AWS in DQOps, follow steps: ## Next steps - Learn about more advanced importing when [working with files](../working-with-dqo/working-with-files.md) -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. diff --git a/docs/data-sources/databricks.md b/docs/data-sources/databricks.md index cda2345453..64c736ea5b 100644 --- a/docs/data-sources/databricks.md +++ b/docs/data-sources/databricks.md @@ -15,7 +15,7 @@ To add Databricks data source connection to DQOps you need a Databricks SQL Ware It is also recommended to use an access token to connect an instance, so a permission to generate access token or a possession of a previously generated token is necessary. -## Add Databricks connection using the user interface +## Add a Databricks connection using the user interface ### **Navigate to the connection settings** @@ -62,7 +62,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. 
+To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -91,7 +91,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Databricks connection using DQOps Shell +## Add a Databricks connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -142,7 +142,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -152,7 +152,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -325,6 +325,6 @@ The Catalog should be filled with **hive_metastore** to access the catalog with ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/duckdb.md b/docs/data-sources/duckdb.md index 5941099476..b5e0266554 100644 --- a/docs/data-sources/duckdb.md +++ b/docs/data-sources/duckdb.md @@ -17,7 +17,7 @@ Additional configuration is required **only when using remote storage** (AWS S3, When using remote cloud storage, make sure your account has access to the remote directory containing CSV, JSON, or Parquet files. The permissions granted should allow you to list the files and directories, as well as read the contents of the files. 
-## Add connection to the files using the user interface +## Add a connection to the files using the user interface ### **Navigate to the connection settings** diff --git a/docs/data-sources/google.md b/docs/data-sources/google.md index 248100ec4e..9aea921992 100644 --- a/docs/data-sources/google.md +++ b/docs/data-sources/google.md @@ -1,22 +1,36 @@ --- -title: How to activate data observability for GCP +title: How to activate data observability for Google Cloud Storage --- -# How to activate data observability for Google +# How to activate data observability for Google Cloud Storage -This guide shows how to activate data observability for Google by connecting DQOps. -The example will use the Google Cloud Storage for storing data. +This guide shows how to enable data observability for data stored in a Google Cloud Storage buckets using DQOps. To seamlessly connect to Google +Cloud Storage buckets, DQOps uses the DuckDB connector. ## Prerequisites -- Data in CSV, JSON or Parquet format (compressed files allowed), located in a Bucket. -- [DQOps installation](../getting-started/installation.md) +- Data in CSV, JSON, or Parquet format (compressed files allowed), stored in a Google Cloud Storage Bucket. +- [Installed DQOps](../getting-started/installation.md). +- Access permission and credentials to Google Cloud Storage (**Access Key** and **Secret**). -## Add connection to Google using the user interface +### **Generate Credentials** + +To connect DQOps to Google Cloud Storage, you will need access permissions. This connection is established using the Interoperability API. + +To obtain the **Access Key** and **Secret**, follow these steps: + +1. In the Google Cloud Platform console, click the Cloud Storage tile. +2. Go to the **Settings** in the right navigation panel. +3. On the **Interoperability** tab, under **Access keys for user** account, click CREATE A KEY. + +![Interoperability](https://dqops.com/docs/images/data-sources/google/google-interoperability.png){ loading=lazy; } + + +## Add a connection to Google Cloud Storage using the user interface ### **Navigate to the connection settings** -To navigate to the Google connection settings: +DQOps uses the DuckDB connector to work with Google Cloud Storage buckets. To navigate to the DuckDB connector: 1. Go to the Data Sources section and click the **+ Add connection** button in the upper left corner. @@ -29,40 +43,26 @@ To navigate to the Google connection settings: ### **Fill in the connection settings** -After navigating to the Google connection settings, you will need to fill in its details. +After navigating to the DuckDB connection settings, you will need to fill in its details. ![Adding connection settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-json.png){ loading=lazy; width="1200px" } -Fill the **Connection name** any name you want. - -Change the **Files location** to **Google Cloud Storage**, to work with files located in Google. - -Select the **File Format** suitable to your files located in Google. You can choose from CSV, JSON or Parquet file format. - -To complete the configuration you need to set the: +1. Enter a unique **Connection name**. +2. Change the **Files location** to **Google Cloud Storage**, to work with files located in Google bucket. +3. Fill in **Access Key** and **Secret** +4. Select the appropriate **File Format** matching your data (CSV, JSON or Parquet). -- **Credentials** -- **Path** +To complete the configuration, you need to set the **Path**. 
-## Generate Credentials +### **Set the Path for Import configuration** -DQOps requires permissions to establish the connection to the Google Cloud Storage. +Define the location of your data in Google Cloud Storage. Here are some options, illustrated with an example directory structure: -Connection to GCS is performed with use of the Interoperability API. +- **Specific file**: Enter the full path to a folder (e.g., **/my-bucket/clients_data/reports**). A selection of the file is available after saving the new connection. You cannot use a full file path. +- **Folder with similar files**: Provide the path to a directory containing folder with files with the same structure (e.g., **/my-bucket/clients_data**). A selection of the folder is available after saving the new connection configuration. +- **Hive-partitioned data**: Use the path to the data directory containing the directory with partitioned data and select the **Hive partitioning** checkbox under **Additional format options** (e.g., **/my-bucket/clients_data** with partitioning by date and market in the example). A selection of the **sales** directory is available after saving the new connection configuration. -To generate **Access Key** and **Secret**, open Cloud Storage in Google. - -Go to the settings in the right navigation panel and open the Interoperability tab. - -Down on the page you can generate a new key. - -![Interoperability](https://dqops.com/docs/images/data-sources/google/google-interoperability.png){ loading=lazy; } - -## Set the Path - -Let assume you have directories with unstructured files, dataset divided into multiple files with the same structure - e.g. same header or partitioned data. -All mentioned cases are supported but differs in the configuration. ``` { .asc .annotate } my-bucket @@ -90,29 +90,22 @@ my-bucket └───... ``` -1. Connect to a specific file - e.g. annual_report_2022.csv by setting prefix to **/my_container/clients_data/reports**. A selection of the file is available after saving the new connection configuration. -2. Connect to all files in path - e.g. whole market_specification folder by setting prefix to **/my_container/clients_data**. A selection of the folder is available after saving the new connection configuration. -3. Connect to partitioned data - e.g. sales folder with partitioning by date and market - set prefix to **/my_container/clients_data** and select **Hive partitioning** checkbox from Additional format options. A selection of the **sales** folder is available after saving the new connection configuration. - -You can connect to a specific file, e.g. annual_report_2022.csv (set prefix to **/usr/share/clients_data/reports**), -all files with the same structure in path, e.g. whole market_specification folder (set prefix to **/usr/share/clients_data**) -or hive style partitioned data, e.g. sales folder with partitioning by date and market - (set prefix to **/usr/share/clients_data** and select **Hive partitioning** checkbox from Additional format options). +1. Connect to a specific file - e.g. annual_report_2022.csv by setting prefix to **/my-bucket/clients_data/reports**. A selection of the file is available after saving the new connection configuration. +2. Connect to all files in path - e.g. whole market_specification directory by setting prefix to **/my-bucket/clients_data/**. A selection of the directory is available after saving the new connection configuration. +3. Connect to partitioned data - e.g. 
sales directory with partitioning by date and market - set prefix to **/my-bucket/clients_data** and select **Hive partitioning** checkbox from **Additional format** options. A selection of the **sales** directory is available after saving the new connection configuration. -The path is a directory containing files. You cannot use a full file path. -The prefix cannot contain the name of a file. - -A selection of files or directories is available **after Saving the new connection**. +Click **Save** to establish the connection. DQOps will display a list of accessible schemas and files based on your path configuration. ## Import metadata using the user interface -When you add a new connection, it will appear in the tree view on the left, and you will be redirected to the Import Metadata screen. +After creating the connection, it will appear in the tree view on the left, and DQOps will automatically redirect you to the **Import Metadata** screen Now we can import files. -1. Import the selected virtual schemas by clicking on the **Import Tables** button next to the source schema name from which you want to import tables. +1. Import the selected virtual schemas by clicking on the **Import Tables** button next to the schema name. ![Importing schemas](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-schemas.png){ loading=lazy; width="1200px" } -2. Select the tables (folders with files of previously selected file format or just the files) you want to import or import all tables using the buttons in the upper right corner. +2. Select the specific tables (folders with files or just the files) you want to import or import all tables using the buttons in the top right corner. ![Importing tables](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-tables-csv.png){ loading=lazy; width="1200px" } @@ -124,9 +117,9 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/importing-tables-advisor-csv.png){ loading=lazy; width="1200px" } -## Detailed parameters description of new connection +## Details of new connection - all parameters description -The form of the adding a new connection page provides additional fields not mentioned before. +The connection setup form includes the following fields: | File connection settings | Property name in YAML configuration file | Description | |---------------------------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -141,21 +134,17 @@ The form of the adding a new connection page provides additional fields not ment | JDBC connection property | | Optional setting. DQOps supports using the JDBC driver to access DuckDB. | -The next configuration depends on the file format. You can choose from the three of them: - -- CSV -- JSON -- Parquet +The next configuration depends on the file format. You can choose from three options: **CSV**, **JSON**, or **Parquet**. ### Additional CSV format options -CSV file format properties are detected automatically based on a sample of the file data. -The default sample size is 20480 rows. +The properties of the **CSV** file format are automatically identified using a sample of the file data. The default sample size is 20480 rows. 
-In **case of invalid import** of the data, expand the **Additional CSV format options** panel with file format options by clicking on it in UI. +If the data import is unsuccessful, you can access additional CSV format options by clicking on the **Additional CSV format options** panel in the user interface. -The following properties can be configured for a very specific CSV format. +You can configure additional properties to handle a very specific CSV format. Here are the CSV format options, along with their +corresponding property names in the YAML configuration file and their descriptions: | Additional CSV format options | Property name in YAML configuration file | Description | |-------------------------------|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------| @@ -180,12 +169,12 @@ The following properties can be configured for a very specific CSV format. ### Additional JSON format options -JSON file format properties are detected automatically based on a sample of the file data. -The default sample size is 20480 rows. +The properties of the **JSON** file format are automatically identified using a sample of the file data. The default sample size is 20480 rows. -In **case of invalid import** of the data, expand the **Additional JSON format options** panel with file format options by clicking on it in UI. +If the data import is unsuccessful, you can access additional JSON format options by clicking on the **Additional JSON format options** panel in the user interface. -The following properties can be configured for a very specific JSON format. +You can configure additional properties to handle a very specific JSON format. Here are the JSON format options, along with their +corresponding property names in the YAML configuration file and their descriptions: | Additional JSON format options | Property name in YAML configuration file | Description | |--------------------------------|------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| @@ -206,9 +195,9 @@ The following properties can be configured for a very specific JSON format. ### Additional Parquet format options -Click on the **Additional Parquet format options** panel to configure the file format options. +You can access additional **Parquet** format options by clicking on the **Additional Parquet format options** panel in the user interface. -The Parquet's format properties can be configured with the following settings. +Here are the Parquet format options, along with their corresponding property names in the YAML configuration file and their descriptions: | Additional Parquet format options | Property name in YAML configuration file | Description | |-----------------------------------|--------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------| @@ -221,13 +210,13 @@ The Parquet's format properties can be configured with the following settings. ### Working with partitioned files -To work with partitioned files, you need to set the `hive-partition` parameter in CSV format settings.
-The option can be found under the **Additional format options** panel. +To work with partitioned files, you need to set the `hive-partition` parameter in the format settings. +You can find this option under the **Additional format** options panel. -Hive partitioning divides a table into multiple files based on the catalog structure. -Each catalog level is associated with a column and the catalogs are named in the format of column_name=value. +Hive partitioning involves dividing a table into multiple files based on the catalog structure. +Each catalog level is associated with a column, and the catalogs are named in the format of column_name=value. -The partitions of the data set and types of columns are discovered automatically. +The partitions of the data set and types of columns are automatically discovered. ### Environment variables in parameters @@ -246,7 +235,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -257,18 +246,20 @@ Click the **Save** connection button when the test is successful otherwise, you After creating a connection, you can register a single table. -To view the schema, expand the connection in the tree view on the left. +To view the schema, follow these steps: -Then, click on the three dots icon next to the schema name(1.) and select the **Add table** (2.) option. -This will open the **Add table** popup modal. +1. Expand the connection in the tree view on the left. +2. Click on the three dots icon next to the schema name. +3. Select the **Add table** option. This will open the **Add table** popup modal. -![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-1.png){ loading=lazy } + ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-1.png){ loading=lazy } -Enter the table name and the path absolute to the file. Save the new table configuration. +4. Enter the table name and the absolute path to the file. +5. Save the new table configuration. !!! tip "Use of the relative path" - If the schema specifies the folder path, use only the file name with extension instead of an absolute path. + If the schema specifies the folder path, use only the file name with an extension instead of an absolute path. !!! tip "Path in table name" @@ -276,23 +267,25 @@ Enter the table name and the path absolute to the file. Save the new table confi ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-2.png){ loading=lazy } -After saving the new table configuration, the new table will be present under the schema. -You can view the list of columns by clicking on "Columns" under the table in the three view on the left. +After saving the new table configuration, the table will appear under the specified schema. +To expand the list of columns, click on the **Columns** under the table in the three-view on the left. -You can verify the import tables job in the notification panel on the right corner. +You can check the status of the table import job in the notification panel located in the top right corner. 
![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-3.png){ loading=lazy } -If the job completes successfully, the created table will be imported and ready to use. +If the job is successful, the table will be created, imported, and ready to use. ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-4.png){ loading=lazy; width="1200px" } ## Add connection using DQOps Shell -The following examples use parquet file format. To connect to csv or json, put the expected file format instead of "parquet" in the example commands. +The following examples demonstrate how to import Parquet file format to Google Cloud Storage buckets. DQOps uses the DuckDB +connector to work with Google Cloud Storage buckets. +To import CSV or JSON files, replace `parquet` with the appropriate file format in the example commands. -To add a connection run the following command in DQOps Shell. +To add a connection, execute the following command in DQOps Shell. ``` dqo> connection add @@ -349,7 +342,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -360,7 +353,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -392,6 +385,6 @@ YAML file format. ## Next steps - Learn about more advanced importing when [working with files](../working-with-dqo/working-with-files.md) -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). -- The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. +- The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. 
Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/json.md b/docs/data-sources/json.md index 1784c6bdd4..f7c8ed7fb3 100644 --- a/docs/data-sources/json.md +++ b/docs/data-sources/json.md @@ -17,7 +17,7 @@ Additional configuration is required **only when using remote storage** (AWS S3, When using remote cloud storage, make sure your account has access to the remote directory containing JSON files. The permissions granted should allow you to list the files and directories, as well as read the contents of the files. -## Add connection to JSON files using the user interface +## Add a connection to JSON files using the user interface ### **Navigate to the connection settings** @@ -145,7 +145,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -188,7 +188,7 @@ Enter the table name and the path absolute to the file. Save the new table confi !!! tip "Use of the relative path" - If the schema specifies the folder path, use only the file name with extension instead of an absolute path. + If the schema specifies the folder path, use only the file name with an extension instead of an absolute path. !!! tip "Path in table name" @@ -207,7 +207,7 @@ If the job completes successfully, the created table will be imported and ready ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-4.png){ loading=lazy; width="1200px" } -## Add JSON connection using DQOps Shell +## Add a JSON connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -271,7 +271,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -282,7 +282,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -366,6 +366,6 @@ To set the credential file for AWS in DQOps, follow steps: ## Next steps - Learn about more advanced importing when [working with files](../working-with-dqo/working-with-files.md) -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). 
+- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/mysql.md b/docs/data-sources/mysql.md index a749c553a7..94b9228d64 100644 --- a/docs/data-sources/mysql.md +++ b/docs/data-sources/mysql.md @@ -18,7 +18,7 @@ Use the TCP/IP Properties (IP Addresses Tab) dialog box to configure the TCP/IP In case of restrictions, you need to add the IP address used by DQOps to [NDB Cluster TCP/IP Connections Using Direct Connections](https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-tcp-definition-direct.html). -## Add MySQL connection using the user interface +## Add a MySQL connection using the user interface ### **Navigate to the connection settings** @@ -65,7 +65,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -94,7 +94,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add MySQL connection using DQOps Shell +## Add a MySQL connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -150,7 +150,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -160,7 +160,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -197,6 +197,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. 
You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/oracle.md b/docs/data-sources/oracle.md index 314e55737e..56bded8c69 100644 --- a/docs/data-sources/oracle.md +++ b/docs/data-sources/oracle.md @@ -9,7 +9,7 @@ Read this guide to learn how to connect DQOps to Oracle from the UI, command-lin Oracle Database is a robust object relational database that provides efficient and effective solutions for database users such as delivering high performance, protecting users from unauthorized access, and enabling fast failure recovery. -## Add Oracle connection using the user interface +## Add an Oracle connection using the user interface ### **Navigate to the connection settings** @@ -55,7 +55,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -81,7 +81,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Oracle connection using DQOps Shell +## Add an Oracle connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -134,7 +134,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -144,7 +144,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -181,6 +181,6 @@ YAML file format. 
## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/parquet.md b/docs/data-sources/parquet.md index 215bf69832..68c1954978 100644 --- a/docs/data-sources/parquet.md +++ b/docs/data-sources/parquet.md @@ -17,7 +17,7 @@ Additional configuration is required **only when using remote storage** (AWS S3, When using remote cloud storage, make sure your account has access to the remote directory containing Parquet files. The permissions granted should allow you to list the files and directories, as well as read the contents of the files. -## Add connection to Parquet files using the user interface +## Add a connection to Parquet files using the user interface ### **Navigate to the connection settings** @@ -134,7 +134,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -177,7 +177,7 @@ Enter the table name and the path absolute to the file. Save the new table confi !!! tip "Use of the relative path" - If the schema specifies the folder path, use only the file name with extension instead of an absolute path. + If the schema specifies the folder path, use only the file name with an extension instead of an absolute path. !!! tip "Path in table name" @@ -196,7 +196,7 @@ If the job completes successfully, the created table will be imported and ready ![Register table](https://dqops.com/docs/images/working-with-dqo/adding-connections/duckdb/register-single-table-4.png){ loading=lazy; width="1200px" } -## Add Parquet connection using DQOps Shell +## Add a Parquet connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -260,7 +260,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. 
``` dqo> table import --connection={connection name} @@ -271,7 +271,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -356,6 +356,6 @@ To set the credential file for AWS in DQOps, follow steps: ## Next steps - Learn about more advanced importing when [working with files](../working-with-dqo/working-with-files.md) -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/postgresql.md b/docs/data-sources/postgresql.md index 654398242c..7a4936179e 100644 --- a/docs/data-sources/postgresql.md +++ b/docs/data-sources/postgresql.md @@ -17,7 +17,7 @@ By default, PostgreSQL restricts connections to hosts and networks included in t pg_hba.conf file. In case of restrictions you need to add the IP address used by DQOps to [Allowed IP Addresses in PostgreSQL Network Policies](https://www.postgresql.org/docs/9.1/auth-pg-hba-conf.html). -## Add PostgreSQL connection using the user interface +## Add a PostgreSQL connection using the user interface ### **Navigate to the connection settings** @@ -61,7 +61,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -86,7 +86,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add PostgreSQL connection using DQOps Shell +## Add a PostgreSQL connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -135,7 +135,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. 
-You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -145,7 +145,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -180,6 +180,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/presto.md b/docs/data-sources/presto.md index ed864f0f39..d6975a6ace 100644 --- a/docs/data-sources/presto.md +++ b/docs/data-sources/presto.md @@ -8,7 +8,7 @@ Read this guide to learn how to connect DQOps to Presto from the UI, command-lin Presto is an open source SQL query engine that’s fast, reliable, and efficient at scale. -## Add Presto connection using the user interface +## Add a Presto connection using the user interface ### **Navigate to the connection settings** @@ -54,7 +54,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -83,7 +83,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Presto connection using DQOps Shell +## Add a Presto connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -134,7 +134,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. 
+You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -144,7 +144,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -179,6 +179,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/redshift.md b/docs/data-sources/redshift.md index f026556ff4..bd1c7bc28b 100644 --- a/docs/data-sources/redshift.md +++ b/docs/data-sources/redshift.md @@ -17,7 +17,7 @@ Amazon Redshift uses an elastic IP address for the external IP address. An elast address is a static IP address. In case of restrictions, you need to add the IP address used by DQOps to [Allowed IP Addresses in Redshift Network Policies](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-clusters-vpc.html). -## Add Redshift connection using the user interface +## Add a Redshift connection using the user interface ### **Navigate to the connection settings** @@ -64,7 +64,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -94,7 +94,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Redshift connection using DQOps Shell +## Add a Redshift connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -145,7 +145,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. 
-You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -155,7 +155,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -241,6 +241,6 @@ To set the credential file in DQOps, follow these steps: ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/single-store.md b/docs/data-sources/single-store.md index 56a3a610ed..ae3b94c3c0 100644 --- a/docs/data-sources/single-store.md +++ b/docs/data-sources/single-store.md @@ -12,7 +12,7 @@ SingleStoreDB is a distributed SQL database that offers high-throughput transact To add SingleStoreDB data source connection to DQOps you need a SingleStore account. -## Add SingleStoreDB connection using the user interface +## Add a SingleStoreDB connection using the user interface ### **Navigate to the connection settings** @@ -59,7 +59,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png) -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -88,7 +88,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png) -## Add SingleStoreDB connection using DQOps Shell +## Add a SingleStoreDB connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -150,7 +150,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. 
-You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -160,7 +160,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -199,6 +199,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/snowflake.md b/docs/data-sources/snowflake.md index 8f88ea06d9..7515090707 100644 --- a/docs/data-sources/snowflake.md +++ b/docs/data-sources/snowflake.md @@ -16,7 +16,7 @@ By default, Snowflake instances are open to any IP address unless you configure policies that restrict this communication. In case of restrictions you need to add the IP address used by DQOps to [Allowed IP Addresses in Snowflake Network Policies](https://docs.snowflake.com/en/user-guide/network-policies#modifying-network-policies). -## Add Snowflake connection using the user interface +## Add a Snowflake connection using the user interface ### **Navigate to the connection settings** @@ -62,7 +62,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -89,7 +89,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Snowflake connection using DQOps Shell +## Add a Snowflake connection using DQOps Shell To add a connection run the following command in DQOps Shell. 
@@ -140,7 +140,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -150,7 +150,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -188,6 +188,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/spark.md b/docs/data-sources/spark.md index e1dd8e9721..771a100ef4 100644 --- a/docs/data-sources/spark.md +++ b/docs/data-sources/spark.md @@ -12,7 +12,7 @@ Apache Spark is an open-source unified analytics engine for large-scale data pro You need a Spark Thrift Server to be running that provides a connection through JDBC to data in Spark. -## Add Spark connection using the user interface +## Add a Spark connection using the user interface ### **Navigate to the connection settings** @@ -57,7 +57,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -86,7 +86,7 @@ or modify the schedule for newly imported tables. ![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Spark connection using DQOps Shell +## Add a Spark connection using DQOps Shell To add a connection run the following command in DQOps Shell. 
@@ -135,7 +135,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -145,7 +145,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -178,6 +178,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/sql-server.md b/docs/data-sources/sql-server.md index 376e51ad83..ed9058e995 100644 --- a/docs/data-sources/sql-server.md +++ b/docs/data-sources/sql-server.md @@ -16,7 +16,7 @@ To add SQL Server data source connection to DQOps you need a SQL Server account. Use the TCP/IP Properties (IP Addresses Tab) dialog box to configure the TCP/IP protocol options for a specific IP address. In case of restrictions, you need to add the IP address used by DQOps to [Allowed IP Addresses in SQL Server Network Policies](https://learn.microsoft.com/en-us/sql/tools/configuration-manager/tcp-ip-properties-ip-addresses-tab?view=sql-server-ver16). -## Add SQL Server connection using the user interface +## Add an SQL Server connection using the user interface ### **Navigate to the connection settings** @@ -62,7 +62,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -92,7 +92,7 @@ or modify the schedule for newly imported tables. 
![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add SQL Server connection using DQOps Shell +## Add an SQL Server connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -143,7 +143,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -153,7 +153,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -189,6 +189,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file diff --git a/docs/data-sources/trino.md b/docs/data-sources/trino.md index f62df5743e..bf4af9f136 100644 --- a/docs/data-sources/trino.md +++ b/docs/data-sources/trino.md @@ -8,7 +8,7 @@ Read this guide to learn how to connect DQOps to Trino from the UI, command-line Trino is a fast distributed SQL query engine for big data analytics that helps you explore your data universe. -## Add Trino connection using the user interface +## Add a Trino connection using the user interface ### **Navigate to the connection settings** @@ -55,7 +55,7 @@ For example: ![Adding connection JDBC settings](https://dqops.com/docs/images/working-with-dqo/adding-connections/connection-settings-JDBC-properties2.png){ loading=lazy; width="1200px" } -To remove the property click on the trash icon at the end of the input field. +To remove the property, click the trash icon at the end of the input field. After filling in the connection settings, click the **Test Connection** button to test the connection. @@ -84,7 +84,7 @@ or modify the schedule for newly imported tables. 
![Importing tables - advisor](https://dqops.com/docs/images/working-with-dqo/adding-connections/importing-tables-advisor.png){ loading=lazy; width="1200px" } -## Add Trino connection using DQOps Shell +## Add a Trino connection using DQOps Shell To add a connection run the following command in DQOps Shell. @@ -140,7 +140,7 @@ After adding connection run `table import -c=connection1` to select schemas and DQOps will ask you to select the schema from which the tables will be imported. -You can also add the schema and table name as a parameter to import tables in just a single step. +You can also add the schema and table name as parameters to import tables in just a single step. ``` dqo> table import --connection={connection name} @@ -150,7 +150,7 @@ dqo> table import --connection={connection name} DQOps supports the use of the asterisk character * as a wildcard when selecting schemas and tables, which can substitute any number of characters. For example, use pub* to find all schema a name with a name starting with "pub". The * -character can be used at the beginning, in the middle or at the end of the name. +character can be used at the beginning, middle, or end of the name. ## Connections configuration files @@ -185,6 +185,6 @@ YAML file format. ## Next steps -- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [full list of use cases here](../examples/index.md). +- We have provided a variety of use cases that use openly available datasets from [Google Cloud](https://cloud.google.com/datasets) to help you in using DQOps effectively. You can find the [complete list of use cases here](../examples/index.md). - DQOps allows you to keep track of the issues that arise during data quality monitoring and send alert notifications directly to Slack. Learn more about [incidents](../working-with-dqo/managing-data-quality-incidents-with-dqops.md) and [notifications](../integrations/webhooks/index.md). - The data in the table often comes from different data sources and vendors or is loaded by different data pipelines. Learn how [data grouping in DQOps](../working-with-dqo/set-up-data-grouping-for-data-quality-checks.md) can help you calculate separate data quality KPI scores for different groups of rows. \ No newline at end of file
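
For example, assuming a connection named `connection1` (as used in the `table import -c=connection1` snippets above), all tables from schemas whose names start with "pub" could be imported in one step by combining the schema and table parameters with the `*` wildcard. The option names mirror the `table import --connection={connection name}` form shown in the pages above; the concrete values are illustrative placeholders, not required names.

```
dqo> table import --connection=connection1 --schema=pub* --table=*
```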