Changelog and docs
artnowo-alle committed May 29, 2024
1 parent e63472f commit ef6094d
Showing 2 changed files with 23 additions and 5 deletions.
CHANGELOG.md (10 changes: 8 additions & 2 deletions)
@@ -1,14 +1,20 @@
# BigFlow changelog

## Version 1.10.0

### Added

* BigQuery query job labeling for collect and write operations. Labels are passed via the `job_labels` dict argument in `DatasetConfiguration` and `DatasetManager`.

## Version 1.9.0

### Changed

* Switched from Google Container Registry to Artifact Registry. Made `-r`/`--docker-repository` common to all deploy commands. Build and deploy commands now authenticate to the Docker repository taken from `deployment_config.py` or CLI arguments, instead of the hardcoded `https://eu.gcr.io`.

## Version 1.8.0

### Changed

* Bumped basic dependencies: Apache Beam 2.48.0, google-cloud-bigtable 2.17.0, google-cloud-language 2.10.0, google-cloud-storage 2.11.2, among others (#374).
* Added the `env_variable` argument to `bigflow.Workflow`, which allows changing the name of the variable used to obtain the environment name (#365).
docs/technologies.md (18 changes: 15 additions & 3 deletions)
@@ -309,6 +309,7 @@ to call BigQuery SQL.
Fully qualified names of internal tables are resolved to `{project_id}.{dataset_name}.{table_name}`.
* `external_tables` — Dict that defines aliases for external table names.
Fully qualified names of those tables have to be declared explicitly.
* `job_labels` — Dict of labels that will be set on BigQuery jobs.

The distinction between internal and external tables shouldn't be treated too seriously.
Internal means `mine`. External means any other. It's just a naming convention.
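
For illustration, here is a minimal sketch of the resolution rules above (all project, dataset, and table names are made up):

```python
from bigflow.bigquery import DatasetConfig

# 'transactions' is internal: queries can use its alias, which resolves to
# 'my-project.my_dataset.transactions'. 'users' is external: its fully
# qualified name must be declared explicitly.
dataset_config = DatasetConfig(
    env='dev',
    project_id='my-project',
    dataset_name='my_dataset',
    internal_tables=['transactions'],
    external_tables={'users': 'other-project.crm.users'},
).create_dataset_manager()
```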
@@ -511,7 +512,7 @@

The `table_labels` and `dataset_labels` parameters allow your workflow to create table and dataset labels.
On the first run the tables do not exist yet, so their labels cannot be created at that point. Labels are added on the second and later runs, once the tables already exist.

```python
from bigflow.bigquery import DatasetConfig

dataset_config = DatasetConfig(
    env='dev',
    project_id='your-project-id',
    dataset_name='example_dataset',
    internal_tables=['example_table'],
    external_tables={},
    table_labels={
        # labels per internal table (values illustrative)
        'example_table': {'table_label_1': 'value_1', 'table_label_2': 'value_2'}
    },
    dataset_labels={"dataset_label_1": "value_1", "dataset_label_2": "value_2"}).create_dataset_manager()
```

The `job_labels` argument allows you to label BigQuery jobs. It is passed to [`QueryJobConfig.labels`](https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_labels)
by the `write` and `collect` methods of `DatasetManager`.

You can use it as an ad-hoc tool or put a labeling job into a workflow as well.

```python
from bigflow.bigquery import DatasetConfig

dataset_config = DatasetConfig(
    env='dev',
    project_id='your-project-id',
    dataset_name='example_dataset',
    internal_tables=['example_table'],
    external_tables={},
    job_labels={"owner": "John Doe"}).create_dataset_manager()
```
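
Every query job started through this manager then carries the `owner` label. As a quick usage sketch (assuming, per the resolution rules above, that `collect` accepts a SQL string in which `{example_table}` expands to the fully qualified table name):

```python
# The query job below runs with {"owner": "John Doe"} set on its
# QueryJobConfig.labels; jobs started by the write methods get the same labels.
rows = dataset_config.collect('SELECT * FROM {example_table}')
```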
