diff --git a/CHANGELOG.md b/CHANGELOG.md
index ffecb1c6..34bc184d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,14 +1,20 @@
 # BigFlow changelog
 
+## Version 1.10.0
+
+### Added
+
+* BigQuery query job labeling for collect and write operations. Labels are passed via the `job_labels` dict argument in `DatasetConfiguration` and `DatasetManager`.
+
 ## Version 1.9.0
 
-### Changes
+### Changed
 
 * Switched from Google Container Registry to Artifact Registry. Made `-r`/`--docker-repository` common for all deploy commands. Build and deploy commands authenticate to the Docker repository taken from `deployment_config.py` or CLI arguments, instead of hardcoded `https://eu.gcr.io`.
 
 ## Version 1.8.0
 
-### Changes
+### Changed
 
 * Bumped basic dependencies: Apache Beam 2.48.0, google-cloud-bigtable 2.17.0, google-cloud-language 2.10.0, google-cloud-storage 2.11.2, among others (#374).
 * Added the `env_variable` argument to `bigflow.Workflow` which enables to change a name of the variable used to obtain environment name (#365).
diff --git a/docs/technologies.md b/docs/technologies.md
index 26d26cdf..015649ac 100644
--- a/docs/technologies.md
+++ b/docs/technologies.md
@@ -309,6 +309,7 @@ to call BigQuery SQL.
 Fully qualified names of internal tables are resolved to `{project_id}.{dataset_name}.{table_name}`.
 * `external_tables` — Dict that defines aliases for external table names. Fully qualified names
 of those tables have to be declared explicitly.
+* `job_labels` — Dict of labels that will be set on BigQuery jobs.
 
 The distinction between internal and external tables shouldn't be treated
 too seriously. Internal means `mine`. External means any other. It's just a naming convention.
@@ -511,7 +512,7 @@ The `table_labels` and `dataset_labels` parameters allow your workflow to create
 On the first run, tables are not created yet, so we can not create labels then. Labels are added on second and later run when tables are already created.
 
 ```python
-from bigflow.bigquery import DatasetConfig 
+from bigflow.bigquery import DatasetConfig
 
 dataset_config = DatasetConfig(
     env='dev',
@@ -526,8 +527,19 @@ dataset_config = DatasetConfig(
         }
     },
     dataset_labels={"dataset_label_1": "value_1", "dataset_label_2": "value_2"}).create_dataset_manager()
+```
 
+The `job_labels` argument lets you label BigQuery jobs. It is passed as [`QueryJobConfig.labels`](https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_labels)
+by the `write` and `collect` methods of `DatasetManager`.
 
-```
+```python
+from bigflow.bigquery import DatasetConfig
 
-You can us it as an ad-hoc tool or put a labeling job to a workflow as well.
\ No newline at end of file
+dataset_config = DatasetConfig(
+    env='dev',
+    project_id='your-project-id',
+    dataset_name='example_dataset',
+    internal_tables=['example_table'],
+    external_tables={},
+    job_labels={"owner": "John Doe"}).create_dataset_manager()
+```
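
To make the new behavior concrete, here is a minimal, hypothetical usage sketch building on the docs example above. The dataset, table, SQL, and label values are illustrative, and the `collect` call and `{table}` templating assume the `DatasetManager` API as described in docs/technologies.md:

```python
from bigflow.bigquery import DatasetConfig

# Illustrative configuration mirroring the example from the docs change above.
dataset_manager = DatasetConfig(
    env='dev',
    project_id='your-project-id',
    dataset_name='example_dataset',
    internal_tables=['example_table'],
    external_tables={},
    job_labels={"owner": "John Doe"}).create_dataset_manager()

# Per the docs change, each query issued through `collect` or `write` is
# submitted with QueryJobConfig.labels == {"owner": "John Doe"}, so the
# resulting BigQuery jobs can be filtered by label, e.g. in billing exports.
rows = dataset_manager.collect('SELECT * FROM {example_table}')
```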