Changelog and docs
artnowo-alle committed May 29, 2024
1 parent e63472f commit ef6094d
Showing 2 changed files with 23 additions and 5 deletions.
CHANGELOG.md (10 changes: 8 additions & 2 deletions)
@@ -1,14 +1,20 @@
# BigFlow changelog

## Version 1.10.0

### Added

* BigQuery query job labeling for collect and write operations. Labels are passed via the `job_labels` dict argument in `DatasetConfiguration` and `DatasetManager`.

## Version 1.9.0

### Changed

* Switched from Google Container Registry to Artifact Registry. Made `-r`/`--docker-repository` common to all deploy commands. Build and deploy commands now authenticate to the Docker repository taken from `deployment_config.py` or CLI arguments, instead of the hardcoded `https://eu.gcr.io`.

## Version 1.8.0

### Changed

* Bumped basic dependencies: Apache Beam 2.48.0, google-cloud-bigtable 2.17.0, google-cloud-language 2.10.0, google-cloud-storage 2.11.2, among others (#374).
* Added the `env_variable` argument to `bigflow.Workflow`, which allows changing the name of the variable used to obtain the environment name (#365).
docs/technologies.md (18 changes: 15 additions & 3 deletions)
@@ -309,6 +309,7 @@ to call BigQuery SQL.
Fully qualified names of internal tables are resolved to `{project_id}.{dataset_name}.{table_name}`.
* `external_tables` — Dict that defines aliases for external table names.
Fully qualified names of those tables have to be declared explicitly.
* `job_labels` — Dict of labels that will be set on BigQuery jobs.

The distinction between internal and external tables shouldn't be treated too seriously.
Internal means `mine`. External means any other. It's just a naming convention.
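
For illustration, here is a minimal sketch of the resolution rules above (all project, dataset, and table names are made up):

```python
from bigflow.bigquery import DatasetConfig

# 'transactions' is internal: queries can use its alias, which resolves to
# 'my-project.my_dataset.transactions'. 'users' is external: its fully
# qualified name must be declared explicitly.
dataset_config = DatasetConfig(
    env='dev',
    project_id='my-project',
    dataset_name='my_dataset',
    internal_tables=['transactions'],
    external_tables={'users': 'other-project.crm.users'},
).create_dataset_manager()
```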
@@ -511,7 +512,7 @@

The `table_labels` and `dataset_labels` parameters allow your workflow to create table and dataset labels.
On the first run the tables do not exist yet, so their labels cannot be created at that point. Labels are added on the second and later runs, once the tables already exist.

```python
from bigflow.bigquery import DatasetConfig

dataset_config = DatasetConfig(
    env='dev',
    project_id='your-project-id',
    dataset_name='example_dataset',
    internal_tables=['example_table'],
    external_tables={},
    table_labels={
        # labels per internal table (values illustrative)
        'example_table': {'table_label_1': 'value_1', 'table_label_2': 'value_2'}
    },
    dataset_labels={"dataset_label_1": "value_1", "dataset_label_2": "value_2"}).create_dataset_manager()
```

The `job_labels` argument allows you to label BigQuery jobs. It is passed to [`QueryJobConfig.labels`](https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.job.QueryJobConfig#google_cloud_bigquery_job_QueryJobConfig_labels)
by the `write` and `collect` methods of `DatasetManager`.

You can use it as an ad-hoc tool or put a labeling job into a workflow as well.

```python
from bigflow.bigquery import DatasetConfig

dataset_config = DatasetConfig(
    env='dev',
    project_id='your-project-id',
    dataset_name='example_dataset',
    internal_tables=['example_table'],
    external_tables={},
    job_labels={"owner": "John Doe"}).create_dataset_manager()
```
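
Every query job started through this manager then carries the `owner` label. As a quick usage sketch (assuming, per the resolution rules above, that `collect` accepts a SQL string in which `{example_table}` expands to the fully qualified table name):

```python
# The query job below runs with {"owner": "John Doe"} set on its
# QueryJobConfig.labels; jobs started by the write methods get the same labels.
rows = dataset_config.collect('SELECT * FROM {example_table}')
```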
