5.5.0
Config parsing improvements
Before version 5.5.0, the only way to pass configuration to the application was to provide BASE64-encoded HOCON (for the application config) and JSON (for the Iglu resolver config) as command line options.
Starting from version 5.5.0, it's possible to provide a full path to the configuration files. Here is an example, which mounts a config directory into the docker container at run time:
docker run \
-v /path/to/config:/myconfig \
snowplow/rdb-loader-redshift:5.5.0 \
--config /myconfig/loader.hocon \
--iglu-config /myconfig/resolver.json
It's no longer necessary to use BASE64 encoded strings on the command line, but to preserve compatibility the old way of configuring is still supported.
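For comparison, the older BASE64 style can be sketched as below. This is only a minimal sketch: it assumes the same `--config` and `--iglu-config` options accept the encoded strings directly, and the helper names `encode_config` and `run_loader_base64` are hypothetical, introduced here for illustration.

```shell
# Hypothetical helper: BASE64-encode a file as a single line.
# Stripping newlines with tr keeps this portable across base64 implementations.
encode_config() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: start the loader with BASE64-encoded configuration.
# Assumption: --config and --iglu-config accept the encoded strings as values.
# $1 is the path to the application HOCON, $2 the path to the resolver JSON.
run_loader_base64() {
  docker run \
    snowplow/rdb-loader-redshift:5.5.0 \
    --config "$(encode_config "$1")" \
    --iglu-config "$(encode_config "$2")"
}
```

With these helpers defined, `run_loader_base64 /path/to/config/loader.hocon /path/to/config/resolver.json` would start the loader without mounting any directory into the container.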
What is more, it's now possible to provide a HOCON file for the Iglu resolver configuration, just as for the application configuration. This matters because it lets you use HOCON features, such as environment variable resolution, for the Iglu resolver as well. A plain JSON file is still supported.
These changes apply to all the loaders (Redshift, Snowflake, Databricks) and transformer (batch, streaming) applications.
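As a rough illustration of environment variable resolution in a resolver file, the sketch below writes a HOCON resolver config whose API key comes from the environment. The registry name, URI, the `IGLU_API_KEY` variable, the output path and the resolver schema version are all assumptions made for this example; keep whatever values your existing resolver.json uses.

```shell
# Hypothetical output path; override RESOLVER_FILE to write elsewhere.
RESOLVER_FILE="${RESOLVER_FILE:-./resolver.hocon}"

# The quoted 'EOF' stops the shell from expanding ${IGLU_API_KEY},
# so the substitution is left for HOCON to resolve when the file is parsed.
cat > "$RESOLVER_FILE" <<'EOF'
{
  # Schema version is an assumption; reuse the one from your resolver.json
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-3"
  "data": {
    "cacheSize": 500
    "cacheTtl": 600
    "repositories": [
      {
        "name": "Private registry (hypothetical)"
        "priority": 0
        "vendorPrefixes": []
        "connection": {
          "http": {
            "uri": "https://iglu.example.com/api"
            # Resolved from the IGLU_API_KEY environment variable
            "apikey": ${IGLU_API_KEY}
          }
        }
      }
    ]
  }
}
EOF
```

The variable has to be visible inside the container for the substitution to resolve, for instance by passing it with docker's `-e` option alongside `--iglu-config /myconfig/resolver.hocon`.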
Improved robustness of the loader
We've made quite a few small under-the-hood improvements, which we hope will make the loader more resilient against transient failures. We identified some of the most common edge-case error scenarios, where previous versions of the loader might hit an error, e.g. due to a stale connection or a network issue. The small changes include better handling of old connections, and retrying on transient failures.
Batch Transformer: transform_duration metric
The batch transformer can now send a new metric to CloudWatch, if configured: transform_duration, which records the time needed to transform an input folder.
Upgrading
If you are already using a recent version of RDB Loader (3.0.0 or higher) then upgrading to 5.5.0 is as simple as pulling the newest docker images.
There are no changes needed to your configuration files.
docker pull snowplow/rdb-loader-redshift:5.5.0
docker pull snowplow/rdb-loader-snowflake:5.5.0
docker pull snowplow/rdb-loader-databricks:5.5.0
docker pull snowplow/transformer-pubsub:5.5.0
docker pull snowplow/transformer-kinesis:5.5.0
Starting from this version, the batch transformer requires Java 11 on EMR (the default is Java 8). One way to switch is to run this script as a bootstrap action (it needs to be stored on S3):
#!/bin/bash
set -e
sudo update-alternatives --set java /usr/lib/jvm/java-11-amazon-corretto.x86_64/bin/java
exit 0
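One possible way to get the script onto S3 and attach it to a cluster is sketched below with the AWS CLI. The bucket, key, local file name and the helper names `upload_bootstrap` and `bootstrap_actions_arg` are hypothetical; only `aws s3 cp` and the `--bootstrap-actions` option of `aws emr create-cluster` are standard AWS CLI features.

```shell
# Hypothetical locations: replace with your own file, bucket and key.
BOOTSTRAP_LOCAL="./use-java-11.sh"
BOOTSTRAP_S3="s3://my-bucket/bootstrap/use-java-11.sh"

# Copy the bootstrap script shown above to S3.
upload_bootstrap() {
  aws s3 cp "$BOOTSTRAP_LOCAL" "$BOOTSTRAP_S3"
}

# Build the value for the --bootstrap-actions option of `aws emr create-cluster`.
bootstrap_actions_arg() {
  printf 'Path=%s,Name=%s' "$BOOTSTRAP_S3" "use-java-11"
}
```

After calling `upload_bootstrap`, the result of `bootstrap_actions_arg` can be passed as `--bootstrap-actions "$(bootstrap_actions_arg)"` together with the rest of your usual `aws emr create-cluster` options.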
The Snowplow docs website has a full guide for running the RDB Loader and the transformer.
Changelog
- Bump Snowflake driver to 3.13.30 (#1256)
- Upgrade Databricks JDBC driver (#1254)
- Config parsing improvements (#1252)
- Loader: limit the total time spent retrying a failed load (#1251)
- Loader: do not skip batches on warehouse connection failures (#1250)
- Loader: Do not attempt rollback when connection is already closed (#1240)
- Use sbt-snowplow-release to build docker images (#1222)
- Loader: Improvements to webhook alerts (#1238)
- Add load_tstamp column to table definitions (#1233)
- Loader: Disable warnings on incomplete shredding for the streaming transformer (#967)
- Batch Transformer: emit transform_duration metric (#1236)
- Batch Transformer: use JDK 11 in assembly (#1241)
- Bump dependencies with CVEs (#1234)
- Loader: Retry failures for all warehouse operations (#1225)
- Loader: Avoid errors for "Connection is not available" (#1223)
- Upgrade to Cats Effect 3 (#1219)