Flowman 1.0.0
Flowman Version 1.0.0!
Flowman has proven to be robust and has been used in production at multiple companies for several years. Time to officially celebrate its success with a version 1.0.0!
This is a huge and exciting release with many improvements. The main features are:
- The Flowman 1.0.0 release itself. Making such a release is more work than one might expect.
- New client/server Flowman shell to support accessing real data during development. This feature is still experimental.
- Official support for Azure Synapse
Full list of changes
- github-314: Move avro related functionality into separate plugin
- github-307: Describe Flowman's security policy in SECURITY.md
- github-315: Create build profile for CDP 7.1 with Spark 3.3
- github-317: Perform retry on failing JDBC commands
- github-318: Support mappings from different projects and with non-standard outputs in SQL
- github-140: Strictly check imports
- github-316: Beautify README.md
- github-310: Explain versioning policy in CHANGELOG.md
- github-313: Improve example for "observe" mapping
- github-319: Support Oracle for History Server
- github-320: Do not fall back to "inline" schema when no kind is specified
- github-321: [BUG] Properly support lower case / upper case table names in Oracle
- github-309: Automate integration tests
- github-322: Remove flowman-client
- github-324: Log environment variables for imported projects
- github-329: Create Kernel API
- github-330: Implement Kernel Server
- github-331: Implement Kernel Client
- github-332: Build Flowman Shell on top of kernel Client/Server
- github-334: Create standalone Flowman Kernel application
- github-338: Update Spark to 3.3.2
- github-333: Forward Logs from Kernel to Client
- github-339: Set Copyright to "The Flowman Authors"
- github-345: [BUG] Loading an embedded schema inside a jar file should not throw an exception
- github-346: Create build profile for Databricks
- github-343: Log all client requests in kernel
- github-342: Automatically close session when the client disconnects from kernel
- github-351: [BUG] Failing execution listener instantiation should not fail a build
- github-347: Exclude AWS SDK for Databricks and EMR build profiles
- github-352: [BUG] Spark sessions should not contain duplicate jars from different plugins
- github-353: Successful runs should not use System.exit(0)
- github-354: Optionally load custom log4j config from jar
- github-358: Provide different log4j config for Flowman server and kernel
- github-359: Update jline dependency
- github-357: Spark session should not be shut down in Databricks environment
- github-360: Logging should exclude more Databricks specific stuff
- github-361: Work around low-level API differences in Databricks
- github-363: HiveDatabaseTarget should accept an optional location
- github-311: Create integration test for EMR
- github-362: Upgrade EMR to 6.10
- github-369: [BUG] Prevent endless loop in Kernel client, when getContext fails
- github-370: The Kernel client should use temporary workspaces with automatic cleanup
- github-337: Add documentation for flowman-rshell
- github-336: Add documentation for flowman-kernel
- github-366: Feature parity between Flowman shell and Flowman remote shell
- github-365: Implement saving mappings in Flowman Kernel/client
- github-367: Create integration test for "quickstart" archetype
- github-375: [BUG] "project reload" does not work correctly in remote shell with nested directories
- github-376: Document options to parallelize work
- github-378: Remove travis-ci integration
- github-308: Revise branching model
- github-381: Remove json-smart dependency
- github-382: [BUG] Parallel execution of multiple dq checks runs too many checks on Java 17
- github-384: Improve documentation for using docker-compose
- github-377: Load override config/env from .flowman-env.yml
- github-344: Support .flowman-ignore file for Flowman Kernel client
- github-385: Update Flowman tutorial
- github-386: Create Integration Test for Azure Synapse
- github-387: Remove scala-arm dependency
- github-390: Rename "master" branch to "main"
- github-392: [BUG] 'relation' mapping should support numeric partition values
- github-393: Move Maven archetype to flowman-maven project
- github-394: [BUG] The Spark job group and description are not set for sql assertions
- github-395: Support optional file locations for project imports
- github-397: Automate build using GitHub actions
- github-403: Upgrade Spark 3.2 to 3.2.4
- github-404: [BUG] Partition columns do not support Timestamp data type
- github-409: [BUG] Fix build for AWS EMR 6.10 and Azure Synapse 3.3
- github-407: Update Delta to 2.3.0 for Spark 3.3
- github-406: Improve integration tests to automatically pick up the current Flowman version
- github-408: Make use of DeltaLake in Synapse integration test
- github-405: Document deployment to EMR and Azure Synapse
Breaking changes
This version introduces some (minor) breaking changes:
- All Avro related functionality has been moved into the new "flowman-avro" plugin. If you rely on such functionality,
you need to explicitly include the plugin in the default-namespace.yml file.
- Imports are now strictly checked. This means that when you cross-reference an entity in your project which is provided
by a different Flowman project, you now need to explicitly import that project in the project.yml file.
- The "kind" attribute of schema definitions is now mandatory. Flowman will no longer fall back to an "inline" schema.
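
To make the three breaking changes more concrete, the fragments below sketch what the corresponding adjustments might look like. They are illustrations under assumptions rather than authoritative configuration: the project name `shared_project`, the relation name `measurements`, its location, and its fields are hypothetical, and your existing files will contain additional settings.

```yaml
# default-namespace.yml: explicitly load the Avro plugin
plugins:
  - flowman-avro
```

```yaml
# project.yml: explicitly import a project whose entities you reference
# ("shared_project" is a hypothetical project name)
imports:
  - project: shared_project
```

```yaml
# Relation definition: the schema now states its kind explicitly
# (relation name, location and fields are hypothetical)
relations:
  measurements:
    kind: file
    format: parquet
    location: "/tmp/measurements"
    schema:
      kind: inline
      fields:
        - name: id
          type: long
```

If a schema definition in an existing project omits "kind", adding the attribute as in the last fragment should restore the previous behaviour.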