# 0.19 to 0.21

## Configuration

The biggest changes from 0.19 to 0.21 are related to the `application.conf` file, which has been restructured significantly.

The configuration for backends is now all contained within a `backend` stanza, which specifies one named stanza per backend plus a default backend, as follows:

```
backend {
    default=Local
    providers {
        Local {
            actor-factory: "class path to BackendLifecycleActorFactory implementation"
            config {
                ... backend specific config ...
            }
        }
        JES {
            actor-factory: "class path to BackendLifecycleActorFactory implementation"
            config {
                ... backend specific config ...
            }
        }
        SGE {
            actor-factory: "class path to BackendLifecycleActorFactory implementation"
            config {
                ... backend specific config ...
            }
        }
    }
}
```

### Migrating a 0.19 config file to a 0.21 config file

Copy `core/src/main/resources/application.conf` somewhere, then open your old config file and copy the following values from the 0.19 config into the 0.21 config.

Below is a table-based diff; a visual diff of the changes is also available.

| 0.19 | 0.21 | Comments |
|------|------|----------|
| `akka.actor.deployment` | | Configures the number of threads to handle workflow log copying |
| `akka.dispatchers.io-dispatcher` | | Configures the "IO" dispatcher |
| `google.applicationName` | `google.application-name` | |
| `google.cromwellAuthenticationScheme` | removed | re-expressed as other configuration settings |
| `google.serverAuth.pemFile` | `google.auths` with scheme `service_account`, then `pem-file` | |
| `google.serverAuth.serviceAccountId` | `google.auths` with scheme `service_account`, then `service-account-id` | |
| `google.userAuthenticationScheme` | removed | re-expressed as other configuration settings |
| `google.refreshTokenAuth.client_id` | `google.auths` with scheme `refresh_token`, then `client-id` | |
| `google.refreshTokenAuth.client_secret` | `google.auths` with scheme `refresh_token`, then `client-secret` | |
| `engine.filesystems` | | Defines additional filesystems that the engine is able to use to execute read and write WDL functions at the workflow-level |
| `docker` | `backend.JES.config.dockerhub` | Moved to JES-specific configuration |
| `jes.project` | `backend.JES.config.project` | |
| `jes.baseExecutionBucket` | `backend.JES.config.root` | |
| `jes.endpointUrl` | `backend.JES.config.genomics.endpoint-url` | |
| `jes.maximumPollingInterval` | `backend.JES.config.maximum-polling-interval` | |
| `shared-filesystem.root` | `backend.[Local\|SGE].config.root` | |
| `shared-filesystem.localization` | `backend.[Local\|SGE].config.filesystems.localization` | |
| `backend.abortJobsInTerminate` | `system.abort-jobs-on-terminate` | |
| `restart-workflow` | `system.workflow-restart` | |
| `database.config` | unchanged | Set your database |
| `database.main.[mysql\|hsqldb]` | unchanged | |
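
To illustrate where several of the relocated settings land, here is a hedged sketch of the corresponding 0.21 stanzas. Every value below (the auth entry names, project, bucket, file paths, polling interval, endpoint URL) is a placeholder, and the exact shape of each stanza, including whether auth entries carry a `name` label as shown, should be checked against the copy of `core/src/main/resources/application.conf` you started from.

```
google {
    application-name = "cromwell"   # was google.applicationName
    auths = [
        {
            name = "service-account"   # arbitrary label
            scheme = "service_account"
            service-account-id = "my-account@my-project.iam.gserviceaccount.com"   # was google.serverAuth.serviceAccountId
            pem-file = "/path/to/my-account.pem"                                    # was google.serverAuth.pemFile
        },
        {
            name = "user-via-refresh"   # arbitrary label
            scheme = "refresh_token"
            client-id = "my-client-id"           # was google.refreshTokenAuth.client_id
            client-secret = "my-client-secret"   # was google.refreshTokenAuth.client_secret
        }
    ]
}

backend {
    default = JES
    providers {
        JES {
            actor-factory: "class path to BackendLifecycleActorFactory implementation"
            config {
                project = "my-google-project"       # was jes.project
                root = "gs://my-execution-bucket"   # was jes.baseExecutionBucket
                maximum-polling-interval = 600      # was jes.maximumPollingInterval (placeholder value)
                genomics {
                    endpoint-url = "https://genomics.googleapis.com/"   # was jes.endpointUrl
                }
                dockerhub {
                    # was the top-level docker stanza in 0.19
                }
            }
        }
    }
}
```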

## Workflow Options

The following workflow options have changed names:

| 0.19 | 0.21 |
|------|------|
| `workflow_log_dir` | `final_workflow_log_dir` |
| `outputs_path` | `final_workflow_outputs_dir` |
| `call_logs_dir` | `final_call_logs_dir` |
| `defaultRuntimeOptions` | `default_runtime_attributes` |
| `workflowFailureMode` | `workflow_failure_mode` |
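
As an illustration, a 0.21 workflow options file using the renamed keys might look like the following; every path and value here is a placeholder.

```
{
    "final_workflow_log_dir": "gs://my-bucket/workflow-logs",
    "final_workflow_outputs_dir": "gs://my-bucket/final-outputs",
    "final_call_logs_dir": "gs://my-bucket/call-logs",
    "default_runtime_attributes": {
        "docker": "ubuntu:latest"
    },
    "workflow_failure_mode": "NoNewCalls"
}
```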

## API Endpoints

* `POST /api/workflows/:version/validate` has been removed.
* `GET /api/workflows/:version/:id/outputs/:call` has been removed.
* `GET /api/workflows/:version/query` now requires start and end offsets. See the README section for this API for more details.
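
As an illustration, a query that used to be issued without time bounds would now include explicit start and end parameters, along these lines (the parameter names and date format here are assumptions; the README section for this API is authoritative):

```
GET /api/workflows/v1/query?start=2016-01-01T00:00:00.000Z&end=2016-02-01T00:00:00.000Z
```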

## Database Migrations

Cromwell's database structure has changed significantly between 0.19 and 0.21. All pre-existing data has to be transformed/moved to new tables in order to be usable. The migration process can be split into 2 steps:

### Restart Migration

Cromwell 0.21 will migrate workflows that were in the Running state to the new database structure in order to attempt to resume them once the server restarts. This ensures that workflows are not lost even if Cromwell is stopped while they are still running. No particular action or configuration is required for this step.

### Metadata Migration (MySQL Only)

In order to keep metadata from previous (and current) workflow runs, all the data has to be moved to a new centralized table. Depending on the number and shape of the workflows in your database, this step can consume significant time and disk space. It is not possible to give an accurate estimate because of the many variables in play: the number of workflows, their complexity (number of tasks, scatters, attempts per task, etc.), hardware performance (of the database and of the machine running Cromwell), the backends used, and so on.

However, a good rule of thumb is to make sure that your database has enough disk space to grow by a factor of 10. This is because data, in particular all inputs and outputs, is de-normalized during the migration; the more complex or large your outputs are (large arrays, etc.), the more your database will grow. Also be aware that the migration can take several hours for substantially large databases.

#### Important Notes

* For better performance, make sure the flag `rewriteBatchedStatements` is set to `true`. This can be done by adding it to your database connection URL, e.g.:

  ```
  jdbc:mysql://localhost:3306/cromwell_db?rewriteBatchedStatements=true
  ```

  See the MySQL documentation for more information. A sketch of where this URL sits in the Cromwell configuration follows below.
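
  For reference, assuming your database is selected via `database.config` and configured under `database.main.mysql` (the exact nesting of the connection keys should be checked against the database section of the 0.21 `application.conf`), the flag is appended to the JDBC URL roughly like this:

  ```
  database {
      config = main.mysql   # select your database, see the table above
      main {
          mysql {
              db {
                  # exact keys come from the database section of application.conf
                  url = "jdbc:mysql://localhost:3306/cromwell_db?rewriteBatchedStatements=true"
              }
          }
      }
  }
  ```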

* Because of the nature of WDL inputs and outputs, as well as the way they were stored up until Cromwell 0.19, it is necessary to pull them out of the database, process them one by one in Cromwell, and re-insert them. For that reason, if your database contains workflows with very large inputs or outputs (for example arrays of several thousand elements, or large matrices), you might want to tune the `migration-read-batch-size` and `migration-write-batch-size` configuration fields (see the database section in `application.conf`; a sketch follows this list).

    * `migration-read-batch-size` sets the number of rows that are retrieved from the database at a time. Once every row in a batch has been processed, the next batch is retrieved, until there are none left. If your workflows/tasks have very large inputs or outputs, too large a value here could cause out-of-memory errors or long waits while pulling the data from the database. On the other hand, too small a value could hurt performance by causing more round-trips to the database, with their associated overhead.

    * `migration-write-batch-size` sets the number of insert statements that are buffered before being committed in a transaction. This reduces the number of queries to the database while keeping each batch from growing too large. You might consider decreasing this value if your workflows/tasks have very large strings as inputs or outputs (for example several MB of text).
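
As a sketch, both fields live in the database section of the configuration. The values below are purely illustrative; the defaults shipped in `application.conf` are the reference.

```
database {
    # Illustrative values only; start from the defaults in application.conf
    migration-read-batch-size = 100000
    migration-write-batch-size = 100000
}
```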