How to run workflows
Workflows can be seen as a kind of structured batch operation running in the cluster using on-demand resources. Workflows are composed of one or more tasks and can be of the following types, depending on how the tasks are arranged:
- Simple workflow with one task
- Workflow with parallel tasks
- Pipeline workflow with tasks running sequentially
- Simple DAG (pipeline of parallel tasks)
Tasks are Docker container executions managed by the workflow system. Any Docker image can be used as a base for a task.
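For illustration, here is a minimal sketch of declaring a task from a custom Docker image. The CustomTask class, the task name and the image name are assumptions for this sketch (only PythonTask appears in the example further down); check the cloudharness.workflows.tasks module for the exact API.

```python
from cloudharness.workflows import tasks

# Assumption: a CustomTask wraps an arbitrary Docker image as a workflow task.
# The task name and image reference below are illustrative placeholders.
my_task = tasks.CustomTask('my-custom-task', 'myregistry/my-image:latest')
```

Typical reasons to run a computation as a workflow rather than inside the main application service: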
- Use on-demand resources: when the workflow runs, it requests the resources it needs (e.g. CPU and memory) in the cluster and frees them as soon as it stops. This is usually the case for heavy computations.
- Specific technological stack: a computational pipeline can require a specific technological stack and/or a specific (and possibly heavy) set of libraries. The base stack may differ from the one used in the main service (usually a Python Flask application). Even if the base stack is the same, adding all the libraries to the main service may not be the best solution.
- Run long operations asynchronously
CloudHarness runs workflows through Argo Workflows, providing the Argo installation and a Python library to run workflows programmatically.
The cloudharness operations API makes it easy to run a workflow from Python. The pattern to run a workflow is the following:
- Create the tasks
- Add the tasks to an operation object of the needed type
- Execute the operation
A simple example of a parallel operation running Python code:
```python
from cloudharness.workflows import operations, tasks

def f():
    import time
    time.sleep(2)
    print('whatever')

op = operations.ParallelOperation('test-parallel-op-', (tasks.PythonTask('p1', f), tasks.PythonTask('p2', f)))
```
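To complete the pattern, the operation then has to be executed. A minimal sketch, assuming the operation object exposes an execute() method that submits the workflow to Argo:

```python
# Assumption: execute() submits the workflow to the cluster; the return value
# and whether the call blocks depend on the operation type.
op.execute()
```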