Getting_Started
- Installation
- Configuration
- Add your username
- Configure a tenant
- Initialise a project
- Initialising a category
- Building your first tool
- Registering your tool with init
- Building your first workflow
- Registering your workflow
- Syncing your tools / workflows with ICA.
- Show others how to run your workflow / tool
- Running a workflow through ICA
- Optimising your workflow
- Updating configuration files
- Acknowledging a contributor / maintainer of a tool or workflow
- Pre-commit hooks
- Further reading
This software is designed to be installed on your local workstation. ICA tokens are stored in user-level read-only files in the conda env. However, the repository path could be in a shared path without compromising any user's credentials. In general, collaborative work should be done through version control over a remote repository such as GitHub.
Install the latest release by heading to the releases page and downloading the latest zip file.
You will need the following prerequisites:
- conda
- jq
- yq (optional, but will need to be v4)
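Before running the installer, it can help to confirm that the prerequisites are on your PATH. The sketch below is illustrative only: missing_commands is a hypothetical helper, not part of cwl-ica, and it checks presence only (it does not verify that yq is v4).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cwl-ica): print each named command
# that cannot be found on the PATH, one per line.
missing_commands() {
  for cmd in "$@"; do
    if ! command -v "${cmd}" >/dev/null 2>&1; then
      echo "${cmd}"
    fi
  done
}

# conda and jq are required; yq is optional but must be v4 if used
missing="$(missing_commands conda jq yq)"
if [ -n "${missing}" ]; then
  # Left unquoted on purpose so the names print on a single line
  echo "Not found on PATH:" ${missing} 1>&2
fi
```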
# Unzip the zip file
unzip "release-${version}.zip"
# Change into the extracted directory
cd "release-${version}"
# Run the installation script, press '1' when prompted to update / create the conda environment
bash install.sh
The installation script will then create a conda environment called cwl-ica.
You will need to activate this environment with the following command:
conda activate cwl-ica
If you see the error 'nothing provides requested nodejs >=18', you may need to add conda-forge to your channels with conda config --append channels conda-forge
You will need to also clone this repo to your computer:
If you are a member of the UMCCR GitHub organisation, please run the following:
# Substitute the umccr repository for your own if you are working from a fork
git clone git@github.com:umccr/cwl-ica.git
You will now need to run the configuration command so that your conda environment knows where your local clone of the cwl-ica repo is:
# Ensure your cwl-ica conda env has first been activated with "conda activate cwl-ica"
cwl-ica configure-repo \
--repo-path /path/to/local-repository-clone
Great job! You've now configured your project.
You will need to reactivate your environment in order to complete the configuration with:
# Reactivate your environment with the following two commands
conda deactivate
conda activate cwl-ica
Add your username to user.yaml. This will ensure that you're acknowledged for CWL files you create / maintain.
It also makes it much easier for future users to know who to contact when they need clarification on a CWL workflow / tool.
Add the --set-as-default parameter to save adding the --username parameter later on when building your first tool. You will need to deactivate then reactivate your conda environment for this to take effect.
cwl-ica configure-user \
--username "Firstname Lastname" \
--email "firstname.lastname@example.com" \
--set-as-default
First check out the list of registered tenants in the repo with:
cwl-ica list-tenants
Then run the cwl-ica configure-tenant command to create a mapping of tenant names and tenant ids.
You can then define projects to be in given tenants through the --tenant-name option in cwl-ica project-init.
While this only seems useful if your ICA organisation spans multiple tenants, this will future-proof your workflows.
For now, cwl-ica configure-tenant is a mandatory step before you initialise a project.
You can see all registered tenants through cwl-ica list-tenants.
First check out the list of registered projects in the repo with:
cwl-ica list-projects
If you would like to add a project run the following command:
cwl-ica project-init \
--project-id "xxxx-yyyy..." \
--project-name "my-registered-project" \
--access-token "<project-access-token>" \
--tenant-name "<name-of-tenant>"
To determine the project id and project name, you will need to run ica projects list.
We take inspiration from the GIT_SSH environment variable with our own CWL_ICA_API_KEY_SH variable. This variable should point to an executable file (like a bash script) that uses an environment variable ${PROJECT_API_KEY_PATH} that is set to the project's project_api_key_name attribute.
Confusing? Let's go through an example.
I set my CWL_ICA_API_KEY_SH variable to the file ${CONDA_PREFIX}/etc/get_api_key.sh (a bash script with executable permissions) with the following contents.
#!/usr/bin/env bash
# Exit on errors, unset variables and failed pipes
set -euo pipefail
# Point GPG at the current terminal so it can prompt for the passphrase
export GPG_TTY="$(tty)"
# Check pass binary exists, stop here if it does not
if ! type pass 1>/dev/null 2>&1; then
  echo "Could not find path to binary 'pass'. Please install pass first" 1>&2
  exit 1
fi
# Check if project specific api key is available
if [[ -n ${PROJECT_API_KEY_PATH-} ]] && pass list "/ica/api-keys/${PROJECT_API_KEY_PATH}" 1>/dev/null 2>&1; then
  # Get project specific api key
  pass "/ica/api-keys/${PROJECT_API_KEY_PATH}"
else
  # Get default api key from pass (as done for ica-ica-lazy)
  pass /ica/api-keys/default-api-key
fi
This allows me to use the pass binary to store / manage my api-keys for each project.
When any cwl-ica subcommand now tries to access my api-key, I must first enter my gpg password; the token is then stored under ${CONDA_PREFIX}/etc/ica/tokens. If the token expires, this script is called again to refresh that token.
This method has the following benefits:
- API keys last indefinitely but tokens do not (and should not). This way, one doesn't have to manually update tokens, or worry about whether a token has expired.
- The security level is up to the user. One could just have a file called api-key.txt, with the script above simply printing the contents of that file (the api-key), or one could set up multi-factor authentication for access to the api-key.
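As a sketch of the simplest option in the list above, the function below prints an api-key from a plain text file. The API_KEY_DIR variable, the ${HOME}/.ica/api-keys directory and the per-project file layout are assumptions made for illustration; only PROJECT_API_KEY_PATH and the api-key.txt idea come from the description above. Plain text storage offers no protection beyond file permissions.

```shell
#!/usr/bin/env bash
# Hypothetical file-based alternative to the pass-based script.
# API_KEY_DIR and the directory layout are illustrative assumptions.
print_api_key() {
  key_dir="${API_KEY_DIR:-${HOME}/.ica/api-keys}"
  project_file="${key_dir}/${PROJECT_API_KEY_PATH:-}"
  if [ -n "${PROJECT_API_KEY_PATH:-}" ] && [ -f "${project_file}" ]; then
    # Project specific api-key
    cat "${project_file}"
  elif [ -f "${key_dir}/api-key.txt" ]; then
    # Default api-key
    cat "${key_dir}/api-key.txt"
  else
    echo "No api-key file found under ${key_dir}" 1>&2
    return 1
  fi
}
```

Saved as an executable file, the script would end with a call to print_api_key, and CWL_ICA_API_KEY_SH would point to that file.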
Once you've configured the script correctly, you then need to ensure CWL_ICA_API_KEY_SH is in your environment every time you run cwl-ica. One way to do this is to add the following script to ${CONDA_PREFIX}/etc/conda/activate.d/ with a name such as get_api_key.sh.
#!/usr/bin/env bash
# Export path to get-api-key script
export CWL_ICA_API_KEY_SH="${CONDA_PREFIX}/etc/get_api_key.sh"
${CONDA_PREFIX}/etc/conda/activate.d/ is where all of the configure-x subcommands write their outputs.
Now when we run conda activate cwl-ica, we should expect to see CWL_ICA_API_KEY_SH in our environment.
To confirm that you've completed this step correctly, run:
cwl-ica validate-api-key-script
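Independently of that subcommand, a quick manual check can confirm that the variable is set and points to an executable file. This is only a sketch: check_api_key_script is a hypothetical helper, not a cwl-ica command, and it does not test whether the script actually returns a valid api-key.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cwl-ica): succeed only if
# CWL_ICA_API_KEY_SH is set and points to an executable regular file.
check_api_key_script() {
  if [ -z "${CWL_ICA_API_KEY_SH:-}" ]; then
    echo "CWL_ICA_API_KEY_SH is not set" 1>&2
    return 1
  fi
  if [ ! -f "${CWL_ICA_API_KEY_SH}" ] || [ ! -x "${CWL_ICA_API_KEY_SH}" ]; then
    echo "${CWL_ICA_API_KEY_SH} is not an executable file" 1>&2
    return 1
  fi
  echo "CWL_ICA_API_KEY_SH points to ${CWL_ICA_API_KEY_SH}"
}
```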
Save yourself having to trawl through a plethora of workflows to find the one you're after.
You may assign a workflow to multiple categories.
A category does NOT have to be registered before registering your tool or workflow.
Categories are registered on ICA, but a given category may span multiple projects.
Like tenants and projects, you can see existing categories with:
cwl-ica list-categories
To create your own, run:
cwl-ica category-init \
--name "name of category" \
--description "optional, can use a large text field instead"
First we use the cwl-ica create-tool-from-template command to create a file that we can expand on to build our first tool.
This will automatically create an id, label and doc for us, along with the author metadata namespaces to fill in.
The following command will create a tool under tools/tabix/0.2.6/tabix-0.2.6.cwl:
cwl-ica create-tool-from-template \
--tool-name tabix \
--tool-version 0.2.6
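The path in the tabix example appears to follow the pattern tools/<name>/<version>/<name>-<version>.cwl. The helper below reproduces that pattern so you can predict where a new template will land. The pattern is inferred from this single example, and tool_cwl_path is a hypothetical name, not a cwl-ica command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the expected CWL path from a tool name and
# version, following the layout of the tabix example on this page.
tool_cwl_path() {
  name="$1"
  version="$2"
  echo "tools/${name}/${version}/${name}-${version}.cwl"
}

tool_cwl_path tabix 0.2.6
# prints: tools/tabix/0.2.6/tabix-0.2.6.cwl
```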
Fill out the rest of the tool and then validate it. You should also test the tool locally (if possible).
cwl-ica tool-validate can be the most laborious part of the process but for good reason.
No one else will use your tool if it's not documented properly.
Check out contributions or our tools section for some examples on mastering your first cwl tool.
Now you've validated your tool, it's time to "register" it. This will:
- Create an entry in tool.yaml for this cwl tool.
- Create a workflow ID, and workflow version for the tool on ICA.
- Keep the tool up-to-date on ICA.
- Create a user-friendly markdown document when pushed to the main branch.
You can register your tool with cwl-ica tool-init.
If you decide later on that an already initialised tool would be convenient in a given project, use the subcommand add-tool-to-project to add the tool to the project.
If you've made it to this stage, congratulations! You've built a suite of tools and are ready to stitch them together as a workflow.
Initialise the workflow through cwl-ica create-workflow-from-template. You will also need to 'validate' your workflow with cwl-ica workflow-validate.
This may be pretty tedious and is easier if you've first 'validated' all of your tools.
Once you've successfully run cwl-ica workflow-validate, it's time to register your workflow.
As with registering a tool, registering your workflow will also keep it in sync on ICA, and create a user-friendly markdown document on the workflow when pushed to the main branch.
You can register your workflow with cwl-ica workflow-init.
Likewise, if you have an existing workflow and a new project, you may connect this workflow to your new ICA project with cwl-ica add-workflow-to-project.
For non-production projects, tools and workflows will sync with the registered workflow id and workflow version on each push to the main branch. You may also 'sync' your tool or workflow yourself with the following commands: cwl-ica tool-sync or cwl-ica workflow-sync.
For production tools / workflows, you will need to first create a pull-request to the main branch and request the workflow changes be approved.
If someone else approves the changes, this will trigger a GitHub Actions workflow which will create a new version suffix based on the git commit of the latest commit to the branch.
It is recommended that the workflow first be fully tested in a non-production project prior to creating a pull-request.
See below on how to manually trigger the GitHub Actions command if it fails the first time.
Registering a run instance of your workflow / tool will show others how your workflow should be set up to run.
For tools, this means a plot showing the cpu and mem usage over time along with the duration of the tool run.
For workflows, this means a stacked bar chart of the cpu / mem usage over time along with the duration of the workflow.
To register a run instance use either cwl-ica register-tool-run-instance-id or cwl-ica register-workflow-run-instance-id.
You can see the existing list of registered tool and workflow runs with the following commands: cwl-ica list-tool-runs or cwl-ica list-workflow-runs.
If you are unsure on how to run a particular workflow, check out the ica catalogue page 🚧, which should have all of the documentation that you need.
There are two recommended ways of running your workflow.
- Copying a tool / workflow submission template
  Available if an ICA workflow run has been registered for this workflow.
  Use the copy-workflow-submission-template command to create a shell script and an input json file of a registered workflow or tool. You may wish to first edit the input json and then run the launch script.
- Creating your own workflow submission template
  Use the create-workflow-submission-template command to create a shell script and an input yaml file of a registered tool or workflow. You will need to first edit the input yaml and then run the launch script.
  You will also need yq and either the ica binary or curl installed.
One can use the overrides setting to optimise the cpu and mem usage (or even change the docker container used by a step in a workflow or tool).
In order to view the step ids of a workflow, run cwl-ica get-workflow-step-ids.
Use these in the overrides settings to adjust the engine parameters for this step of a workflow.
More information on setting overrides in the engineParameters attribute of the launch json can be found [here][overrides_engine_parameters_docs].
It's not always possible to get everything right the first time. The sync commands above mean you can always correct the logic of your tools or workflows, but what about when you want to add a category to a tool, or add a workflow to a project, that you forgot to do in the init stage?
Fortunately, there are a few commands that can help you out there.
A fresh installation of conda (or a new computer) may require you to set up your default env vars again.
You may use the set-default-* subcommands to do so, then reactivate your fresh cwl-ica environment.
cwl-ica add-tool-to-project allows you to add a tool to a project. It will get its own ICA workflow ID and ICA workflow version and update project.yaml with the new entry for that project.
cwl-ica add-workflow-to-project is also available, which will do exactly the same thing, except for workflows.
To add a category to a tool, try cwl-ica add-category-to-tool. This will update the ICA workflow ID and append the category to that tool in tool.yaml.
cwl-ica add-category-to-workflow will do the same thing for workflows.
There may be a situation where one wishes to re-run the sync-tools-and-sync-workflows GitHub Actions workflow manually.
This may be after approval of a pull-request that then requires more changes, or because the workflow failed initially.
Whatever the reason, there is an answer.
- Create a branch (you may already have one on a PR)
- Click on Actions -> Sync tools and sync workflows
- Click 'Run workflow'
- Wait 20 minutes or so for the workflow to complete.
A gif is shown below on how to do steps 1 to 3:
Tasks change hands from time to time. How does one update a tool to show the contact details of the existing maintainer without deleting the current user from the tool?
You can use cwl-ica add-maintainer-to-tool to add a maintainer attribute to a tool. cwl-ica add-maintainer-to-workflow also exists for adding maintainers to a workflow.
cwl-ica uses pre-commit hooks to ensure that yaml configuration files are not corrupted (e.g. by a merge conflict).
To install pre-commit, check out https://pre-commit.com/.
Note you should install pre-commit globally to ensure Git clients such as GitKraken have pre-commit in their path.
Then run pre-commit install in the cwl-ica directory.
Next time a commit occurs on your local machine, pre-commit will first check all of the config files are valid YAML files.
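To illustrate the kind of corruption these hooks guard against, the sketch below flags leftover merge-conflict markers in a file. It is not the check that pre-commit itself runs (the hooks described above validate YAML); has_conflict_markers is a hypothetical helper for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file contains a line starting
# with a Git merge-conflict marker (<<<<<<<, =======, or >>>>>>>).
has_conflict_markers() {
  grep -q -E '^(<<<<<<<|=======|>>>>>>>)' "$1"
}

# Example usage (the file path is illustrative):
# has_conflict_markers path/to/tool.yaml && echo "unresolved conflict markers found"
```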
I would highly recommend checking out the 'Stories' section for applications of cwl-ica in conjunction with 'ica-ica-lazy'.