nf-core/tools

A python package with helper tools for the nf-core community.

Read this documentation on the nf-core website: https://nf-co.re/tools

The nf-core tools package is written in Python and can be imported and used within other packages. For documentation of the internal Python functions, please refer to the Tools Python API docs.

Installation

Bioconda

You can install nf-core/tools from bioconda.

First, install conda and configure the channels to use bioconda (see the bioconda documentation). Then, just run the conda installation command:

conda install nf-core

Alternatively, you can create a new environment with both nf-core/tools and nextflow:

conda create --name nf-core python=3.11 nf-core nextflow
conda activate nf-core

Python Package Index

nf-core/tools can also be installed from PyPI using pip as follows:

pip install nf-core

Docker image

There is a Docker image that you can use to run nf-core/tools. It has all of the requirements packaged (including Nextflow), so it should work out of the box. It is called nfcore/tools (NB: no hyphen!).

You can use this container on the command line as follows:

docker run -itv `pwd`:`pwd` -w `pwd` -u $(id -u):$(id -g) nfcore/tools
  • -i and -t are needed for the interactive cli prompts to work (this tells Docker to use a pseudo-tty with stdin attached)
  • The -v argument tells Docker to bind your current working directory (pwd) to the same path inside the container, so that files created there will be saved to your local file system outside of the container.
  • -w sets the working directory in the container to this path, so that it's the same as your working directory outside of the container.
  • -u sets your local user account as the user inside the container, so that any files created have the correct ownership permissions

After the above base command, you can use the regular command line flags that you would use with other types of installation. For example, to launch the viralrecon pipeline:

docker run -itv `pwd`:`pwd` -w `pwd` -u $(id -u):$(id -g) nfcore/tools launch viralrecon -r 1.1.0

If you use $NXF_SINGULARITY_CACHEDIR for downloads, you'll also need to make this folder and environment variable available to the container:

docker run -itv `pwd`:`pwd` -w `pwd` -u $(id -u):$(id -g) -v $NXF_SINGULARITY_CACHEDIR:$NXF_SINGULARITY_CACHEDIR -e NXF_SINGULARITY_CACHEDIR nfcore/tools launch viralrecon -r 1.1.0
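If you call the container from a script, the same base command can be assembled programmatically. The following is only a minimal sketch: the helper name build_docker_command is hypothetical and not part of nf-core/tools, and it only builds the argument list without running Docker.

```python
import os


def build_docker_command(tool_args, cache_dir=None, image="nfcore/tools"):
    """Assemble the `docker run` argument list described above.

    Hypothetical helper for illustration only; not part of nf-core/tools.
    """
    cwd = os.getcwd()
    cmd = [
        "docker", "run",
        "-it",                                 # pseudo-tty with stdin for interactive prompts
        "-v", f"{cwd}:{cwd}",                  # bind the working directory to the same path
        "-w", cwd,                             # use it as the working directory in the container
        "-u", f"{os.getuid()}:{os.getgid()}",  # run as the local user so file ownership is correct
    ]
    if cache_dir:
        # Expose the Singularity cache folder and its environment variable
        cmd += ["-v", f"{cache_dir}:{cache_dir}", "-e", "NXF_SINGULARITY_CACHEDIR"]
    return cmd + [image, *tool_args]


if __name__ == "__main__":
    # Print the command instead of running it, so Docker is not needed here
    print(" ".join(build_docker_command(["launch", "viralrecon", "-r", "1.1.0"])))
```

The returned list can be passed to subprocess.run() as it is, which avoids shell quoting problems with paths that contain spaces.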

Docker bash alias

The above base command is a bit of a mouthful to type, to say the least. To make it easier to use, we highly recommend adding the following bash alias to your ~/.bashrc file:

# Single quotes defer `pwd` and $(id ...) until the alias is used, not when ~/.bashrc is read
alias nf-core='docker run -itv `pwd`:`pwd` -w `pwd` -u $(id -u):$(id -g) nfcore/tools'

Once applied (you may need to reload your shell) you can just use the nf-core command instead:

nf-core list

Docker versions

You can use docker image tags to specify the version you would like to use. For example, nfcore/tools:dev for the latest development version of the code, or nfcore/tools:1.14 for version 1.14 of tools. If you omit this, it will default to :latest, which should be the latest stable release.

If you need a specific version of Nextflow inside the container, you can build an image yourself. Clone the repo locally and check out whatever version of nf-core/tools that you need. Then build using the --build-arg NXF_VER flag as follows:

docker build -t nfcore/tools:dev . --build-arg NXF_VER=20.04.0

Development version

If you would like the latest development version of tools, the command is:

pip install --upgrade --force-reinstall git+https://github.com/nf-core/tools.git@dev

If you intend to make edits to the code, first make a fork of the repository and then clone it locally. Go to the cloned directory and install with pip (also installs development requirements):

pip install --upgrade -r requirements-dev.txt -e .

Using a specific Python interpreter

If you prefer, you can also run tools with a specific Python interpreter. The command line usage and flags are then exactly the same as if you ran with the nf-core command. Note that the module is nf_core with an underscore, not a hyphen like the console command.

For example:

python -m nf_core --help
python3 -m nf_core list
~/my_env/bin/python -m nf_core create --name mypipeline --description "This is a new skeleton pipeline"

Using with your own Python scripts

The tools functionality is written in such a way that you can import it into your own scripts. For example, if you would like to get a list of all available nf-core pipelines:

import nf_core.list
wfs = nf_core.list.Workflows()
wfs.get_remote_workflows()
for wf in wfs.remote_workflows:
    print(wf.full_name)

Please see https://nf-co.re/tools-docs/ for the function documentation.

Automatic version check

nf-core/tools automatically checks the web to see if there is a new version of nf-core/tools available. If you would prefer to skip this check, set the environment variable NFCORE_NO_VERSION_CHECK. For example:

export NFCORE_NO_VERSION_CHECK=1

Update tools

It is advisable to keep nf-core/tools updated to the most recent version. The command to update depends on the system used to install it. For example, if you installed it with conda, you can use:

conda update nf-core

If you used pip:

pip install --upgrade nf-core

Please refer to the respective documentation (for example, that of conda or pip) for further details on managing packages.

Activate shell completions for nf-core/tools

Auto-completion for the nf-core command is available for bash, zsh and fish. To activate it, add the following lines to the respective shell config files.

| shell | shell config file | command |
| --- | --- | --- |
| bash | ~/.bashrc | eval "$(_NF_CORE_COMPLETE=bash_source nf-core)" |
| zsh | ~/.zshrc | eval "$(_NF_CORE_COMPLETE=zsh_source nf-core)" |
| fish | ~/.config/fish/completions/nf-core.fish | eval (env _NF_CORE_COMPLETE=fish_source nf-core) |

After a restart of the shell session you should have auto-completion for the nf-core command and all its sub-commands and options.

:::note The added line will run the command nf-core (which will also slow down startup time of your shell). You should therefore either have nf-core/tools installed globally, or wrap the line inside if type nf-core > /dev/null; then <YOUR EVAL CODE LINE> fi for bash and zsh, or if command -v nf-core &> /dev/null eval (env _NF_CORE_COMPLETE=fish_source nf-core) end for fish. You then need to source the config in your environment for the completions to be activated. :::

:::info If you see the error command not found compdef, be sure that your config file contains the line autoload -Uz compinit && compinit before the eval line. :::

Listing pipelines

The command nf-core list shows all available nf-core pipelines along with their latest version, when that was published and how recently the pipeline code was pulled to your local system (if at all).

An example of the output from the command is as follows:

nf-core list

To narrow down the list, supply one or more additional keywords to filter the pipelines based on matches in titles, descriptions and topics:

nf-core list rna rna-seq

You can sort the results by latest release (-s release, default), when you last pulled a local copy (-s pulled), alphabetically (-s name), or number of GitHub stars (-s stars).

nf-core list -s stars

To return results as JSON output for downstream use, use the --json flag.

Archived pipelines are not returned by default. To include them, use the --show_archived flag.

Launch a pipeline

Some Nextflow pipelines have a considerable number of command line flags that can be used. To help with this, you can use the nf-core launch command. You can choose between a web-based graphical interface or an interactive command-line wizard tool to enter the pipeline parameters for your run. Both interfaces show documentation alongside each parameter and validate your inputs.

The tool uses the nextflow_schema.json file from a pipeline to give parameter descriptions, defaults and grouping. If no file for the pipeline is found, one will be automatically generated at runtime.

Nextflow params variables are saved to a JSON file called nf-params.json and used by Nextflow with the -params-file flag. This makes it easier to reuse them in the future.

The command takes one argument - either the name of an nf-core pipeline which will be pulled automatically, or the path to a directory containing a Nextflow pipeline (can be any pipeline, doesn't have to be nf-core).

nf-core launch rnaseq -r 3.8.1

Once complete, the wizard will ask you if you want to launch the Nextflow run. If not, you can copy and paste the Nextflow command with the nf-params.json file of your inputs.

INFO     [✓] Input parameters look valid
INFO     Nextflow command:
         nextflow run nf-core/rnaseq -params-file "nf-params.json"


Do you want to run this command now?  [y/n]:
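To make the relationship between the saved parameters and the final command concrete, here is a small sketch that writes a parameters file and builds a matching nextflow run command. The function name and the parameter names are hypothetical placeholders; only the -r and -params-file flags come from the text above.

```python
import json
import tempfile
from pathlib import Path


def write_params_and_command(pipeline, params, revision=None, params_file="nf-params.json"):
    """Save parameters as JSON and return the matching `nextflow run` command.

    Hypothetical helper for illustration only; not part of nf-core/tools.
    """
    Path(params_file).write_text(json.dumps(params, indent=4) + "\n")
    cmd = ["nextflow", "run", pipeline]
    if revision:
        cmd += ["-r", revision]
    cmd += ["-params-file", str(params_file)]
    return cmd


if __name__ == "__main__":
    # The parameter names below are placeholders, not taken from a real schema
    with tempfile.TemporaryDirectory() as tmp:
        command = write_params_and_command(
            "nf-core/rnaseq",
            {"input": "samplesheet.csv", "outdir": "results"},
            revision="3.8.1",
            params_file=Path(tmp) / "nf-params.json",
        )
        print(" ".join(command))
```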

Launch tool options

  • -r, --revision
    • Specify a pipeline release (or branch / git commit sha) of the project to run
  • -i, --id
    • You can use the web GUI for nf-core pipelines by clicking "Launch" on the website. Once filled in you will be given an ID to use with this command which is used to retrieve your inputs.
  • -c, --command-only
    • If you prefer not to save your inputs in a JSON file and use -params-file, this option will specify all entered params directly in the nextflow command.
  • -p, --params-in PATH
    • To use values entered in a previous pipeline run, you can supply the nf-params.json file previously generated.
    • This will overwrite the pipeline schema defaults before the wizard is launched.
  • -o, --params-out PATH
    • Path to save parameters JSON file to. (Default: nf-params.json)
  • -a, --save-all
    • Without this option the pipeline will ignore any values that match the pipeline schema defaults.
    • This option saves all parameters found to the JSON file.
  • -h, --show-hidden
    • A pipeline JSON schema can define some parameters as 'hidden' if they are rarely used or for internal pipeline use only.
    • This option forces the wizard to show all parameters, including those labelled as 'hidden'.
  • --url
    • Change the URL used for the graphical interface, useful for development work on the website.
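The effect of -a / --save-all can be pictured as a filter over the entered values. The sketch below is an assumption about the behaviour described above, not the actual implementation; the function name and the example parameters are made up for illustration.

```python
_MISSING = object()  # sentinel for parameters that have no schema default


def params_to_save(entered, schema_defaults, save_all=False):
    """Return the parameters that would be written to the JSON file.

    Hypothetical helper for illustration only; not part of nf-core/tools.
    """
    if save_all:
        # --save-all: keep everything, including values equal to the defaults
        return dict(entered)
    # Otherwise drop values that simply repeat the schema default
    return {
        name: value
        for name, value in entered.items()
        if schema_defaults.get(name, _MISSING) != value
    }


if __name__ == "__main__":
    defaults = {"aligner": "star", "skip_qc": False}
    entered = {"aligner": "star", "skip_qc": True, "input": "samplesheet.csv"}
    print(params_to_save(entered, defaults))
    print(params_to_save(entered, defaults, save_all=True))
```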

Create a parameter file

Sometimes it is easier to manually edit a parameter file than to use the web interface or interactive command-line wizard provided by nf-core launch, for example when running a pipeline with many options on a remote server without a graphical interface.

You can create a parameter file with all parameters of a pipeline with the nf-core create-params-file command. This file can then be passed to nextflow with the -params-file flag.

This command takes one argument - either the name of an nf-core pipeline which will be pulled automatically, or the path to a directory containing a Nextflow pipeline (can be any pipeline, doesn't have to be nf-core).

The generated YAML file contains all parameters set to the pipeline default value, along with their descriptions in comments. This template can then be used by uncommenting and modifying the values of the parameters you want to pass to a pipeline run.

Hidden options are not included by default, but can be included using the -x/--show-hidden flag.
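As a rough illustration of what such a commented template looks like, the sketch below renders a few parameters with their defaults commented out. The exact layout produced by nf-core create-params-file may differ; the function name and the example parameters are hypothetical.

```python
import json


def render_params_template(parameters):
    """Render parameters as a YAML template with every value commented out.

    Hypothetical helper for illustration only; the real file layout may differ.
    """
    lines = []
    for name, info in parameters.items():
        lines.append(f"## {info['description']}")
        # JSON scalars are also valid YAML, so json.dumps gives safe quoting
        lines.append(f"# {name}: {json.dumps(info['default'])}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "outdir": {"description": "Output directory", "default": "./results"},
        "skip_qc": {"description": "Skip quality control", "default": False},
    }
    print(render_params_template(example))
```

To use a value, remove the leading "# " from its line and edit the value; lines that stay commented fall back to the pipeline default.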

Downloading pipelines for offline use

Sometimes you may need to run an nf-core pipeline on a server or HPC system that has no internet connection. In this case you will need to fetch the pipeline files first, then manually transfer them to your system.

To make this process easier and ensure accurate retrieval of correctly versioned code and software containers, we have written a download helper tool.

The nf-core download command will download both the pipeline code and the institutional nf-core/configs files. It can also optionally download any singularity image files that are required.

If run without any arguments, the download tool will interactively prompt you for the required information. Each option has a flag; if all are supplied, the tool runs without needing any user input.

nf-core download rnaseq -r 3.8 --outdir nf-core-rnaseq -x none -s none -d

Once downloaded, you will see something like the following file structure for the downloaded pipeline:

tree -L 2 nf-core-rnaseq/

You can run the pipeline by simply providing the directory path for the workflow folder to your nextflow run command:

nextflow run /path/to/download/nf-core-rnaseq-dev/workflow/ --input mydata.csv --outdir results  # usual parameters here

:::note If you downloaded Singularity container images, you will need to use -profile singularity or have it enabled in your config file. :::

Downloaded nf-core configs

The pipeline files are automatically updated (params.custom_config_base is set to ../configs), so that the local copy of the institutional configs is available when running the pipeline. Using -profile <NAME> should therefore work if the profile is available within nf-core/configs.

:::warning This option is not available when downloading a pipeline for use with Nextflow Tower because the application manages all configurations separately. :::

Downloading Apptainer containers

If you're using Singularity (Apptainer), the nf-core download command can also fetch the required container images for you. To do this, select singularity in the prompt or specify --container-system singularity in the command. Your archive / target output directory will then also include a separate folder singularity-images.

The downloaded workflow files are again edited to add the following line to the end of the pipeline's nextflow.config file:

singularity.cacheDir = "${projectDir}/../singularity-images/"

This tells Nextflow to use the singularity-images directory relative to the workflow for the singularity image cache directory. All images should be downloaded there, so Nextflow will use them instead of trying to pull from the internet.

Singularity cache directory

We highly recommend setting the $NXF_SINGULARITY_CACHEDIR environment variable on your system, even if that is a different system to where you will be running Nextflow.

If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory. Any images previously fetched will be found there and copied directly - this includes images that may be shared with other pipelines or previous pipeline version downloads or download attempts.

If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to only use the cache, either via a prompt or with the CLI option --container-cache-utilisation amend. This instructs nf-core download to fetch all Singularity images to the $NXF_SINGULARITY_CACHEDIR directory without copying them to the workflow archive / directory. The workflow config file is not edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.

If you are downloading a workflow for a different system, you can provide information about the contents of its image cache to nf-core download. To avoid unnecessary container image downloads, choose --container-cache-utilisation remote and provide a list of already available images as a plain text file to --container-cache-index my_list_of_remotely_available_images.txt. To generate this list on the remote system, run find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt. The tool will then download and copy into your output directory only those images that are missing on the remote system.
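If find is not convenient on the remote system, the index can also be produced with a few lines of Python. The sketch below mirrors the find command and then shows how such an index could be used to work out which images are still missing. Matching images by file name is an assumption made for this example; the function names are hypothetical and not part of nf-core/tools.

```python
import tempfile
from pathlib import Path


def write_cache_index(cache_dir, index_file):
    """Python equivalent of: find <cache_dir> -name "*.img" > <index_file>"""
    images = sorted(str(path) for path in Path(cache_dir).rglob("*.img"))
    Path(index_file).write_text("".join(f"{image}\n" for image in images))
    return images


def images_to_download(required, index_file):
    """Return the required image file names that are absent from the index.

    Assumption for this sketch: images are matched by file name only.
    """
    available = {
        Path(line.strip()).name
        for line in Path(index_file).read_text().splitlines()
        if line.strip()
    }
    return [name for name in required if name not in available]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        cache.mkdir()
        (cache / "fastqc.img").touch()  # pretend one image is already cached
        index = Path(tmp) / "my_list_of_remotely_available_images.txt"
        write_cache_index(cache, index)
        print(images_to_download(["fastqc.img", "multiqc.img"], index))
```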

How the Singularity image downloads work

The Singularity image download finds containers using two methods:

  1. It runs nextflow config on the downloaded workflow to look for a process.container statement for the whole pipeline. This is the typical method used for DSL1 pipelines.
  2. It scrapes any files it finds with a .nf file extension in the workflow modules directory for lines that look like container = "xxx". This is the typical method for DSL2 pipelines, which have one container per process.

Some DSL2 modules have container addresses for docker (eg. biocontainers/fastqc:0.11.9--0) and also URLs for direct downloads of a Singularity container (eg. https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0). Where both are found, the download URL is preferred.
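The scraping step for DSL2 modules can be approximated with a regular expression. This is a simplified sketch that only handles the plain container = "xxx" form mentioned above; the pattern and function names are assumptions for illustration and not the code used by nf-core download.

```python
import re

# Matches lines such as: container = "biocontainers/fastqc:0.11.9--0"
CONTAINER_RE = re.compile(r"""container\s*=\s*["']([^"']+)["']""")


def find_containers(nextflow_source):
    """Return every quoted container address found in the given .nf source."""
    return CONTAINER_RE.findall(nextflow_source)


def preferred_address(addresses):
    """Prefer a direct download URL over a Docker image name, if one exists."""
    for address in addresses:
        if address.startswith("http"):
            return address
    return addresses[0] if addresses else None


if __name__ == "__main__":
    source = '''
    process FASTQC {
        container = "biocontainers/fastqc:0.11.9--0"
        container = "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
    }
    '''
    found = find_containers(source)
    print(found)
    print(preferred_address(found))
```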

Once a full list of containers is found, they are processed in the following order:

  1. If the target image already exists, nothing is done (eg. with $NXF_SINGULARITY_CACHEDIR and --container-cache-utilisation amend specified)
  2. If found in $NXF_SINGULARITY_CACHEDIR and --container-cache-utilisation copy is specified, they are copied to the output directory
  3. If they start with http they are downloaded directly within Python (default 4 at a time, you can customise this with --parallel-downloads)
  4. If they look like a Docker image name, they are fetched using a singularity pull command. Choose the container libraries (registries) queried by providing one or multiple --container-library parameter(s). For example, if you call nf-core download with -l quay.io -l ghcr.io -l docker.io, every image will be pulled from quay.io unless an error is encountered. Subsequently, ghcr.io and then docker.io will be queried for any image that has failed before.
    • This requires Singularity/Apptainer to be installed on the system and is substantially slower
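The processing order and the registry fallback described above can be sketched as two small functions. This is an illustration of the documented decision order only: the function names, the action labels and the injected pull callable are hypothetical, and no real singularity pull is performed.

```python
def plan_action(address, target_exists, in_cache, cache_utilisation=None):
    """Decide what to do with one container, following the order listed above."""
    if target_exists:
        return "skip"                # 1. target image already exists
    if in_cache and cache_utilisation == "copy":
        return "copy_from_cache"     # 2. copy from $NXF_SINGULARITY_CACHEDIR
    if address.startswith("http"):
        return "download"            # 3. direct download within Python
    return "singularity_pull"        # 4. looks like a Docker image name


def pull_with_fallback(images, libraries, pull):
    """Try each library in turn, retrying only the images that failed so far.

    `pull(library, image)` is a caller-supplied function (hypothetical here)
    that raises RuntimeError when an image cannot be fetched.
    """
    pending, fetched = list(images), {}
    for library in libraries:
        failed = []
        for image in pending:
            try:
                pull(library, image)
                fetched[image] = library
            except RuntimeError:
                failed.append(image)
        pending = failed
    return fetched, pending


if __name__ == "__main__":
    def fake_pull(library, image):
        # Stand-in for a real pull: pretend quay.io lacks one image
        if library == "quay.io" and image == "example/tool:1.0":
            raise RuntimeError("image not found")

    print(plan_action("biocontainers/fastqc:0.11.9--0", False, False))
    print(pull_with_fallback(
        ["biocontainers/fastqc:0.11.9--0", "example/tool:1.0"],
        ["quay.io", "ghcr.io", "docker.io"],
        fake_pull,
    ))
```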

Note that compressing many GBs of binary files can be slow, so specifying --compress none is recommended when downloading Singularity images that are copied to the output directory.

If the download speeds are much slower than your internet connection is capable of, you can set --parallel-downloads to a large number to download loads of images at once.

Adapting downloads to Nextflow Tower

seqeralabs® Nextflow Tower provides a graphical user interface to oversee pipeline runs, gather statistics and configure compute resources. While pipelines added to Tower are preferably hosted at a Git service, providing them as disconnected, self-reliant repositories is also possible for premises with restricted network access. Choosing the --tower flag will download the pipeline in an appropriate form.

Subsequently, the *.git folder can be moved to its final destination and linked with a pipeline in Tower using the file:/ prefix.

:::tip Also without access to Tower, pipelines downloaded with the --tower flag can be run: nextflow run -r 2.5 file:/path/to/pipelinedownload.git. Downloads in this format allow you to include multiple revisions of a pipeline in a single file, but require that the revision (e.g. -r 2.5) is always explicitly specified. :::

Pipeline software licences

Sometimes it's useful to see the software licences of the tools used in a pipeline. You can use the licences subcommand to fetch and print the software licence from each conda / PyPI package used in an nf-core pipeline.

:::warning This command does not currently work for newer DSL2 pipelines. This will hopefully be addressed soon. :::

nf-core licences deepvariant

Creating a new pipeline

The create subcommand makes a new pipeline using the nf-core base template. With a given pipeline name, description and author, it makes a starter pipeline which follows nf-core best practices.

After creating the files, the command initialises the folder as a git repository and makes an initial commit. This first "vanilla" commit, which is identical to the output from the templating tool, is important, as it allows us to keep your pipeline in sync with the base template in the future. See the nf-core syncing docs for more information.

nf-core create -n nextbigthing -d "This pipeline analyses data from the next big omics technique" -a "Big Steve" --plain

Once you have run the command, create a new empty repository on GitHub under your username (not the nf-core organisation, yet) and push the commits from your computer using the example commands in the above log. You can then continue to edit, commit and push normally as you build your pipeline.

Please see the nf-core documentation for a full walkthrough of how to create a new nf-core workflow.

:::tip As the log output says, remember to come and discuss your idea for a pipeline as early as possible! See the documentation for instructions. :::

Note that if the required arguments for nf-core create are not given, it will interactively prompt for them. If you prefer, you can supply them as command line arguments. See nf-core create --help for more information.

Customizing the creation of a pipeline

The nf-core create command comes with a number of options that allow you to customize the creation of a pipeline if you intend to not publish it as an nf-core pipeline. This can be done in two ways: by using interactive prompts, or by supplying a template.yml file using the --template-yaml <file> option. Both options allow you to specify a custom pipeline prefix to use instead of the common nf-core, as well as selecting parts of the template to be excluded during pipeline creation. The interactive prompts will guide you through the pipeline creation process. An example of a template.yml file is shown below.

name: coolpipe
description: A cool pipeline
author: me
prefix: myorg
skip:
  - github
  - ci
  - github_badges
  - igenomes
  - nf_core_configs

This will create a pipeline called coolpipe in the directory myorg-coolpipe (<prefix>-<name>) with me as the author. It will exclude all possible parts of the template:

  • github: removes all files required for GitHub hosting of the pipeline. Specifically, the .github folder and .gitignore file.
  • ci: removes the GitHub continuous integration tests from the pipeline. Specifically, the .github/workflows/ folder.
  • github_badges: removes GitHub badges from the README.md file.
  • igenomes: removes pipeline options related to iGenomes, including the conf/igenomes.config file and all references to it.
  • nf_core_configs: excludes nf-core/configs repository options, which make multiple config profiles for various institutional clusters available.

To run the pipeline creation silently (i.e. without any prompts) with the nf-core template, you can use the --plain option.
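To check a template.yml before running the command, the rules described above can be expressed in a few lines. This sketch works on an already parsed dictionary (for example loaded with a YAML library of your choice); the function name is hypothetical, and the list of skippable areas is simply the five entries from the example above.

```python
KNOWN_SKIP_AREAS = {"github", "ci", "github_badges", "igenomes", "nf_core_configs"}


def summarise_template(template):
    """Return the output directory and skipped areas for a parsed template.yml.

    Hypothetical helper for illustration only; not part of nf-core/tools.
    """
    skipped = list(template.get("skip", []))
    unknown = sorted(set(skipped) - KNOWN_SKIP_AREAS)
    if unknown:
        raise ValueError(f"Unknown template areas: {', '.join(unknown)}")
    prefix = template.get("prefix", "nf-core")
    # The pipeline directory follows the <prefix>-<name> convention
    return {"directory": f"{prefix}-{template['name']}", "skipped": skipped}


if __name__ == "__main__":
    print(summarise_template({
        "name": "coolpipe",
        "description": "A cool pipeline",
        "author": "me",
        "prefix": "myorg",
        "skip": ["github", "ci", "github_badges", "igenomes", "nf_core_configs"],
    }))
```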

Linting a workflow

The lint subcommand checks a given pipeline against all nf-core community guidelines. This is the same test that is used in the automated continuous integration tests.

For example, the current version looks something like this:

nf-core lint

You can use the -k / --key flag to run only named tests for faster debugging, eg: nf-core lint -k files_exist -k files_unchanged. The nf-core lint command lints the current working directory by default. To specify another directory, use --dir <directory>.

Linting documentation

Each test result name on the left is a terminal hyperlink. In most terminals you can ctrl + click (or cmd + click) these links to open documentation specific to this test in your browser.

Alternatively visit https://nf-co.re/tools-docs/lint_tests/index.html and find your test to read more.

Linting config

It's sometimes desirable to disable certain lint tests, especially if you're using nf-core/tools with your own pipeline that is outside of nf-core.

To help with this, you can add a tools config file to your pipeline called .nf-core.yml in the pipeline root directory (previously: .nf-core-lint.yml). Here you can list the names of any tests that you would like to disable and set them to False, for example:

lint:
  actions_awsfulltest: False
  pipeline_todos: False

Some lint tests allow greater granularity, for example skipping a test only for a specific file. This is documented in the test-specific docs but generally involves passing a list, for example:

lint:
  files_exist:
    - CODE_OF_CONDUCT.md
  files_unchanged:
    - assets/email_template.html
    - CODE_OF_CONDUCT.md

Note that you have to list all configurations for the nf-core lint command under the lint: field in the .nf-core.yml file, as this file is also used for configuration of other commands.
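The two styles of entry under lint: (False to disable a test, or a list to skip specific files) can be read as follows. This is a sketch of the semantics described above working on an already parsed lint: mapping, not the code that nf-core lint uses; both function names are hypothetical.

```python
def is_test_enabled(test_name, lint_config):
    """A test runs unless it is explicitly set to False in the lint config."""
    return lint_config.get(test_name) is not False


def ignored_files(test_name, lint_config):
    """Return the files a test should skip, if the config lists any."""
    value = lint_config.get(test_name)
    return list(value) if isinstance(value, list) else []


if __name__ == "__main__":
    # Mirrors the two YAML examples above, already parsed into a dict
    lint_config = {
        "actions_awsfulltest": False,
        "pipeline_todos": False,
        "files_exist": ["CODE_OF_CONDUCT.md"],
        "files_unchanged": ["assets/email_template.html", "CODE_OF_CONDUCT.md"],
    }
    for name in ("actions_awsfulltest", "files_exist", "schema_lint"):
        print(name, is_test_enabled(name, lint_config), ignored_files(name, lint_config))
```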

Automatically fix errors

Some lint tests can try to automatically fix any issues they find. To enable this functionality, use the --fix flag. The pipeline must be a git repository with no uncommitted changes for this to work. This is so that any automated changes can then be reviewed and undone (git checkout .) if you disagree.

Lint results output

The output from nf-core lint is designed to be viewed on the command line and is deliberately succinct. You can view all passed tests with --show-passed or generate JSON / markdown results with the --json and --markdown flags.

Pipeline schema

nf-core pipelines have a nextflow_schema.json file in their root which describes the different parameters used by the workflow. These files allow automated validation of inputs when running the pipeline, are used to generate command line help and can be used to build interfaces to launch pipelines. Pipeline schema files are built according to the JSONSchema specification (Draft 7).

To help developers working with pipeline schema, nf-core tools has four schema sub-commands:

  • nf-core schema validate
  • nf-core schema build
  • nf-core schema docs
  • nf-core schema lint

Validate pipeline parameters

Nextflow can take input parameters in a JSON or YAML file when running a pipeline using the -params-file option. This command validates such a file against the pipeline schema.

Usage is nf-core schema validate <pipeline> <parameter file>. For example, with the pipeline downloaded above, you can run:

nf-core schema validate nf-core-rnaseq/3_8 nf-params.json

The pipeline option can be a directory containing a pipeline, a path to a schema file or the name of an nf-core pipeline (which will be downloaded using nextflow pull).
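To give a feel for what validating a parameter file against a schema involves, here is a deliberately small type checker written with the standard library only. It is not how nf-core schema validate is implemented, and it covers just the four basic types. It assumes that parameters are grouped under a top-level definitions key as well as properties, and it treats unknown parameters as errors, which is a choice made for this sketch.

```python
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def collect_properties(schema):
    """Gather parameter definitions from the top level and from every group."""
    properties = dict(schema.get("properties", {}))
    for group in schema.get("definitions", {}).values():
        properties.update(group.get("properties", {}))
    return properties


def validate_params(params, schema):
    """Return a list of human-readable problems (empty if none were found)."""
    properties = collect_properties(schema)
    errors = []
    for name, value in params.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"{name}: not described in the schema")
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # types outside the four basic ones are not checked here
        if type_name == "boolean":
            valid = isinstance(value, bool)
        else:
            # bool is a subclass of int in Python, so exclude it explicitly
            valid = isinstance(value, expected) and not isinstance(value, bool)
        if not valid:
            errors.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    # Hypothetical schema fragment and parameters, for illustration only
    schema = {
        "definitions": {
            "input_output_options": {
                "properties": {"input": {"type": "string"}, "max_cpus": {"type": "integer"}}
            }
        }
    }
    print(validate_params({"input": "samplesheet.csv", "max_cpus": "four"}, schema))
```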

Build a pipeline schema

Manually building JSONSchema documents is not trivial and can be very error prone. Instead, the nf-core schema build command collects your pipeline parameters and gives interactive prompts about any missing or unexpected params. If no existing schema is found it will create one for you.

Once built, the tool can send the schema to the nf-core website so that you can use a graphical interface to organise and fill in the schema. The tool checks the status of your schema on the website and once complete, saves your changes locally.

Usage is nf-core schema build -d <pipeline_directory>, eg:

nf-core schema build --no-prompts

There are four flags that you can use with this command:

  • --dir <pipeline_dir>: Specify a pipeline directory other than the current working directory
  • --no-prompts: Make changes without prompting for confirmation each time. Does not launch web tool.
  • --web-only: Skips comparison of the schema against the pipeline parameters and only launches the web tool.
  • --url <web_address>: Supply a custom URL for the online tool. Useful when testing locally.

Display the documentation for a pipeline schema

To get an overview of the current pipeline schema, you can display the content of nextflow_schema.json with nf-core schema docs <pipeline-schema>. This will print the content of your schema in Markdown format to standard output.

There are four flags that you can use with this command:

  • --output <filename>: Output filename. Defaults to standard out.
  • --format [markdown|html]: Format to output docs in.
  • --force: Overwrite existing files
  • --columns <columns_list>: CSV list of columns to include in the parameter tables

Add new parameters to the pipeline schema

If you want to add a parameter to the schema, you first have to add the parameter and its default value to the nextflow.config file with the params scope. Afterwards, you run the command nf-core schema build to add the parameters to your schema and open the graphical interface to easily modify the schema.

The graphical interface is organised in groups, and the individual parameters are stored within those groups. For a better overview you can collapse all groups with the Collapse groups button; your new parameters will then be the only ones remaining at the bottom of the page. Now you can either create a new group with the Add group button or drag and drop the parameters into an existing group. To do so, the group has to be expanded. The group title is displayed when you run your pipeline with the --help flag, and its description appears on the parameter page of your pipeline.

Now you can start to change the parameter itself. The ID of a new parameter should be defined in lowercase letters without whitespace. The description is a short free text explanation of the parameter, which appears if you run your pipeline with the --help flag. By clicking on the dictionary icon you can add a longer explanation for the parameter page of your pipeline. Usually, it contains a short paragraph about the parameter settings or a data source used, such as databases or references. If you want to specify some conditions for your parameter, like the file extension, you can use the nut icon to open the settings. This menu depends on the type you assigned to your parameter. For integers you can define a min and max value, and for strings the file extension can be specified.

The type field is one of the most important points in your pipeline schema, since it defines the datatype of your input and how it will be interpreted. This allows extensive testing prior to starting the pipeline.

The basic datatypes for a pipeline schema are:

  • string
  • number
  • integer
  • boolean

For the string type you have three different options in the settings (nut icon): enumerated values, pattern and format. The first option, enumerated values, allows you to specify a list of permitted input values. The values in the list have to be separated with a pipe character (|). The pattern and format settings can depend on each other. The format has to be either a directory or a file path. Depending on the format setting selected, specifying the pattern setting can be the most efficient and time-saving option, especially for file paths. The number and integer types share the same settings. Similarly to string, there is an enumerated values option, along with the possibility of specifying a min and max value. For the boolean type there are no further settings and the default value is usually false. The boolean value can be switched to true by adding the flag to the command. This parameter type is often used to skip specific sections of a pipeline.
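In the saved schema, these settings presumably end up as the standard JSON Schema keywords enum, pattern, minimum and maximum (this mapping is an assumption; check your own nextflow_schema.json). The sketch below shows how such constraints restrict a single value. The function name and the example parameter definitions are hypothetical.

```python
import re


def check_constraints(value, spec):
    """Check one value against enum, pattern, minimum and maximum, if present."""
    problems = []
    if "enum" in spec and value not in spec["enum"]:
        problems.append(f"must be one of {spec['enum']}")
    if "pattern" in spec and isinstance(value, str):
        # JSON Schema patterns are not anchored, so search rather than match
        if re.search(spec["pattern"], value) is None:
            problems.append(f"does not match pattern {spec['pattern']}")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and "minimum" in spec and value < spec["minimum"]:
        problems.append(f"must be >= {spec['minimum']}")
    if is_number and "maximum" in spec and value > spec["maximum"]:
        problems.append(f"must be <= {spec['maximum']}")
    return problems


if __name__ == "__main__":
    # Hypothetical parameter definitions, for illustration only
    aligner = {"type": "string", "enum": ["star", "hisat2"]}
    samplesheet = {"type": "string", "format": "file-path", "pattern": r"^\S+\.csv$"}
    cpus = {"type": "integer", "minimum": 1, "maximum": 64}
    print(check_constraints("bwa", aligner))
    print(check_constraints("samples.tsv", samplesheet))
    print(check_constraints(128, cpus))
```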

After filling in the schema, click on the Finished button in the top right corner. This will automatically update your nextflow_schema.json. If this does not work, the schema can be copied from the graphical interface and pasted into your nextflow_schema.json file.

Update existing pipeline schema

It's important to change the default value of a parameter in the nextflow.config file first and then in the pipeline schema, because the value in the config file overwrites the value in the pipeline schema. To change any other parameter use nf-core schema build --web-only to open the graphical interface without rebuilding the pipeline schema. Now, the parameters can be changed as mentioned above but keep in mind that changing the parameter datatype depends on the default value specified in the nextflow.config file.

Linting a pipeline schema

The pipeline schema is linted as part of the main pipeline nf-core lint command. However, it can sometimes be useful to quickly check the syntax of the JSONSchema without running a full lint run.

Usage is nf-core schema lint <schema> (defaulting to nextflow_schema.json), eg:

nf-core schema lint

Bumping a pipeline version number

When releasing a new version of an nf-core pipeline, version numbers have to be updated in several different places. The helper command nf-core bump-version automates this for you to avoid manual errors (and frustration!).

The command uses results from the linting process, so it will only work with workflows that pass these tests.

Usage is nf-core bump-version <new_version>, eg:

nf-core bump-version 1.1

You can change the directory from the current working directory by specifying --dir <pipeline_dir>. To change the required version of Nextflow instead of the pipeline version number, use the flag --nextflow.
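To give a feel for what updating a version number involves, here is a minimal Python sketch that rewrites one `version = '...'` assignment in a piece of config text. It is an illustration only: `bump_manifest_version` is a hypothetical helper, and the real command relies on the lint results and updates several files.

```python
import re


def bump_manifest_version(config_text, new_version):
    """Replace the value of the first `version = '...'` assignment in config_text."""
    # Group 1 keeps the "version = " prefix, group 2 remembers the quote character.
    pattern = r"(\bversion\s*=\s*)(['\"])[^'\"]*\2"

    def replace(match):
        quote = match.group(2)
        return f"{match.group(1)}{quote}{new_version}{quote}"

    return re.sub(pattern, replace, config_text, count=1)


# Hypothetical config excerpt used only for this example.
config = "manifest {\n    name = 'nf-core/example'\n    version = '1.0dev'\n}\n"
print(bump_manifest_version(config, "1.1"))
```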

Sync a pipeline with the template

Over time, the main nf-core pipeline template is updated. To keep all nf-core pipelines up to date, we synchronise these updates automatically when new versions of nf-core/tools are released. This is done by maintaining a special TEMPLATE branch, containing a vanilla copy of the nf-core template with only the variables used when it first ran (name, description etc.). This branch is updated and a pull-request can be made with just the updates from the main template code.

Note that pipeline synchronisation happens automatically each time nf-core/tools is released, creating an automated pull-request on each pipeline. As such, you do not normally need to run this command yourself!

This command takes a pipeline directory and attempts to run this synchronisation. Usage is nf-core sync, eg:

nf-core sync

The sync command tries to check out the TEMPLATE branch from the origin remote or an existing local branch called TEMPLATE. It will fail if it cannot do either of these things. The nf-core create command should make this template automatically when you first start your pipeline. Please see the nf-core website sync documentation if you have difficulties.

To specify a directory to sync other than the current working directory, use --dir <pipeline_dir>.

By default, the tool will collect workflow variables from the current branch in your pipeline directory. You can supply the --from-branch flag to specify a different branch.

Finally, if you give the --pull-request flag, the command will push any changes to the remote and attempt to create a pull request using the GitHub API. The GitHub username and repository name will be fetched from the remote url (see git remote -v | grep origin), or can be supplied with --username and --github-repository.

To create the pull request, a personal access token is required for API authentication. These can be created at https://github.com/settings/tokens. Supply this using the --auth-token flag.
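The username and repository name can be read off a remote URL with a small amount of parsing. The Python sketch below shows one way to do it for SSH and HTTPS GitHub URLs; `parse_github_remote` is a hypothetical helper and not necessarily how the sync command does this internally.

```python
import re


def parse_github_remote(url):
    """Extract (username, repository) from an SSH or HTTPS GitHub remote URL."""
    # Accept both "github.com:user/repo" and "github.com/user/repo", with or without ".git".
    match = re.search(r"github\.com[:/]([^/]+)/(.+?)(?:\.git)?/?$", url)
    if match is None:
        raise ValueError(f"Not a recognised GitHub remote URL: {url}")
    return match.group(1), match.group(2)


print(parse_github_remote("git@github.com:nf-core/rnaseq.git"))
print(parse_github_remote("https://github.com/nf-core/rnaseq"))
```

If the URL does not have this shape, supplying --username and --github-repository explicitly avoids the need for any parsing.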

Modules

With the advent of Nextflow DSL2, we are creating a centralised repository of modules. These are software tool process definitions that can be imported into any pipeline. This allows multiple pipelines to use the same code for shared tools and gives a greater degree of granularity and unit testing.

The nf-core DSL2 modules repository is at https://github.com/nf-core/modules

Custom remote modules

The modules supercommand comes with two flags for specifying a custom remote:

  • --git-remote <git remote url>: Specify the repository from which the modules should be fetched as a git URL. Defaults to the github repository of nf-core/modules.
  • --branch <branch name>: Specify the branch from which the modules should be fetched. Defaults to the default branch of your repository.

For example, if you want to install the fastqc module from the repository nf-core/modules-test hosted at gitlab.com, you can use the following command:

nf-core modules --git-remote git@gitlab.com:nf-core/modules-test.git install fastqc

Note that a custom remote must follow a similar directory structure to that of nf-core/modules for the nf-core modules commands to work properly.

The directory where modules are installed will be prompted or obtained from org_path in the .nf-core.yml file if available. If your modules are located at modules/my-folder/TOOL/SUBTOOL your .nf-core.yml should have:

org_path: my-folder
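The relationship between org_path and the install location can be written out in a few lines of Python. This is only a sketch of the path layout described above; `module_install_dir` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def module_install_dir(org_path, module_name, base="modules"):
    """Build the path modules/<org_path>/<TOOL>/<SUBTOOL> for a module name like 'tool/subtool'."""
    return PurePosixPath(base) / org_path / module_name


print(module_install_dir("my-folder", "samtools/view"))
```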

Please avoid installing the same tools from two different remotes, as this can lead to further errors.

During initialisation, the modules commands will try to pull changes from the remote repositories. If you want to disable this, for example for performance reasons or because you want to run the commands offline, you can use the flag --no-pull. Note however that the commands will still need to clone repositories that have not been used previously.

Private remote repositories

You can use the modules command with private remote repositories. Make sure that your local git is correctly configured with your private remote and then specify the remote the same way you would do with a public remote repository.

List modules

The nf-core modules list command provides the subcommands remote and local for listing modules installed in a remote repository and in the local pipeline respectively. Both subcommands allow you to use a pattern for filtering the modules by keywords, eg: nf-core modules list <subcommand> <keyword>.

List remote modules

To list all modules available on nf-core/modules, you can use nf-core modules list remote, which will print all available modules to the terminal.

nf-core modules list remote

List installed modules

To list modules installed in a local pipeline directory you can use nf-core modules list local. This will list the modules installed in the current working directory by default. If you want to specify another directory, use the --dir <pipeline_dir> flag.

nf-core modules list local

Show information about a module

For quick help about how a module works, use nf-core modules info <tool>. This shows documentation about the module on the command line, similar to what's available on the nf-core website.

nf-core modules info abacas

Install modules in a pipeline

You can install modules from nf-core/modules in your pipeline using nf-core modules install. A module installed this way will be installed to the ./modules/nf-core/modules directory.

nf-core modules install abacas

You can pass the module name as an optional argument to nf-core modules install instead of using the cli prompt, eg: nf-core modules install fastqc. You can specify a pipeline directory other than the current working directory by using --dir <pipeline_dir>.

There are three additional flags that you can use when installing a module:

  • --force: Overwrite a previously installed version of the module.
  • --prompt: Select the module version using a cli prompt.
  • --sha <commit_sha>: Install the module at a specific commit.

Update modules in a pipeline

You can update modules installed from a remote repository in your pipeline using nf-core modules update.

nf-core modules update --all --no-preview

You can pass the module name as an optional argument to nf-core modules update instead of using the cli prompt, eg: nf-core modules update fastqc. You can specify a pipeline directory other than the current working directory by using --dir <pipeline_dir>.

There are six additional flags that you can use with this command:

  • --force: Reinstall module even if it appears to be up to date.
  • --prompt: Select the module version using a cli prompt.
  • --sha <commit_sha>: Install the module at a specific commit from the nf-core/modules repository.
  • --preview/--no-preview: Show the diff between the installed files and the new version before installing.
  • --save-diff <filename>: Save diffs to a file instead of updating in place. The diffs can then be applied with git apply <filename>.
  • --all: Use this flag to run the command on all modules in the pipeline.

If you don't want to update certain modules or want to update them to specific versions, you can make use of the .nf-core.yml configuration file. For example, you can prevent the star/align module installed from nf-core/modules from being updated by adding the following to the .nf-core.yml file:

update:
  https://github.com/nf-core/modules.git:
    nf-core:
      star/align: False

If you want this module to be updated only to a specific version (or downgraded), you could instead specify the version:

update:
  https://github.com/nf-core/modules.git:
    nf-core:
      star/align: "e937c7950af70930d1f34bb961403d9d2aa81c7"

This also works at the repository level. For example, if you want to exclude all modules installed from nf-core/modules from being updated you could add:

update:
  https://github.com/nf-core/modules.git:
    nf-core: False

or if you want all modules in nf-core/modules at a specific version:

update:
  https://github.com/nf-core/modules.git:
    nf-core: "e937c7950af70930d1f34bb961403d9d2aa81c7"

Note that the module versions specified in the .nf-core.yml file have higher precedence than versions specified with the command line flags, thus aiding you in writing reproducible pipelines.
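The module-level and repository-level entries shown above can be thought of as a small lookup. The Python sketch below resolves the setting for a single module from a dictionary that mirrors the YAML structure. `resolve_update` is a hypothetical helper written for this illustration, not the code used by nf-core/tools.

```python
def resolve_update(update_config, remote, org, module):
    """Return False (never update), a commit SHA (pin to it) or None (no restriction)."""
    repo_entry = update_config.get(remote)
    if not isinstance(repo_entry, dict):
        # Remote missing (None) or configured with a single value for everything in it.
        return repo_entry
    org_entry = repo_entry.get(org)
    if isinstance(org_entry, dict):
        # Module-level entries: look up this particular module.
        return org_entry.get(module)
    # Repository-level setting (False or a SHA), or None if nothing is configured.
    return org_entry


REMOTE = "https://github.com/nf-core/modules.git"

# Dictionary equivalent of the first YAML example above.
per_module = {REMOTE: {"nf-core": {"star/align": False}}}

print(resolve_update(per_module, REMOTE, "nf-core", "star/align"))
print(resolve_update(per_module, REMOTE, "nf-core", "fastqc"))
```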

Remove a module from a pipeline

To delete a module from your pipeline, run nf-core modules remove.

nf-core modules remove abacas

You can pass the module name as an optional argument to nf-core modules remove instead of using the cli prompt, eg: nf-core modules remove fastqc. To specify the pipeline directory, use --dir <pipeline_dir>.

Create a patch file for a module

If you want to make a minor change to a locally installed module but still keep it up to date with the remote version, you can create a patch file using nf-core modules patch.

nf-core modules patch fastqc

The generated patches work with nf-core modules update: when you install a new version of the module, the command tries to apply the patch automatically. The patch application fails if the new version of the module modifies the same lines as the patch. In this case, the new version of the module is installed but the old patch file is preserved.

When linting a patched module, the linting command will check the validity of the patch. When running other lint tests the patch is applied in reverse, and the original files are linted.
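Conceptually, a patch records the difference between the remote version of a file and your locally modified copy. The Python sketch below produces such a difference with the standard library's `difflib`. The file contents and the `remote/main.nf` and `local/main.nf` labels are made up for the example, and the exact layout of the file written by nf-core modules patch may differ.

```python
import difflib

# Hypothetical remote and locally edited versions of a module file.
remote = ["process FASTQC {\n", "    label 'process_medium'\n", "}\n"]
local = ["process FASTQC {\n", "    label 'process_low'\n", "}\n"]

# A unified diff marks removed lines with "-" and added lines with "+".
patch = list(difflib.unified_diff(remote, local, fromfile="remote/main.nf", tofile="local/main.nf"))
print("".join(patch))
```

If a later remote version also changes the `label` line, the recorded context no longer matches, which is the situation in which applying the patch fails.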

Create a new module

This command creates a new nf-core module from the nf-core module template. This ensures that your module follows the nf-core guidelines. The template contains extensive TODO messages to walk you through the changes you need to make to the template.

You can create a new module using nf-core modules create.

This command can be used both when writing a module for the shared nf-core/modules repository, and also when creating local modules for a pipeline.

Which type of repository you are working in is detected by the repository_type flag in a .nf-core.yml file in the root directory, set to either pipeline or modules. The command will automatically look through parent directories for this file to set the root path, so that you can run the command in a subdirectory. It will start in the current working directory, or whatever is specified with --dir <directory>.
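The parent-directory search described above can be sketched in a few lines of Python. Both function names are hypothetical, and the tiny line-based reader stands in for a proper YAML parser to keep the example dependency-free.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest directory at or above `start` that contains .nf-core.yml, else None."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        if (directory / ".nf-core.yml").is_file():
            return directory
    return None


def read_repository_type(config_file):
    """Read the value of a `repository_type: <value>` line, or return None if it is absent."""
    for line in Path(config_file).read_text().splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "repository_type":
            return value.strip().strip("'\"")
    return None


root = find_repo_root(".")
if root is not None:
    print(root, read_repository_type(root / ".nf-core.yml"))
```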

The nf-core modules create command will prompt you with the relevant questions in order to create all of the necessary module files.

cd modules && nf-core modules create fastqc --author @nf-core-bot  --label process_low --meta --force

Create a module test config file

All modules on nf-core/modules have a strict requirement of being unit tested using minimal test data. To help developers build new modules, the nf-core modules create-test-yml command automates the creation of the yaml file required to document the output file md5sum and other information generated by the testing. After you have written a minimal Nextflow script to test your module in tests/modules/<tool>/<subtool>/main.nf, this command will run the tests for you and create the tests/modules/<tool>/<subtool>/test.yml file.
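For reference, the md5sum of an output file can be reproduced with Python's standard library, which is handy when checking a value recorded in test.yml by hand. This is a generic sketch and `md5sum` is a hypothetical helper name.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5sum("path/to/output.txt")` on a file produced by the test should give the same value as the md5sum entry for that file, provided the output is deterministic.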

nf-core modules create-test-yml fastqc --no-prompts --force

Check a module against nf-core guidelines

Run the nf-core modules lint command to check modules in the current working directory (pipeline or nf-core/modules clone) against nf-core guidelines.

Use the --all flag to run linting on all modules found. Use --dir <pipeline_dir> to specify a directory other than the current working directory.

nf-core modules lint multiqc

Run the tests for a module using pytest

To run unit tests of a module that you have installed, or the test created by the command nf-core modules create-test-yml, you can use the nf-core modules test command. This command runs the tests specified in the modules/tests/software/<tool>/<subtool>/test.yml file using pytest.

:::info This command uses the pytest argument --git-aware to avoid copying the whole .git directory and files ignored by git. This means that it will only include files listed by git ls-files. Remember to commit your changes after adding a new module to add the new files to your git index. :::

You can specify the module name in the form TOOL/SUBTOOL on the command line or provide it later via the prompts.

nf-core modules test samtools/view --no-prompts

Bump bioconda and container versions of modules

If you are contributing to the nf-core/modules repository and want to bump bioconda and container versions of certain modules, you can use the nf-core modules bump-versions helper tool. This will bump the bioconda version of a single or all modules to the latest version and also fetch the correct Docker and Singularity container tags.

nf-core modules bump-versions fastqc

If you don't want to update certain modules or want to update them to specific versions, you can make use of the .nf-core.yml configuration file. For example, you can prevent the star/align module from being updated by adding the following to the .nf-core.yml file:

bump-versions:
  star/align: False

If you want this module to be updated only to a specific version (or downgraded), you could instead specify the version:

bump-versions:
  star/align: "2.6.1d"

Subworkflows

After the launch of nf-core modules, we can now also provide nf-core subworkflows to fully utilize the power of DSL2 modularization. Subworkflows are chains of multiple module definitions that can be imported into any pipeline. This allows multiple pipelines to use the same code for the same tasks, and gives a greater degree of reusability and unit testing.

To allow us to test modules and subworkflows together, we put the nf-core DSL2 subworkflows into the subworkflows directory of the modules repository at https://github.com/nf-core/modules.

Custom remote subworkflows

The subworkflows supercommand released in nf-core/tools version 2.7 comes with two flags for specifying a custom remote repository:

  • --git-remote <git remote url>: Specify the repository from which the subworkflows should be fetched as a git URL. Defaults to the github repository of nf-core/modules.
  • --branch <branch name>: Specify the branch from which the subworkflows should be fetched. Defaults to the default branch of your repository.

For example, if you want to install the bam_stats_samtools subworkflow from the repository nf-core/modules-test hosted at gitlab.com in the branch subworkflows, you can use the following command:

nf-core subworkflows --git-remote git@gitlab.com:nf-core/modules-test.git --branch subworkflows install bam_stats_samtools

Note that a custom remote must follow a similar directory structure to that of nf-core/modules for the nf-core subworkflows commands to work properly.

The directory where subworkflows are installed will be prompted or obtained from org_path in the .nf-core.yml file if available. If your subworkflows are located at subworkflows/my-folder/SUBWORKFLOW_NAME your .nf-core.yml file should have:

org_path: my-folder

Please avoid installing the same tools from two different remotes, as this can lead to further errors.

During initialisation, the subworkflows commands will try to pull changes from the remote repositories. If you want to disable this, for example for performance reasons or because you want to run the commands offline, you can use the flag --no-pull. Note however that the commands will still need to clone repositories that have not been used previously.

Private remote repositories

You can use the subworkflows command with private remote repositories. Make sure that your local git is correctly configured with your private remote and then specify the remote the same way you would do with a public remote repository.

List subworkflows

The nf-core subworkflows list command provides the subcommands remote and local for listing subworkflows installed in a remote repository and in the local pipeline respectively. Both subcommands allow you to use a pattern for filtering the subworkflows by keywords, eg: nf-core subworkflows list <subcommand> <keyword>.

List remote subworkflows

To list all subworkflows available on nf-core/modules, you can use nf-core subworkflows list remote, which will print all available subworkflows to the terminal.

nf-core subworkflows list remote

List installed subworkflows

To list subworkflows installed in a local pipeline directory you can use nf-core subworkflows list local. This will list the subworkflows installed in the current working directory by default. If you want to specify another directory, use the --dir <pipeline_dir> flag.

nf-core subworkflows list local

Show information about a subworkflow

For quick help about how a subworkflow works, use nf-core subworkflows info <subworkflow_name>. This shows documentation about the subworkflow on the command line, similar to what's available on the nf-core website.

nf-core subworkflows info bam_rseqc

Install subworkflows in a pipeline

You can install subworkflows from nf-core/modules in your pipeline using nf-core subworkflows install. A subworkflow installed this way will be installed to the ./subworkflows/nf-core directory.

nf-core subworkflows install bam_rseqc

You can pass the subworkflow name as an optional argument to nf-core subworkflows install like above or select it from a list of available subworkflows by only running nf-core subworkflows install.

There are four additional flags that you can use when installing a subworkflow:

  • --dir: Pipeline directory, the default is the current working directory.
  • --force: Overwrite a previously installed version of the subworkflow.
  • --prompt: Select the subworkflow version using a cli prompt.
  • --sha <commit_sha>: Install the subworkflow at a specific commit.

Update subworkflows in a pipeline

You can update subworkflows installed from a remote repository in your pipeline using nf-core subworkflows update.

nf-core subworkflows update --all --no-preview

You can pass the subworkflow name as an optional argument to nf-core subworkflows update like above or select it from the list of available subworkflows by only running nf-core subworkflows update.

There are eight additional flags that you can use with this command:

  • --dir: Pipeline directory, the default is the current working directory.
  • --force: Reinstall subworkflow even if it appears to be up to date.
  • --prompt: Select the subworkflow version using a cli prompt.
  • --sha <commit_sha>: Install the subworkflow at a specific commit from the nf-core/modules repository.
  • --preview/--no-preview: Show the diff between the installed files and the new version before installing.
  • --save-diff <filename>: Save diffs to a file instead of updating in place. The diffs can then be applied with git apply <filename>.
  • --all: Use this flag to run the command on all subworkflows in the pipeline.
  • --update-deps: Use this flag to automatically update all dependencies of a subworkflow.

If you don't want to update certain subworkflows or want to update them to specific versions, you can make use of the .nf-core.yml configuration file. For example, you can prevent the bam_rseqc subworkflow installed from nf-core/modules from being updated by adding the following to the .nf-core.yml file:

update:
  https://github.com/nf-core/modules.git:
    nf-core:
      bam_rseqc: False

If you want this subworkflow to be updated only to a specific version (or downgraded), you could instead specify the version:

update:
  https://github.com/nf-core/modules.git:
    nf-core:
      bam_rseqc: "36a77f7c6decf2d1fb9f639ae982bc148d6828aa"

This also works at the repository level. For example, if you want to exclude all modules and subworkflows installed from nf-core/modules from being updated you could add:

update:
  https://github.com/nf-core/modules.git:
    nf-core: False

or if you want all subworkflows in nf-core/modules at a specific version:

update:
  https://github.com/nf-core/modules.git:
    nf-core: "e937c7950af70930d1f34bb961403d9d2aa81c7"

Note that the subworkflow versions specified in the .nf-core.yml file have higher precedence than versions specified with the command line flags, thus aiding you in writing reproducible pipelines.

Remove a subworkflow from a pipeline

To delete a subworkflow from your pipeline, run nf-core subworkflows remove.

nf-core subworkflows remove bam_rseqc

You can pass the subworkflow name as an optional argument to nf-core subworkflows remove like above or select it from the list of available subworkflows by only running nf-core subworkflows remove. To specify the pipeline directory, use --dir <pipeline_dir>.

Create a new subworkflow

This command creates a new nf-core subworkflow from the nf-core subworkflow template. This ensures that your subworkflow follows the nf-core guidelines. The template contains extensive TODO messages to walk you through the changes you need to make to the template. See the subworkflow documentation for more details around creating a new subworkflow, including rules about nomenclature and a step-by-step guide.

You can create a new subworkflow using nf-core subworkflows create.

This command can be used both when writing a subworkflow for the shared nf-core/modules repository, and also when creating local subworkflows for a pipeline.

Which type of repository you are working in is detected by the repository_type flag in a .nf-core.yml file in the root directory, set to either pipeline or modules. The command will automatically look through parent directories for this file to set the root path, so that you can run the command in a subdirectory. It will start in the current working directory, or whatever is specified with --dir <directory>.

The nf-core subworkflows create command will prompt you with the relevant questions in order to create all of the necessary subworkflow files.

nf-core subworkflows create bam_stats_samtools --author @nf-core-bot --force

Create a subworkflow test config file

All subworkflows on nf-core/modules have a strict requirement of being unit tested using minimal test data. To help developers build new subworkflows, the nf-core subworkflows create-test-yml command automates the creation of the yaml file required to document the output file md5sum and other information generated by the testing. After you have written a minimal Nextflow script to test your subworkflow in /tests/subworkflow/<subworkflow_name>/main.nf, this command will run the tests for you and create the /tests/subworkflow/<subworkflow_name>/test.yml file.

nf-core subworkflows create-test-yml bam_stats_samtools --no-prompts --force

Check a subworkflow against nf-core guidelines

Run the nf-core subworkflows lint command to check subworkflows in the current working directory (a pipeline or a clone of nf-core/modules) against nf-core guidelines.

Use the --all flag to run linting on all subworkflows found. Use --dir <pipeline_dir> to specify a directory other than the current working directory.

nf-core subworkflows lint bam_stats_samtools

Run the tests for a subworkflow using pytest

To run unit tests of a subworkflow that you have installed, or the test created by the command nf-core subworkflows create-test-yml, you can use the nf-core subworkflows test command. This command runs the tests specified in the tests/subworkflows/<subworkflow_name>/test.yml file using pytest.

:::info This command uses the pytest argument --git-aware to avoid copying the whole .git directory and files ignored by git. This means that it will only include files listed by git ls-files. Remember to commit your changes after adding a new subworkflow to add the new files to your git index. :::

You can specify the subworkflow name on the command line or provide it later via the prompts.

nf-core subworkflows test bam_rseqc --no-prompts

Citation

If you use nf-core tools in your work, please cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.