This is a Nextflow pipeline for generating sequencing reports for the SNP&Seq Technology platform, NGI Uppsala, SciLifeLab Genomics.
You need to:
- install Nextflow, e.g. using conda
conda create -n nextflow-env nextflow
or by downloading it from nextflow.io.
- install Singularity (version > 2.6).
Optional:
- (currently mandatory: see known issues) Download the fastq-screen database: download fastq-screen from here, extract the archive, and then run
fastq_screen --get_genomes
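Before moving on, it can help to confirm that the prerequisites are actually on your PATH. The following is a minimal sketch, not part of the pipeline: `have_tool` is a hypothetical helper name, and the script only reports what it finds. It prints the Singularity version for manual comparison against the "> 2.6" requirement rather than attempting an automatic version check.

```shell
#!/usr/bin/env bash
# Sketch: report whether the pipeline prerequisites are on PATH.

# have_tool is a hypothetical helper: succeeds if the given command exists.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

missing=0
for tool in nextflow singularity; do
    if have_tool "$tool"; then
        echo "found:   $tool ($(command -v "$tool"))"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done

# Print the Singularity version so it can be compared by eye against the
# "> 2.6" requirement; no automatic version comparison is attempted here.
if have_tool singularity; then
    singularity --version
fi

echo "$missing prerequisite(s) missing"
```

If you installed Nextflow into a conda environment, activate that environment before running the check.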
Awesome, you're all set! Let's try generating reports for your favourite runfolder:
# Using parameters supplied in a config (see below)
nextflow run -c custom.config -profile snpseq,singularity main.nf
# Using parameters supplied on the command line
nextflow run -profile snpseq,singularity main.nf \
--run_folder '/path/to/runfolder' \
--fastqscreen_databases '/path/to/databases' \
--checkqc_config '/path/to/checkqc.config'
These are the primary config profiles:
- dev: Run locally with low memory.
- irma: Uppmax slurm profile for use on the cluster irma (note: the parameter params.project must be supplied).
- snpseq: Run locally with greater memory available than dev.
- singularity: Enables singularity and provides container URLs.
- test: Run the pipeline using test data.
Additional profiles:
- debug: Prints out the env properties before executing processes.
Custom config files can contain all command line parameters, nextflow parameters, and overriding options.
For example:
resume = true
params.run_folder = '/path/to/runfolder'
params.fastqscreen_databases = '/path/to/databases'
params.checkqc_config = '/path/to/checkqc.config'
workDir = '/path/to/temporary/storage/space'
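As an illustration of how a custom config and a profile fit together, here is a hedged sketch for the irma profile, which requires params.project. The file name `custom-irma.config`, the project ID `my-project-id`, and all paths are placeholders, and combining irma with singularity is an assumption modelled on the snpseq example. The script only writes the config and prints the launch command; it does not start the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: write a custom config for the irma profile and show the launch command.
# All paths and the project ID are placeholders, not real values.

config_file="custom-irma.config"   # hypothetical file name

# Quoted heredoc delimiter: the contents are written verbatim, without expansion.
cat > "$config_file" <<'EOF'
resume = true
params.project = 'my-project-id'
params.run_folder = '/path/to/runfolder'
params.fastqscreen_databases = '/path/to/databases'
params.checkqc_config = '/path/to/checkqc.config'
workDir = '/path/to/temporary/storage/space'
EOF

# Printed rather than executed, so the sketch is safe to run anywhere.
# Combining irma with singularity is an assumption based on the snpseq example.
echo "nextflow run -c $config_file -profile irma,singularity main.nf"
```

Replace the placeholders with your own values, then run the printed command from the directory containing main.nf.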
There are two primary branches of this project:
- master: The stable release branch.
- dev: The development and test branch, to which pull requests should be made.
Tests are run through GitHub Actions when pushing code to the repo. See the instructions below on how to reproduce them locally.
To keep the Python parts of the project nice and tidy, we enforce that code is formatted according to black. To re-format your code with black, simply run:
black .
Assuming you have installed all pre-requisites (except the fastq screen database: test data comes with a minimal version of it), you can run tests locally by following these steps:
# create virtual environment
virtualenv -p python3.9 venv/
# activate venv
source venv/bin/activate
# install dependencies
pip install -r requirements-dev.txt
# run tests
pytest tests/
# perform black formatter check
black --check .
- The genome indices cannot be downloaded from within the container using
fastq_screen --get_genomes
because wget within the container does not resolve the address correctly. Fastq Screen must be installed separately (e.g. with conda) and the genomes downloaded prior to running the workflow. The path to the databases must then be given using the params.fastqscreen_databases parameter.