MARS is a Snakemake-based pipeline for quality-control reporting and assembly (both de novo and reference-based) of data from a MinION run.
Warning: very much in active development.
Requirements:
- conda
- python3
- guppy (from Oxford Nanopore)
Installation:
conda create -n mars python=3
conda activate mars
git clone https://github.com/eclarke/mars
cd mars
pip install .
To begin, MARS needs a sample sheet in tab-delimited format with two columns, barcode and sample_label:
- barcode should contain only numbers from 1 to 96, and each barcode can appear only once.
- sample_label should contain no spaces, and a sample label can appear multiple times.
MARS will yell at you if you violate either of these rules. Other columns are currently ignored and can be used for whatever you want. Any lines beginning with a '#' are considered comments and will be ignored.
Technical replicates: If a sample was split among multiple barcodes, you can choose to give those barcodes the same sample label. MARS will pool the reads from all barcodes belonging to the same sample label during assembly steps.
Omitting samples: You can omit a sample from downstream steps by commenting out its line in the sample sheet (useful if it won't assemble, etc.).
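The sample sheet rules above (barcode range and uniqueness, no spaces in labels, '#' comments, pooling of technical replicates) can be sketched in Python as follows. This is a minimal illustration, not MARS's own validation code: parse_samplesheet is a hypothetical name, and the sketch assumes the first non-comment line is a header row naming the barcode and sample_label columns.

```python
import csv
from collections import defaultdict


def parse_samplesheet(text):
    """Validate a tab-delimited sample sheet and group barcodes by sample label.

    Hypothetical helper, not MARS's own code. Assumes the first non-comment
    line is a header naming the 'barcode' and 'sample_label' columns; any
    other columns are ignored.
    """
    # Lines starting with '#' are comments (a commented-out sample is omitted)
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    samples = defaultdict(list)
    seen = set()
    for row in csv.DictReader(lines, delimiter="\t"):
        raw_barcode = (row.get("barcode") or "").strip()
        label = row.get("sample_label") or ""
        # barcode: a whole number from 1 to 96, used at most once
        if not raw_barcode.isdecimal() or not 1 <= int(raw_barcode) <= 96:
            raise ValueError(f"barcode must be a number from 1 to 96: {raw_barcode!r}")
        barcode = int(raw_barcode)
        if barcode in seen:
            raise ValueError(f"barcode {barcode} appears more than once")
        seen.add(barcode)
        # sample_label: non-empty, no spaces, may repeat across barcodes
        if not label or " " in label:
            raise ValueError(f"sample_label must be non-empty with no spaces: {label!r}")
        # Technical replicates: barcodes sharing a label are pooled under it
        samples[label].append(barcode)
    return dict(samples)
```

For example, a sheet with barcodes 1 and 2 labeled soil, a commented-out line for barcode 3, and barcode 4 labeled water yields {'soil': [1, 2], 'water': [4]}, i.e. the two soil barcodes are treated as one pooled sample and barcode 3 is skipped.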
I've made a validating sample sheet template available here. Choose File -> Make a copy... to save it to your own Drive for later use. After filling it out, choose File -> Download as... -> Tab-separated values and transfer it to the server running MARS.
Once you've created the sample sheet, use mars init to create an empty config file, then edit it using your editor of choice:
mars init -c config.yaml
nano config.yaml
The config file contains many key: value pairs, with a short description above each one.
To get started, fill out at least the output_dir, fast5_dir, basecaller, and samplesheet_fp options (making sure to remove the leading # on each), then save.
Config Validation: Config options that end in _dir or _fp must be paths to valid directories or files, respectively. MARS will resolve any relative paths against the directory it's executed from, and stop if any paths besides output_dir do not exist. In addition, some config options' values must be numbers (as noted in the config file). MARS will complain if they're not, saving you the headache of debugging some random Snakemake error down the line.
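The validation rules above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not MARS's actual code: validate_config and its numeric_keys parameter are hypothetical, and the config is assumed to be already loaded into a dict (for example, from config.yaml).

```python
from pathlib import Path


def validate_config(config, base_dir=None, numeric_keys=()):
    """Resolve and check *_dir / *_fp paths and numeric options.

    A sketch of the rules described above, not MARS's implementation;
    `numeric_keys` is a hypothetical stand-in for whichever options the
    config file marks as numeric. Returns a copy with paths resolved.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    checked = dict(config)
    for key, value in config.items():
        if not key.endswith(("_dir", "_fp")):
            continue
        # Relative paths resolve against base (by default, where MARS runs);
        # joining an absolute path onto base leaves it unchanged.
        path = (base / value).resolve()
        checked[key] = path
        if key == "output_dir":
            continue  # the output directory may not exist yet
        if key.endswith("_dir") and not path.is_dir():
            raise ValueError(f"{key}: not an existing directory: {path}")
        if key.endswith("_fp") and not path.is_file():
            raise ValueError(f"{key}: not an existing file: {path}")
    for key in numeric_keys:
        value = config.get(key)
        # bool is a subclass of int in Python, so reject it explicitly
        if value is not None and (isinstance(value, bool) or not isinstance(value, (int, float))):
            raise ValueError(f"{key} must be a number, got {value!r}")
    return checked
```

Under this sketch, passing {"output_dir": "out", "fast5_dir": "fast5", "samplesheet_fp": "samples.tsv"} from a directory with no fast5 subdirectory raises a ValueError naming fast5_dir before any workflow step starts, while a missing out directory is allowed.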
Running MARS is straightforward. Just type:
mars run config.yaml <workflow> [any Snakemake options]
where <workflow> is one of:
- process_all: basecalls, demultiplexes, and quality-controls reads into separate samples.
- assemble_all: assembles each sample using the specified assembler(s).
- polish_all: polishes each sample's assembly using the specified polisher(s).
MARS will give you a helpful error message if any of the config values required for the workflow are not specified in the config file.
For instance, you need to specify which assemblers you want to use in order to run the assemble_all workflow.
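A per-workflow check for required options can be sketched like this. It only illustrates the idea of failing early with a readable message: check_required is a hypothetical helper, and the REQUIRED mapping (including the key names assemblers and polishers) is an assumption, not MARS's real list of required options.

```python
def check_required(config, workflow, required_by_workflow):
    """Fail early with a readable message if `workflow` lacks required options.

    `required_by_workflow` maps workflow names to option names; the mapping
    below is hypothetical and MARS's real requirements may differ.
    """
    if workflow not in required_by_workflow:
        known = ", ".join(sorted(required_by_workflow))
        raise ValueError(f"unknown workflow {workflow!r}; expected one of: {known}")
    # Treat absent keys and empty values alike as "not specified"
    missing = [key for key in required_by_workflow[workflow]
               if config.get(key) in (None, "", [])]
    if missing:
        raise ValueError(f"{workflow} needs these config options: {', '.join(missing)}")


# Hypothetical requirements; 'assemblers' and 'polishers' are illustrative key names
REQUIRED = {
    "process_all": ["output_dir", "fast5_dir", "basecaller", "samplesheet_fp"],
    "assemble_all": ["output_dir", "samplesheet_fp", "assemblers"],
    "polish_all": ["output_dir", "samplesheet_fp", "polishers"],
}
```

Checking the whole list before raising means one error message names every missing option at once, rather than making you fix them one run at a time.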
Snakemake options: Since MARS calls Snakemake to execute each step, you can pass any Snakemake options to mars run, and they will be passed transparently to Snakemake during execution.
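One way such a pass-through can work is sketched below. This is an assumption about the mechanism, not MARS's code: MARS may call Snakemake through its Python API instead of the command line, and parse_run_args, build_snakemake_command, and the Snakefile path are hypothetical. The sketch relies on argparse's parse_known_args, which returns unrecognized options instead of rejecting them; --snakefile and --configfile are real Snakemake options.

```python
import argparse

WORKFLOWS = ("process_all", "assemble_all", "polish_all")


def parse_run_args(argv):
    """Split `mars run` arguments into MARS's own and everything else.

    parse_known_args keeps unrecognized options (e.g. --cores 8) instead of
    erroring, so they can be forwarded to Snakemake untouched.
    """
    parser = argparse.ArgumentParser(prog="mars run")
    parser.add_argument("config", help="path to config.yaml")
    parser.add_argument("workflow", choices=WORKFLOWS)
    return parser.parse_known_args(argv)


def build_snakemake_command(snakefile, config, workflow, extra_args):
    """Build a Snakemake command line (hypothetical wrapper, not MARS's code)."""
    return [
        # Target goes first: --configfile accepts several files and could
        # otherwise swallow a positional that follows it.
        "snakemake", workflow,
        "--snakefile", str(snakefile),
        "--configfile", str(config),
        *extra_args,  # the user's Snakemake options, passed through as-is
    ]
```

Under this sketch, mars run config.yaml assemble_all --cores 8 --dry-run leaves --cores 8 --dry-run in the extra arguments, and the wrapper would then run the resulting list with subprocess.run(cmd, check=True).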
- The final outputs from MARS go into process, assemble, or polish directories inside the directory specified by output_dir.
- Reports (summaries, logs, etc.) from each step go into similarly named directories inside the reports folder.
- Intermediate files are stored in similarly named directories inside the workspace folder (which can be safely deleted when the pipeline is finished).
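The directory layout described above can be written out as a small mapping. This is an illustrative sketch: output_layout is a hypothetical helper, and it assumes the reports and workspace folders sit inside output_dir next to the step directories, which is worth confirming against your own output_dir.

```python
from pathlib import Path

STEPS = ("process", "assemble", "polish")


def output_layout(output_dir):
    """Map each step to its final-output, report, and workspace directories.

    Assumes reports/ and workspace/ live inside output_dir alongside the
    step directories (an assumption, not confirmed by MARS itself).
    """
    root = Path(output_dir)
    return {
        step: {
            "final": root / step,                    # e.g. <output_dir>/assemble
            "reports": root / "reports" / step,      # e.g. <output_dir>/reports/assemble
            "workspace": root / "workspace" / step,  # intermediates, safe to delete later
        }
        for step in STEPS
    }
```

For example, output_layout("/data/run1")["assemble"]["reports"] points at /data/run1/reports/assemble under this assumption.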
Outputs of process_all:
- Final outputs: basecalled fastq files for each sample, quality-filtered and adapter-trimmed (in process/)
- Reports: basecalling and demultiplexing summary files, overall run quality reports (in reports/process/nanoplot), and per-barcode quality reports (in reports/process/nanocomp)
Outputs of assemble_all:
- Final outputs: assembled contigs from each chosen assembler (in assemble/[assembler]/[sample]/contigs.fa) and the assembly graph, if available (in assemble/[assembler]/contigs.gfa)
- Reports: assembly quality reports from QUAST (in reports/assemble/[assembler]/[sample]/quast/)
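For a quick side-by-side look at assemblers before opening the QUAST reports, a short script can walk the contig files. This is a hypothetical helper, not part of MARS; it assumes the assemble/[assembler]/[sample]/contigs.fa layout listed above and plain, uncompressed FASTA, and it reports only contig counts and total assembled length.

```python
from pathlib import Path


def summarize_contigs(output_dir):
    """Count contigs and total bases in each assemble/[assembler]/[sample]/contigs.fa.

    Hypothetical helper; returns {(assembler, sample): (n_contigs, total_length)}.
    """
    summary = {}
    for fasta in sorted(Path(output_dir, "assemble").glob("*/*/contigs.fa")):
        assembler, sample = fasta.parent.parent.name, fasta.parent.name
        n_contigs, total_len = 0, 0
        with fasta.open() as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    n_contigs += 1           # each header line starts a new contig
                elif line:
                    total_len += len(line)   # sequence lines may be wrapped
        summary[(assembler, sample)] = (n_contigs, total_len)
    return summary
```

The glob pattern only matches per-sample contigs.fa files, so assembly graphs (.gfa) are skipped; QUAST remains the place for fuller statistics.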