PoreQC

An automated Nextflow pipeline for Oxford Nanopore read basecalling, quality control and adapter removal

Brief Background

PoreQC is a Nextflow pipeline for Oxford Nanopore reads (Slow5, Pod5 and Fastq). It integrates Guppy, Dorado, Buttery-eel, Cutadapt, and Sequali to automate basecalling, quality control and adapter removal. We (Hyungtaek Jung and the National Centre for Indigenous Genomics at The Australian National University, Australia) started this project to provide comprehensive data management for biologists at the National Computational Infrastructure. PoreQC is a command-line interface (CLI) application. We have tested it on ONT long-read data, focusing on whole genome shotgun datasets, so that it can be widely used by the greater research community. Please note that basecalling and visualising a large dataset requires substantial computational resources on an HPC or Cloud system.

Citation

Hyungtaek Jung et al.: PoreQC: An automated Nextflow pipeline for Oxford Nanopore read basecalling, quality control and adapter removal. In preparation for submission.

STABLE (version 0.0.XXX)

PoreQC comprises two key features (basecalling and quality control) and four interactive steps with open-source programs (See LICENSE).

INSTALLATION

Please download the program from this link. Programs and dependencies can also be installed via Bioconda. For any other issues, we highly encourage users to use the Issues page.

~~~
Create the virtual environment
Need to be updated

Get source
Need to be updated

Install packages
Need to be updated

Run
Need to be updated
~~~

License

PoreQC is provided under the MIT license and is based on other open-source software:

Guppy for basecalling and processing raw signal data from nanopore sequencing devices, providing accurate DNA sequence information.

Dorado for basecalling Oxford Nanopore long-read sequencing data.

Buttery-eel as a Slow5 file reader and basecalling wrapper for Guppy and Dorado.

Cutadapt for removing adapters, primers, and other unwanted sequences from high-throughput sequencing data.

Sequali for evaluating the quality of sequencing data through the generation of comprehensive metrics and visualizations.

Nextflow as a data-driven computational workflow engine for scalable and reproducible scientific workflows.

In-house Perl script (fqreadstats.pl) for calculating basic statistics of a FASTQ file.

Tested Datasets

Reference genome: https://www.ncbi.nlm.nih.gov/assembly/GCF_000001735.3/#/st
Oxford Nanopore reads: https://ngdc.cncb.ac.cn/gsa/browse/CRA004538 and https://www.sciencedirect.com/science/article/pii/S1672022921001741

GETTING STARTED

PoreQC, integrated with Nextflow, has two specific features: basecalling (Slow5 input) and a result summary with quality-control visualisation (Fastq input). The data input/output enables end-to-end file selection. The result summary and visualisation are mainly designed to present the quality-control outcome. Please note that all required input files (e.g. Slow5) must be prepared with Slow5tools for a seamless PoreQC experience. However, users can supply Fastq files directly for quick quality control.

Slow5 format:

Slow5tools: Please see the official Slow5tools page to convert ONT data into a proper Slow5 file.

Buttery-eel:

Requirement: The buttery-eel pipeline requires both CPU and GPU resources.
Computing environment: Please use an HPC or Cloud system that provides CPU and GPU resources, with proper job queue submission.
Buttery-eel.pbs.sh: Obtain the buttery-eel-guppy or buttery-eel-dorado pipeline.
Input: Slow5 file obtained from Slow5tools.
Output: Basecalled and trimmed Fastq file.
Select model: Depending on the ONT library preparation and sequencing kits, users must select the proper model in the pipeline.
Module load: Users can choose between two options: buttery-eel (v0.4.2) + guppy (v6.5.7) or buttery-eel (v0.4.2) + dorado (v7.2.13). Please check the Buttery-eel page for the latest versions.
Default mode: This mode performs basic basecalling with detection and removal of adapters.

Usage: Execute this command in the terminal.

qsub -v MERGED_SLOW5=/ONT_raw_data/QTXXXX230285_reads.blow5,BASECALL_OUT=/ONT_raw_data/OutFQDrdT2 ./buttery-eel_QT0285.pbs.sh

Advanced mode: This mode performs basecalling, adapter removal and read splitting.

Add parameters: Add the parameters "--detect_mid_strand_adapter --trim_adapters --detect_adapter --do_read_splitting" to the pipeline, specifically after "--max_queued_reads 20000".
Usage: Execute this command in the terminal.

qsub -v MERGED_SLOW5=/ONT_raw_data/QTXXXX230285_reads.blow5,BASECALL_OUT=/ONT_raw_data/OutFQDrdT2 ./buttery-eel_QT0285.pbs.sh
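
Both modes pass the input and output locations to the PBS script through `qsub -v`, which takes a comma-separated list of `NAME=value` pairs. The sketch below is only an illustration of composing that list safely; the function name `qsub_vars` is hypothetical and not part of PoreQC, and it assumes the two variable names shown in the commands above.

```shell
#!/usr/bin/env bash
# Sketch only: build the variable list given to `qsub -v`.
# MERGED_SLOW5 and BASECALL_OUT are the names used in the commands above;
# qsub_vars itself is a hypothetical helper.
qsub_vars() {
    local slow5="$1" outdir="$2"
    # `qsub -v` separates variables with commas, so a comma inside a path
    # would split the list in the wrong place.
    case "$slow5$outdir" in
        *,*) echo "paths must not contain commas" >&2; return 1 ;;
    esac
    printf 'MERGED_SLOW5=%s,BASECALL_OUT=%s\n' "$slow5" "$outdir"
}

# Usage (left commented out so the sketch runs on a machine without PBS):
# qsub -v "$(qsub_vars /ONT_raw_data/QTXXXX230285_reads.blow5 /ONT_raw_data/OutFQDrdT2)" ./buttery-eel_QT0285.pbs.sh
```

Keeping the paths in one place like this makes it easier to reuse the same submission line for the default and advanced modes.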

Reads Stats:

Requirement: The Perl/bash script requires a Perl library.
Input: Fastq file generated from the buttery-eel pipeline.
Output: A CSV summary file for the Fastq.
Perl script: An in-house script that calculates basic stats of a Fastq file (compressed files are also supported).

Usage: Execute this command in the terminal.
Mandatory parameters: --input.fq and --out
Optional parameters: --t and --mem
Help: perl fqreadstats.pl --help

perl fqreadstats.pl --input.fq test_reads.fq.gz --out test_reads.csv --t 2 --mem 40
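
The columns written by fqreadstats.pl are not listed here. As a rough illustration of what basic Fastq stats usually cover, the sketch below counts reads, total bases and mean read length with awk. It is not the in-house script: the function name and the CSV column names are assumptions, and it assumes standard 4-line Fastq records.

```shell
#!/usr/bin/env bash
# Sketch only: read count, total bases and mean read length for a Fastq file.
# Assumes 4-line records (sequence lines are not wrapped).
fastq_basic_stats() {
    local fq="$1"
    local reader="cat"
    case "$fq" in
        *.gz) reader="gzip -dc" ;;   # decompress .gz input on the fly
    esac
    $reader "$fq" | awk '
        NR % 4 == 2 { reads++; bases += length($0) }   # line 2 of each record is the sequence
        END {
            mean = (reads > 0) ? bases / reads : 0
            printf "reads,total_bases,mean_length\n%d,%d,%.1f\n", reads, bases, mean
        }'
}

# Usage:
# fastq_basic_stats test_reads.fq.gz > test_reads_basic.csv
```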

Cutadapt:

Installation and Requirement: Please see the Cutadapt page.
Input: Fastq file generated from the buttery-eel pipeline.
Output: Trimmed Fastq file.

Usage: Execute this command in the terminal.
Mandatory parameters: -g, -a, or -b (adapter sequences), -o (output file), and input.fastq/fq (input fastq file)

cutadapt -g TTTTTTTTCCTGTACTTCGTTCAGTTACGTATTGCT -o /output_folder/trimmed.fastq input.fastq
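
Cutadapt's `-o` option names an output file, so trimming several Fastq files needs one output name per input. The sketch below is a dry run that only prints the command for a given input. The helper name `trim_cmd` and the `.trimmed.fastq` suffix are assumptions for illustration; the adapter is the one from the command above.

```shell
#!/usr/bin/env bash
# Sketch only: print the cutadapt command for one Fastq file (dry run).
ADAPTER="TTTTTTTTCCTGTACTTCGTTCAGTTACGTATTGCT"   # 5' adapter from the command above

trim_cmd() {
    local in="$1" outdir="$2"
    local base
    base="$(basename "$in")"
    # Strip .gz, then .fastq or .fq, to get a bare sample name.
    base="${base%.gz}"; base="${base%.fastq}"; base="${base%.fq}"
    # -g gives a 5' adapter; -o gives the output file for the trimmed reads.
    printf 'cutadapt -g %s -o %s/%s.trimmed.fastq %s\n' \
        "$ADAPTER" "${outdir%/}" "$base" "$in"
}

# Usage: review the printed commands first, then pipe them to a shell to run them.
# for fq in /input_folder/*.fastq; do trim_cmd "$fq" /output_folder/; done
```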

Sequali:

Installation and Requirement: Please see the Sequali page.
Input: Fastq file generated from the buttery-eel pipeline or Cutadapt.
Output: HTML and JSON summary files for the Fastq.

Usage: Execute this command in the terminal.
Mandatory parameters: input.fastq/fq (input fastq file), --adapter-file (adapter sequences as .tsv), --outdir (output directory), and -t (CPU number)

sequali input.fastq --adapter-file "$ASFL" --outdir /output_folder/ -t 2

Nextflow:

Installation and Requirement: Please see the Nextflow page.
Input: Slow5 file generated from Slow5tools.
Output: Cleaned Fastq and its summary as HTML and JSON files.
Interaction: Users can specify the input/output folder/file for their convenience.
Resume: An interrupted stage/step can be resumed via Nextflow management.

Usage: Execute this command in the terminal.
Mandatory parameters: input.fastq/fq (input fastq file), --adapter-file (adapter sequences as .tsv), --outdir (output directory), and -t (CPU number)

sequali input.fastq --adapter-file "$ASFL" --outdir /output_folder/ -t 2

FAQ

We encourage users to use the Issues.

WIKI PAGE

Please see the GitHub page.

AUTHORS

Hyungtaek Jung and the National Centre for Indigenous Genomics.

COPYRIGHT

The full PoreQC is distributed under the MIT license.
