The Precise Basecalling of Short-Read Nanopore Sequencing

Introduction

Nucleotide modifications deviate nanopore sequencing readouts, therefore generating artifacts during the basecalling of sequence backbones. Here, we present an iterative approach to polish modification-disturbed basecalling results. We show such an approach is able to promote the basecalling accuracy of both artificially-synthesized and real-world molecules. With demonstrated efficacy and reliability, we exploit the approach to precisely basecall therapeutic RNAs consisting of artificial or natural modifications, as the basis for quantifying the purity and integrity of vaccine mRNAs which are transcribed in vitro, and for determining modification hotspots of novel therapeutic RNA interference (RNAi) molecules which are bioengineered (BioRNA) in vivo.

Major Contribution: 3-step sampling

Our study shows that compromised basecalling can be improved through an iterative workflow. To enhance polishing at the 3’ and 5’ ends, which is crucial for short reads, we developed a 3-step sampling strategy. Reads are sampled from the 5’ end, the full molecule, and the 3’ end, ensuring even coverage and better basecalling at both termini.

Installation

Pre-request

Step 1: git clone

git clone https://github.com/wangziyuan66/iterative-labeling-toolkit-bonito

Step 2: build sif file

singularity build bonito.sif bonito.recipe

Create a standalone envrionment to run the iterative-labeling-bonito.

Usage

# raw=$1

# reference=$2

# bonito=$3

# basecall=$4

bash scripts/sr_iterative_labelling.sh raw reference ./ ./scripts/sr_basecall.py

raw : The path to folder containing raw pod5 files.
reference : Reference genome path.
bonito : The path to the bonito singularity image.
basecall : The path to the "basecall.py" file.

Miscellaneous

If the sequencing kit is RNA002, we recommend you to use [iterative-labeling-toolkit-taiyaki(https://github.com/wangziyuan66/iterative-labeling-toolkit-taiyaki). Currently, for the first round of basecalling we are using RNA004 hac 5.0.0 model.

Data availability

Sample raw pod5 files are provided for BioRNA-Leu BioRNA-Ser ChemoRNA-Leu and ChemoRAN-Ser which you can downloaded in here. In sra, only bam can be uploaded. If some need more rawdata, contact us.

Contact

Ziyuan Wang [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
example/bioRNA		example/bioRNA
scripts		scripts
README.md		README.md
bonito.recipe		bonito.recipe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The Precise Basecalling of Short-Read Nanopore Sequencing

Introduction

Major Contribution: 3-step sampling

Installation

Pre-request

Step 1: git clone

Step 2: build sif file

Usage

Miscellaneous

Data availability

Contact

About

Releases

Packages

Contributors 2

Languages

wangziyuan66/iterative-labeling-toolkit-bonito

Folders and files

Latest commit

History

Repository files navigation

The Precise Basecalling of Short-Read Nanopore Sequencing

Introduction

Major Contribution: 3-step sampling

Installation

Pre-request

Step 1: git clone

Step 2: build sif file

Usage

Miscellaneous

Data availability

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages