Low Qscore after Dorado basecalling - P2 solo - ultra long reads #1239

Open
jefferalexdurfue opened this issue Jan 31, 2025 · 1 comment

jefferalexdurfue commented Jan 31, 2025

Hi everyone

I ran a P2 Solo sequencing run with an ultra-long sequencing library (SQK-ULK114). The flow cell was loaded three times (24 h each), as recommended by the kit protocol. The run metrics were: estimated bases, 11.75 Gb; reads generated, 171.61 K; estimated N50, 596.82 kb. Although the output in Gb was low, we were excited about the read N50.

However, after Dorado basecalling, we see a very low Q-score, as summarized with pycoQC and NanoPlot.

Duplex basecalling script, run after splitting the POD5 files by channel with pod5 (split_by_channel):

#SBATCH --time=7-00:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --cpus-per-task=32
#SBATCH --mem=740G
#SBATCH --job-name=02-basecalling_duplex
#SBATCH -o ~HOME/Nanopore/log/02-basecalling_duplex.out
#SBATCH -e ~HOME/Nanopore/log/02-basecalling_duplex.err

Dir_POD5="~HOME/Nanopore/01-POD5/split_by_channel"
Dir_basecaller="~HOME/Nanopore/02-basecaller"
DORADO="~HOME/software/dorado-0.8.3-linux-x64/bin/dorado"
model="~HOME/software/dorado-0.8.3-linux-x64/model"
Dir_fastq="~HOME/Nanopore/04-fastq"

${DORADO} duplex --device 'cuda:all' \
    ${model}/[email protected] \
    ${Dir_POD5}/ > ${Dir_basecaller}/calls.bam

samtools fastq ${Dir_basecaller}/call_duplex.bam \
    > ${Dir_fastq}/call_duplex.fastq

pycoQC report, all reads (median read quality: 2.92!):

Report_Nanopore.pdf
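
For scale, a Phred-scaled quality score Q corresponds to an expected per-base error probability of 10^(-Q/10), so a median read quality of 2.92 means that roughly half of the bases in a typical read are expected to be wrong. A minimal sketch of that conversion follows; the helper name `q_to_error` is only illustrative and is not part of dorado or pycoQC.

```shell
# Convert a Phred-scaled quality score to an expected per-base error probability.
# q_to_error is a hypothetical helper name used for illustration only.
q_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

q_to_error 2.92   # the median read quality reported by pycoQC
q_to_error 10     # Q10, shown for comparison
```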

Conclusion:
We ended up with 7 Gb (already low for a P2), an N50 of 380 kb (!!!), and a median Q-score of around 3.
If we filter out the low-quality reads, only ~148 Mb of data remain, with an N50 of 23 kb.
This is a poor result for an ultra-long library on a P2 flow cell.
Is it a bad flow cell? Any advice on how to improve this?
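
For reference, N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal sketch for recomputing it directly from a FASTQ file is shown below, assuming the usual four lines per record; the helper names `fastq_lengths` and `n50` are illustrative only.

```shell
# Print the length of every read in a FASTQ file with 4 lines per record.
fastq_lengths() {
    awk 'NR % 4 == 2 { print length($0) }' "$1"
}

# Read lengths (one per line) from stdin and print the N50.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            # Walk from the longest read down until half of the bases are covered.
            for (i = 1; i <= NR; i++) {
                acc += len[i]
                if (acc * 2 >= total) { print len[i]; exit }
            }
        }'
}

# Example (path taken from the script above; not run here):
#   fastq_lengths "${Dir_fastq}/call_duplex.fastq" | n50
```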

Thanks in advance.
Best regards

@HalfPhoton
Collaborator

Hi @jefferalexdurfue,
I think this question is best asked on the Nanopore Community Forum, as it doesn't appear to be a dorado issue but a sequencing one.

Taking a look at your script, however:

Dir_POD5="~HOME/Nanopore/01-POD5/split_by_channel"

It looks like this is basecalling all of the split-by-channel pod5 files in one job, which is not what we suggest for good performance.
You should run multiple smaller jobs, each on a subset of channels, to get a performance improvement in duplex.
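
The suggestion above could be sketched as follows: group the per-channel POD5 files into small batches and submit one duplex job per batch. This is only a minimal sketch, not an official dorado workflow. The helper name `make_channel_batches`, the batch size of 50, the `Dir_batches` directory, the `model_path` variable, and the `duplex_batch.sh` script are all assumptions, and it further assumes that dorado reads POD5 files through symlinks (copy or move the files instead if it does not).

```shell
# Group the *.pod5 files in SRC_DIR into numbered sub-directories of OUT_DIR,
# BATCH_SIZE files each, using symlinks so that no data is copied.
# Prints the number of batches created.
make_channel_batches() {
    local src_dir=$1 out_dir=$2 batch_size=$3
    local i=0 batch f
    for f in "$src_dir"/*.pod5; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        batch=$(printf '%s/batch_%04d' "$out_dir" $(( i / batch_size )))
        mkdir -p "$batch"
        # Link by absolute path so the symlink resolves from any directory.
        ln -sf "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$batch/"
        i=$(( i + 1 ))
    done
    echo $(( (i + batch_size - 1) / batch_size ))
}

# Possible use with a SLURM job array, one array task per batch (not run here):
#   n=$(make_channel_batches "$Dir_POD5" "$Dir_batches" 50)
#   sbatch --array=0-$(( n - 1 )) duplex_batch.sh
#
# Inside the hypothetical duplex_batch.sh:
#   batch=$(printf '%s/batch_%04d' "$Dir_batches" "$SLURM_ARRAY_TASK_ID")
#   "$DORADO" duplex --device 'cuda:all' "$model_path" "$batch/" \
#       > "$Dir_basecaller/$(basename "$batch").bam"
```

Each array task then writes its own BAM file, so the per-batch outputs need to be combined afterwards (for example with `samtools cat` or `samtools merge`) before conversion to FASTQ.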

Best regards,
Rich
