Base calling speed: twice the time for base calling with Dorado v0.8.1 to v0.8.3 #1159
Comments
Hi @vaillan6, do you have a record of the batch size used for each run? It would be good to rule out changes from the auto batch size detection becoming more conservative to prevent OOM issues. Best regards,
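The auto-selected batch size is printed in the startup log lines, so something like the following would pull it from the per-run stderr captures (the file names here are placeholders following the naming pattern in the reported command below, not the actual files):
# Pull the chunk/batch size lines dorado prints at startup from each run's captured log.
grep "batch size" PLU_AA_01_ONT_Dorado_v0.7.2_alltrim_q10.stdout
grep "batch size" PLU_AA_01_ONT_Dorado_v0.8.3_alltrim_q10.stdout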
We observe the same on our V100 GPUs. GPU usage dropped to 70-80% when switching Dorado from version 0.7.3 to 0.8.3. We therefore ran a test on two small pod5 files to investigate in more detail. The pod5 files were located on local SSDs of the node to rule out network performance issues.
As there were differences in the selected batch size, I also fixed the 0.8.3 batch size to the values chosen by 0.7.3. Now only ~50% of the GPU memory is used (16/32 GB, versus ~31/32 GB with auto batch size), while GPU usage is back at 100% and the speed appears to be the same. Interestingly, since I can only set a global limit via the -b parameter, my limited 0.8.3 duplex run uses more batches than the unlimited one but finishes only slightly slower than the simplex version, while the "unlimited" duplex (with fewer batches) again only runs at ~75% GPU usage and needed more than 3 hours. A rough sketch of this comparison is given after the logs line below.
Steps to reproduce the issue: run different versions of dorado on a Tesla V100.
Run environment:
Logs: log for simplex 0.7.3
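A minimal sketch of the side-by-side run described above, assuming placeholder paths and that 576 stands in for the batch size 0.7.3 auto-selected; the -b/--batchsize flag is what pins it:
# Sketch of the comparison described above; paths, model, and the batch
# size value (576) are placeholders, not the values from the actual runs.
POD5_DIR=/local/ssd/test_pod5
MODEL=[email protected]
# 0.7.3 with automatic batch size selection
time dorado-0.7.3-linux-x64/bin/dorado basecaller "$MODEL" "$POD5_DIR" > v0.7.3.bam
# 0.8.3 with the batch size pinned via -b to the value 0.7.3 chose
time dorado-0.8.3-linux-x64/bin/dorado basecaller -b 576 "$MODEL" "$POD5_DIR" > v0.8.3_fixed_batch.bam
# In a second terminal, watch GPU utilisation and memory every 5 seconds
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 5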
Thanks for the additional data points @johannesgeibel; this does appear to be an issue with the auto batch size process, as you show. We'll investigate further. Best regards,
@HalfPhoton Yes, the table has been updated. My apologies.
@HalfPhoton Thank you, the most recent update, v0.9.0, has resolved this issue.
Issue Report
I would like to report that with the newest versions of Dorado (v0.8.1, v0.8.2, and v0.8.3), base calling now takes twice as long as it did with previous versions.
Run environment:
Dorado version: 0.7.2, 0.8.0, 0.8.1, 0.8.2, 0.8.3
Dorado command:
example:
dorado-#.#.#-linux-x64/bin/dorado basecaller --recursive --min-qscore 10 --trim all --kit-name SQK-NBD114-24 dorado-#.#.#-linux-x64/bin/[email protected] ${FILE} > ${FILE}_Dorado_v#.#.#_alltrim_q10.bam 2> ${FILE}_Dorado_v#.#.#_alltrim_q10.stdout
Operating system:
Linux 5.14.0-427.40.1.el9_4.x86_64 x86_64
NAME="Rocky Linux"
VERSION="9.4 (Blue Onyx)"
Hardware (CPUs, Memory, GPUs):
GPU
SQK-NBD114.24, Native Barcoding Kit 24 V14, ~458,550 reads, FLO-MIN114, 4.39 gigabases, 17.87 kb N50, 48GB directory size
Logs
example for v0.8.3 run
[2024-11-21 09:28:02.296] [info] Running: "basecaller" "--recursive" "--min-qscore" "10" "--trim" "all" "--kit-name" "SQK-NBD114-24" "/home/brieanne/dorado-0.8.3-linux-x64/bin/[email protected]" "/data/run/brieanne/testing/PLU_AA_01_ONT"
[2024-11-21 09:28:02.458] [info] > Creating basecall pipeline
[2024-11-21 09:28:03.664] [info] cuda:0 using chunk size 12288, batch size 576
[2024-11-21 09:28:06.223] [info] cuda:0 using chunk size 6144, batch size 576
[2024-11-22 00:15:48.634] [info] > Simplex reads basecalled: 441661
[2024-11-22 00:15:48.634] [info] > Simplex reads filtered: 16979
[2024-11-22 00:15:48.634] [info] > Basecalled @ Samples/s: 1.142858e+06
[2024-11-22 00:15:48.634] [info] > 443075 reads demuxed @ classifications/s: 8.318957e+00
[2024-11-22 00:15:48.703] [info] > Finished
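As a quick sanity check on the reported slowdown, the elapsed wall-clock time can be computed from the log timestamps above (values copied from the v0.8.3 log):
# Elapsed wall-clock time between the first and last timestamps in the v0.8.3 log
start=$(date -d "2024-11-21 09:28:02" +%s)
end=$(date -d "2024-11-22 00:15:48" +%s)
echo "elapsed: $(( (end - start) / 60 )) minutes"   # ~888 minutes, i.e. roughly 14.8 hours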