CPU utilization lower than expected #1171
Comments
Hi @karlkashofer, Dorado does not simply create a runner per core - it attempts to determine how much RAM is available and creates a number of runners such that it shouldn't run out of memory. For the HAC model with a batch size of 128, this works out to roughly one runner per 4.5 GB of available memory. However, it looks like dorado is only checking for "free" RAM and is not allowing itself access to any buffer/cache memory (at least on Linux), even though this could be made accessible. We'll investigate further and look into improving this.
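As a rough sanity check of that heuristic, the runner count dorado should be able to reach can be estimated from the host's memory figures. This is a sketch assuming the procps `free` layout, where the fourth field of the `Mem:` line is "free" and the seventh is "available"; the ~4.5 GB per runner is the HAC/batch-128 figure quoted above:

```sh
# Rough estimate of how many CPU runners dorado can create for the HAC
# model at batch size 128 (~4.5 GB per runner, per the comment above),
# comparing "free" RAM with the larger "available" figure that also
# counts reclaimable buffers/cache.
free -g | awk '/^Mem:/ {
    printf "runners from free RAM:      %d\n", int($4 / 4.5)
    printf "runners from available RAM: %d\n", int($7 / 4.5)
}'
```

On a node whose RAM is mostly sitting in page cache the two numbers can differ dramatically, which is the behaviour described above.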
Thanks for looking into this! I already use the LD_PRELOAD hack to limit the CPUs used, so we can deploy that safely on the gridengine cluster. Do you think there is a similar override for the RAM? (#567 (comment)) I'd be happy to test any suggestions.
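For readers who cannot follow the link, the sketch below shows the general shape of that kind of CPU override. It is an illustration only, not the snippet from #567: it assumes the core count dorado acts on ultimately comes from glibc's get_nprocs()/get_nprocs_conf(), which may not be the symbol the linked comment interposes, and the core count and file names are placeholders.

```sh
# Illustrative LD_PRELOAD shim (not the snippet from #567): report a
# fixed core count so dorado sizes its thread pools accordingly.
cat > fake_nprocs.c <<'EOF'
/* Assumption: dorado's CPU detection ends up in glibc's get_nprocs(). */
int get_nprocs(void)      { return 16; }
int get_nprocs_conf(void) { return 16; }
EOF
gcc -shared -fPIC -o fake_nprocs.so fake_nprocs.c
LD_PRELOAD="$PWD/fake_nprocs.so" dorado basecaller --device cpu hac reads.pod5 > calls.bam
```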
Hi @karlkashofer,
If I run [command and output not captured in this copy] …
But running [command and output not captured] …
Given the numbers above, this suggests we're seeing memory in the [column name not captured]. It does appear to be possible to hack this in a similar way to [the CPU override linked above]:
[code not captured]
This generated 22 runners for me (100/4.5). Change […]
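The code blocks from that comment were not captured in this copy, so the sketch below is only an illustration of the kind of memory override being described, not the poster's snippet. It assumes dorado derives free host memory from glibc's sysinfo(); if it reads /proc/meminfo instead, a different interposition would be needed. The 100 GB figure is chosen to match the 22-runner (100/4.5) result quoted above, and the file names are placeholders.

```sh
# Illustrative LD_PRELOAD shim (not the snippet from the comment above):
# make sysinfo() report 100 GB of free RAM so dorado creates ~22 HAC
# runners (100 / ~4.5 GB per runner).
cat > fake_free_mem.c <<'EOF'
#define _GNU_SOURCE
#include <dlfcn.h>
#include <sys/sysinfo.h>

/* Assumption: dorado's available-memory check goes through sysinfo(). */
int sysinfo(struct sysinfo *info) {
    int (*real_sysinfo)(struct sysinfo *) =
        (int (*)(struct sysinfo *))dlsym(RTLD_NEXT, "sysinfo");
    int ret = real_sysinfo(info);
    /* Pretend 100 GB is free, regardless of what the kernel reports. */
    info->freeram = 100ULL * 1024 * 1024 * 1024 / info->mem_unit;
    return ret;
}
EOF
gcc -shared -fPIC -o fake_free_mem.so fake_free_mem.c -ldl
LD_PRELOAD="$PWD/fake_free_mem.so" dorado basecaller --device cpu hac reads.pod5 > calls.bam
```

Pick a figure that matches what the scheduler has actually granted the job; over-reporting free memory on a shared node risks triggering the OOM killer.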
Hi @malton-ont! This is very hacky but works like a charm! For reference, this is the relevant portion of my script:
[script not captured in this copy]
Original issue:
We have several cluster servers with 96 CPUs, and we would like to run dorado basecalling on them.
When we put dorado in a Debian Docker container and call it from within, it seems to detect the number of CPUs incorrectly and runs on only a single CPU. nproc inside the container reports the correct number of CPUs, so I am at a loss as to what could be wrong.
Running the same dorado binary outside Docker correctly utilizes all CPUs.
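When the same binary behaves differently inside a container, it is worth confirming what the container itself can see before suspecting dorado. A few standard checks (the cgroup path below assumes cgroup v2; on cgroup v1 the quota lives under /sys/fs/cgroup/cpu/cpu.cfs_quota_us):

```sh
# Run inside the container to see the resources dorado can observe.
nproc                          # cores visible after CPU affinity is applied
getconf _NPROCESSORS_ONLN      # online processors as reported by sysconf()
taskset -cp $$                 # CPU affinity mask of the current shell
cat /sys/fs/cgroup/cpu.max     # cgroup v2 CPU quota ("max" = unlimited)
free -g                        # free vs. available memory inside the container
```

In this case nproc was already correct, which points at the memory heuristic discussed in the comments above rather than at CPU detection.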
Steps to reproduce the issue:
Run dorado in a Debian Docker image.
Our Docker command (the basecaller script just runs dorado and then processes the uBAMs):
docker run -d -v $PWD:/analysis ontsplit bash -c "cd /analysis; /basecaller.sh FBA23844_efef091c_d25addcf_1.pod5 "341, 342, 343, 344, 345, 346, 347, 348, 349, 350""
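If the goal is to give each cluster job a predictable slice of a 96-core node, the limits can also be made explicit on the Docker side instead of relying on dorado's detection. This is a sketch using standard docker run resource flags; the CPU set and memory budget are illustrative values, not ones from the thread:

```sh
# Illustrative variant of the command above with explicit resource limits.
docker run -d \
    --cpuset-cpus 0-31 \
    --memory 128g \
    -v "$PWD":/analysis \
    ontsplit bash -c 'cd /analysis; /basecaller.sh FBA23844_efef091c_d25addcf_1.pod5 "341, 342, 343, 344, 345, 346, 347, 348, 349, 350"'
```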
Run environment:
Logs
[2024-12-10 22:46:50.091] [info] Running: "basecaller" "-v" "--device" "cpu" "hac" "FBA23844_efef091c_d25addcf_1.pod5"
[2024-12-10 22:46:50.114] [info] - downloading [HAC model name obfuscated in this copy] with httplib
[2024-12-10 22:46:51.260] [info] Normalised: chunksize 10000 -> 9996
[2024-12-10 22:46:51.260] [info] Normalised: overlap 500 -> 498
[2024-12-10 22:46:51.260] [info] > Creating basecall pipeline
[2024-12-10 22:46:51.260] [debug] CRFModelConfig { qscale:1.050000 qbias:-0.600000 stride:6 bias:0 clamp:1 out_features:-1 state_len:4 outsize:1024 blank_score:2.000000 scale:1.000000 num_features:1 sample_rate:5000 sample_type:DNA mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:93.692398 stdev:23.506744}} BasecallerParams { chunk_size:9996 overlap:498 batch_size:128} convs: { 0: ConvParams { insize:1 size:16 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:16 size:16 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:16 size:384 winlen:19 stride:6 activation:tanh}} model_type: lstm { bias:0 outsize:1024 blank_score:2.000000 scale:1.000000}}
[2024-12-10 22:46:51.262] [debug] - CPU calling: set num_cpu_runners to 1
[2024-12-10 22:46:51.454] [debug] BasecallerNode chunk size 9996
[2024-12-10 22:46:51.469] [debug] Load reads from file FBA23844_efef091c_d25addcf_1.pod5