--devices 2 failed for test case #51

Closed
jiaboli007 opened this issue Nov 23, 2024 · 7 comments

Comments

@jiaboli007

I can run the prediction of prot.fasta (the sample input in the package) with the default settings:

boltz predict prot.fasta

The run finished successfully.

However, if I run the same job with the option --devices 2 (trying to use the two GPUs on my machine), it fails. See the attached log file for the error message:
boltz.log
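
In other words, the failing run was the same command with the extra flag:

boltz predict prot.fasta --devices 2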

I am running on Ubuntu 22.04.4 LTS with PyTorch 2.5.1+cu124. I installed boltz from source and updated to the latest version (0.2.1).

@eyal-converge

Got a similar error with --devices 2 (using 2x A100 40GB):
Debian 11
pytorch==2.5
cuda==12.4

@jwohlwend
Owner

This happens because you are trying to use multiple devices even though you only have a single input, so one GPU has nothing to do and fails.
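
As a simplified sketch of the idea (not Boltz's actual scheduling code): if the inputs are split round-robin across the devices, a single input on two devices leaves the second device with an empty work list.

# Simplified illustration only, not Boltz's actual code
inputs = ["prot.fasta"]   # a single prediction job
devices = 2
per_device = {rank: inputs[rank::devices] for rank in range(devices)}
print(per_device)   # {0: ['prot.fasta'], 1: []} -> device 1 has nothing to do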

@eyal-converge

eyal-converge commented Nov 28, 2024

@jwohlwend
I assumed --devices would turn on model parallelism or something similar.

@jwohlwend
Owner

We do not support model parallelism; --devices should be used to run many predictions in parallel. I've added an error message for this case.

@eyal-converge

eyal-converge commented Nov 28, 2024

@jwohlwend
Can you give an example of when to use the --devices flag?

@jwohlwend
Owner

Say you have a directory containing multiple yaml files:

my_directory
     input1.yaml
     input2.yaml

When you run boltz predict my_directory, it will predict input1.yaml and then input2.yaml sequentially. But if you have multiple GPUs, say 2 in this case, you can add the flag --devices 2, in which case input1.yaml and input2.yaml will be predicted in parallel, one on each device.
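
Concretely, with that layout:

boltz predict my_directory
boltz predict my_directory --devices 2

The first command runs the two inputs one after the other on a single device; the second runs them in parallel, one per GPU.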

@jiaboli007
Author

Thanks. I tested the current version (0.3) with the sample input ligand.fasta on my machine, and it finished without failure. Many thanks for making constant improvements.

@gcorso closed this as completed Dec 22, 2024