--devices 2 failed for test case #51

Closed
jiaboli007 opened this issue Nov 23, 2024 · 7 comments

Comments

@jiaboli007

I can run the prediction of prot.fasta (the sample input in the package) with the default settings:

boltz predict prot.fasta

The run finished successfully.

However, if I run the same job with the option --devices 2 (trying to use the two GPUs on my machine), it fails. See the attached log file for the error message:
boltz.log
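
In other words, the failing run was the same command with the extra flag:

boltz predict prot.fasta --devices 2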

I am running on Ubuntu 22.04.4 LTS with PyTorch 2.5.1+cu124. I installed boltz from source and updated to the latest version (0.2.1).

@eyal-converge

Got a similar error with --devices 2 (using 2x A100 40GB):
Debian 11
pytorch==2.5
cuda==12.4

@jwohlwend
Owner

This happens because you are trying to use multiple devices even though you only have a single input, so one GPU has nothing to do and fails.
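
As a simplified sketch of the idea (not Boltz's actual scheduling code): if the inputs are split round-robin across the devices, a single input on two devices leaves the second device with an empty work list.

# Simplified illustration only, not Boltz's actual code
inputs = ["prot.fasta"]   # a single prediction job
devices = 2
per_device = {rank: inputs[rank::devices] for rank in range(devices)}
print(per_device)   # {0: ['prot.fasta'], 1: []} -> device 1 has nothing to do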

@eyal-converge

eyal-converge commented Nov 28, 2024

@jwohlwend
I assumed --devices would turn on model parallelism or something similar.

@jwohlwend
Owner

We do not support model parallelism; --devices should be used to run many predictions in parallel. I've added an error message for this case.

@eyal-converge

eyal-converge commented Nov 28, 2024

@jwohlwend
Can you give an example of when to use the --devices flag?

@jwohlwend
Owner

Say you have a directory containing multiple yaml files:

my_directory
     input1.yaml
     input2.yaml

When you run boltz predict my_directory, it will predict input1.yaml and then input2.yaml sequentially. But if you have multiple GPUs, say 2 in this case, you can add the flag --devices 2, in which case input1.yaml and input2.yaml will be predicted in parallel, one on each device.
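
Concretely, with that layout:

boltz predict my_directory
boltz predict my_directory --devices 2

The first command runs the two inputs one after the other on a single device; the second runs them in parallel, one per GPU.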

@jiaboli007
Author

Thanks. I tested the current version (0.3) with the sample input ligand.fasta on my machine, and it finished without failure. Many thanks for making constant improvements.

@gcorso closed this as completed Dec 22, 2024