Problems Running on a 4xA100 (80 GB) node... #15
Comments
Hi! Would you mind checking if it passes on a single GPU? I have an idea of what might be going on but just want to confirm, thanks!
It'll run less demanding inputs on a single GPU (the provided example runs).
Thanks, I'll investigate. But just to be clear, the multi-GPU mode will not allow you to run larger inputs. It's meant to run multiple input files (provided as a directory) in parallel, not to parallelize the model itself (i.e., we do not do any sharding).
VRAM limit option then?
Which, now that I think about it, actually explains your issue. Since there is only one example, the other 3 GPUs have nothing to do. I'll add a warning about this and set num_devices = min(num_samples, num_devices).
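For readers following along, the clamp described here can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function names `effective_num_devices` and `assign_inputs` are hypothetical, and the round-robin split is only an assumption about how whole input files might be spread across GPUs.

```python
import warnings


def effective_num_devices(num_samples: int, num_devices: int) -> int:
    """Clamp the device count so no GPU is left without an input (hypothetical helper)."""
    if num_samples < 1 or num_devices < 1:
        raise ValueError("num_samples and num_devices must be positive")
    if num_devices > num_samples:
        # Warn instead of failing: the extra devices would simply sit idle.
        warnings.warn(
            f"Requested {num_devices} devices for {num_samples} input(s); "
            f"using {num_samples}."
        )
    return min(num_samples, num_devices)


def assign_inputs(inputs: list[str], num_devices: int) -> list[list[str]]:
    """Round-robin whole input files across devices (assumed scheme, no model sharding)."""
    n = effective_num_devices(len(inputs), num_devices)
    return [inputs[i::n] for i in range(n)]
```

With a single input file and four requested devices, `assign_inputs` returns one work list, so only one GPU is used. Each file still has to fit on one GPU, which is why this mode does not help with larger inputs.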
Yeah, we need to do some measurements on this. We're going to add a chunking feature in the next day or so to allow for larger inputs at the cost of some slowdown. Hopefully that helps!
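As a rough illustration of the memory-for-speed trade-off that chunking generally implies, here is a generic sketch. It is not the Boltz implementation, and `row_sums_chunked` is a hypothetical example function: it processes an N x N pairwise computation a few rows at a time, so only a chunk_size x N block is held in memory at once.

```python
import math


def row_sums_chunked(points, chunk_size):
    """Sum of distances from each point to every point, computed chunk by chunk."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    sums = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Only a (len(chunk) x N) block of distances exists at any one time,
        # instead of the full N x N matrix.
        block = [[math.dist(p, q) for q in points] for p in chunk]
        sums.extend(sum(row) for row in block)
    return sums
```

Smaller chunks lower peak memory but add loop overhead, and the result is the same regardless of chunk size.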
🤞
Is the chunking feature working now? I tried to generate a structure for a large protein, and it always ran out of memory on an A100-80G.
The PR is open. We're just working out what the default behavior should be and will merge very soon.
The chunking code is now live in version 0.3.0!
Still having trouble... see this new issue: #71
We just released v0.3.2, which should address some of these issues. You can update with pip install boltz -U. When testing, please remove any existing output folder for your input and run again! Please let us know how it goes.
Input MSAs were truncated to a single entry (duplicates of the input sequences) because leaving msa: blank causes errors for some reason.
Input YAML:
Command:
Output: