Hemichannel Test: Memory Issues (After the Chunking Update) #71

amelie-iska · 2024-11-29T23:31:41Z

Hi all, I just ran into this error on a hemichannel (six-connexin) system, the same one as before. I can run this prediction with ColabFold, but not with Boltz-1.
YAML Input:

version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: [A,B,C,D,E,F]
      sequence: MGDWSALGRLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFVCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSTPTLLYLAHVFYLMRKEEKLNRKEEELKMVQNEGGNVDMHLKQIEIKKFKYGLEEHGKVKMRGGLLRTYIISILFKSVFEVGFIIIQWYMYGFSLSAIYTCKRDPCPHQVDCFLSRPTEKTIFIWFMLIVSIVSLALNIIELFYVTYKSIKDGIKGKKDPFSATNDAVISGKECGSPKYAYFNGCSSPTAPMSPPGYKLVTGERNPSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNTHAQPFDFSDEHQNTKKMAPGHEMQPLTILDQRPSSRASSHASSRPRPDDLEI
  - protein:
      id: [G,H,I]
      sequence: FSLESERP
  - ligand:
      id: [J,K,L]
      smiles: CC(C)C[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CCC[C@H]1C(=O)O

Run Command:

boltz predict examples/connexin-peptide.yaml --recycling_steps 20 --diffusion_samples 5 --use_msa_server

Output:

(boltz-1) lily@il-gpu04:~/amelie/Workspace/boltz$ boltz predict examples/connexin-peptide.yaml --recycling_steps 20 --diffusion_samples 5 --use_msa_server
Downloading the model weights to /home/lily/.boltz/boltz1_conf.ckpt. You may change the cache directory with the --cache flag.
Checking input data.
Running predictions for 1 structure
Processing input data.
  0%|          | 0/1 [00:00<?, ?it/s]
Generating MSA for examples/connexin-peptide.yaml with 2 protein entities.
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:01 remaining: 00:00]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:00 remaining: 00:00]
Sleeping for 8s. Reason: PENDING | 0/300 [elapsed: 00:00 remaining: ?]
Sleeping for 7s. Reason: RUNNING | 8/300 [elapsed: 00:09 remaining: 05:33]
Sleeping for 9s. Reason: RUNNING | 15/300 [elapsed: 00:16 remaining: 05:15]
Sleeping for 9s. Reason: RUNNING | 24/300 [elapsed: 00:26 remaining: 04:59]
Sleeping for 8s. Reason: RUNNING | 33/300 [elapsed: 00:35 remaining: 04:47]
COMPLETE: 100%|██████████| 300/300 [elapsed: 00:45 remaining: 00:00]
100%|██████████| 1/1 [00:48<00:00, 48.80s/it]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
/home/lily/mambaforge/envs/boltz-1/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:75: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
Predicting DataLoader 0:   0%|          | 0/1 [00:00<?, ?it/s]
WARNING: ran out of memory, skipping batch
Predicting DataLoader 0: 100%|██████████| 1/1 [2:34:20<00:00,  0.00it/s]
Number of failed examples: 1
(boltz-1) lily@il-gpu04:~/amelie/Workspace/boltz$
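The log pairs "WARNING: ran out of memory, skipping batch" with "Number of failed examples: 1", which suggests the predictor catches the out-of-memory error per batch and moves on instead of crashing. The sketch below illustrates that general catch-and-skip pattern; it is not Boltz's actual code, and run_all and predict_fn are hypothetical names.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_all(batches: Iterable[Any], predict_fn: Callable[[Any], Any]) -> List[Optional[Any]]:
    """Run predict_fn on each batch, skipping batches that run out of memory.

    Hypothetical helper illustrating a catch-and-skip pattern; not Boltz's code.
    """
    results: List[Optional[Any]] = []
    failed = 0
    for batch in batches:
        try:
            results.append(predict_fn(batch))
        except RuntimeError as err:
            # PyTorch raises CUDA OOM as a RuntimeError subclass whose message
            # contains "out of memory"; any other RuntimeError is re-raised.
            if "out of memory" not in str(err):
                raise
            print("WARNING: ran out of memory, skipping batch")
            failed += 1
            results.append(None)
            # Real PyTorch code would typically call torch.cuda.empty_cache()
            # here before moving on to the next batch.
    print(f"Number of failed examples: {failed}")
    return results
```

Because a skipped batch only produces a warning, a run can finish "successfully" (as above, after 2.5 hours) without writing any structure, so the failure count is the line to check.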


xinyu-dev · 2024-12-03T01:53:56Z

Same here

YogBN · 2024-12-03T16:06:27Z

Same issue with OOM.

YaoYinYing · 2024-12-04T09:53:33Z

Same issue here: OOM with a 2-chain protein complex (<500 aa in total) on an A100 (40 GB).

jwohlwend · 2024-12-04T21:16:30Z

We just released v0.3.2, which should address some of these issues. You can update with pip install boltz -U. When testing, please remove any existing output folder for your input and run again! Please let us know.
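Removing the stale output folder before re-testing can be scripted. This is a minimal sketch that assumes the default layout where boltz predict writes to a folder named boltz_results_<input file stem> inside the --out_dir directory (the current directory by default); verify the actual path on your system before deleting anything. clear_previous_results is a hypothetical helper.

```python
import shutil
from pathlib import Path


def clear_previous_results(input_path: str, out_dir: str = ".") -> bool:
    """Delete a previous Boltz output folder for input_path, if one exists.

    ASSUMPTION: outputs live in <out_dir>/boltz_results_<input file stem>.
    Returns True if a folder was removed, False if there was nothing to remove.
    """
    target = Path(out_dir) / f"boltz_results_{Path(input_path).stem}"
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

For the input in this thread you would call clear_previous_results("examples/connexin-peptide.yaml") from the directory where boltz predict was originally run, then rerun the same command.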

YaoYinYing · 2024-12-05T01:15:39Z

v0.3.2 works for my case!!!

amelie-iska · 2024-12-05T20:05:14Z

IT WORKED!!! 🔥 🔥 🔥

amelie-iska · 2024-12-05T20:22:32Z

I still had to truncate the last ~140 residues from the C-terminus of the connexins, though, so I ran with this YAML:

version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: [A,B,C,D,E,F]
      sequence: MGDWSALGRLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFVCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSTPTLLYLAHVFYLMRKEEKLNRKEEELKMVQNEGGNVDMHLKQIEIKKFKYGLEEHGKVKMRGGLLRTYIISILFKSVFEVGFIIIQWYMYGFSLSAIYTCKRDPCPHQVDCFLSRPTEKTIFIWFMLIVSIVSLALNIIELFYVTYKSIKDG

# Long disordered C-terminal tail of connexin
# IKGKKDPFSATNDAVISGKECGSPKYAYFNGCSSPTAPMSPPGYKLVTGERNPSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNTHAQPFDFSDEHQNTKKMAPGHEMQPLTILDQRPSSRASSHASSRPRPDDLEI

# Run command: 
# boltz predict examples/connexin-peptide.yaml --recycling_steps 20  --diffusion_samples 10 --use_msa_server
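The truncated input above can also be generated instead of edited by hand. This minimal sketch cuts a fixed number of C-terminal residues and emits a protein-only YAML in the same layout as above; the function name truncated_yaml, the 140-residue default and the protein-only layout are assumptions for illustration, so add the peptide and ligand entries back as needed.

```python
from typing import Sequence


def truncated_yaml(sequence: str, chain_ids: Sequence[str], n_cterm: int = 140) -> str:
    """Return a Boltz-style YAML string with the last n_cterm residues removed.

    Hypothetical helper that mirrors the YAML layout used in this thread.
    """
    if not 0 <= n_cterm < len(sequence):
        raise ValueError("n_cterm must be non-negative and shorter than the sequence")
    # Slice by explicit length so n_cterm=0 keeps the whole sequence.
    kept = sequence[: len(sequence) - n_cterm]
    ids = ",".join(chain_ids)
    return (
        "version: 1\n"
        "sequences:\n"
        "  - protein:\n"
        f"      id: [{ids}]\n"
        f"      sequence: {kept}\n"
    )
```

For the 379-residue connexin in this thread, n_cterm=140 leaves 239 residues, which matches the truncated sequence above (ending in ...KSIKDG).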

Also, I am trying to alleviate memory issues by adding the code below to src/boltz/main.py. Will this help?

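import torch
# Let float32 matmuls use lower-precision internal math (bfloat16/TF32) on Tensor Core GPUs,
# as the Lightning warning in the log above suggests
torch.set_float32_matmul_precision('medium')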

I'm rerunning with the full 379-residue connexins now and will report back once it either finishes or fails.
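A back-of-the-envelope estimate shows why the full-length hemichannel is so much harder than the truncated one. It assumes AlphaFold3-style tokenization (one token per standard residue, one token per ligand heavy atom), a pair representation of width 128 stored in float32, and the composition of the first YAML in this issue; none of these figures are taken from Boltz itself. A single pair tensor grows quadratically with the token count N, and unchunked triangle attention materializes on the order of N³ logits per head, which is the kind of cost chunking targets. On the question above: torch.set_float32_matmul_precision only changes the internal precision PyTorch may use for float32 matrix multiplications; tensors are still stored as float32, so it does not shrink these estimates. The SMILES counter is deliberately naive and only meant for simple SMILES like the one here.

```python
import re

# Naive heavy-atom counter: bracket atoms, Cl/Br, organic-subset atoms and
# aromatic lowercase atoms. Bracketed hydrogens such as [H] or [2H] are skipped.
_ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
_BRACKET_H = re.compile(r"\[\d*H(?![a-z])")

TOKEN_Z = 128     # ASSUMED pair-representation width
BYTES_FP32 = 4

# Ligand SMILES from the first YAML in this issue
PEPTIDE_SMILES = ("CC(C)C[C@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)"
                  "C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CCC[C@H]1C(=O)O")


def heavy_atoms(smiles: str) -> int:
    """Count heavy atoms in a simple SMILES string (not a full SMILES parser)."""
    return sum(1 for atom in _ATOM.findall(smiles) if not _BRACKET_H.match(atom))


def n_tokens(chain_len: int, n_chains: int, peptide_len: int, n_peptides: int,
             ligand_atoms: int, n_ligands: int) -> int:
    """ASSUMED tokenization: one token per residue, one per ligand heavy atom."""
    return chain_len * n_chains + peptide_len * n_peptides + ligand_atoms * n_ligands


def pair_tensor_gb(tokens: int, width: int = TOKEN_Z) -> float:
    """Size of one N x N x width float32 tensor in gigabytes (1e9 bytes)."""
    return tokens * tokens * width * BYTES_FP32 / 1e9


if __name__ == "__main__":
    lig = heavy_atoms(PEPTIDE_SMILES)
    full = n_tokens(379, 6, 8, 3, lig, 3)   # six full-length connexins
    trunc = n_tokens(239, 6, 8, 3, lig, 3)  # ~140-residue tails removed
    print(f"ligand heavy atoms: {lig}")
    print(f"tokens: full={full}, truncated={trunc}")
    print(f"one pair tensor: {pair_tensor_gb(full):.2f} GB vs {pair_tensor_gb(trunc):.2f} GB")
    print(f"N^3 ratio (triangle ops): {(full / trunc) ** 3:.2f}x")
```

Under these assumptions the peptide ligand has 68 heavy atoms, the full system is about 2502 tokens versus 1662 truncated, and one pair tensor drops from roughly 3.21 GB to 1.41 GB. The cubic terms shrink by a factor of about 3.4, which is consistent with truncation making the difference between fitting and not fitting.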

amelie-iska · 2024-12-05T22:56:23Z

😔

zongmingchua · 2024-12-18T19:19:37Z

> I did have to still truncate the last ~140 residues from the C-terminus of the connexins though. So, I ran with this YAML
> version: 1  # Optional, defaults to 1
> sequences:
>   - protein:
>       id: [A,B,C,D,E,F]
>       sequence: MGDWSALGRLLDKVQAYSTAGGKVWLSVLFIFRILLLGTAVESAWGDEQSAFVCNTQQPGCENVCYDKSFPISHVRFWVLQIIFVSTPTLLYLAHVFYLMRKEEKLNRKEEELKMVQNEGGNVDMHLKQIEIKKFKYGLEEHGKVKMRGGLLRTYIISILFKSVFEVGFIIIQWYMYGFSLSAIYTCKRDPCPHQVDCFLSRPTEKTIFIWFMLIVSIVSLALNIIELFYVTYKSIKDG
>
> # Long disordered C-terminal tail of connexin
> # IKGKKDPFSATNDAVISGKECGSPKYAYFNGCSSPTAPMSPPGYKLVTGERNPSSCRNYNKQASEQNWANYSAEQNRMGQAGSTISNTHAQPFDFSDEHQNTKKMAPGHEMQPLTILDQRPSSRASSHASSRPRPDDLEI
>
> # Run command:
> # boltz predict examples/connexin-peptide.yaml --recycling_steps 20  --diffusion_samples 10 --use_msa_server
> Also, I am alleviating memory issues by adding this code (below) to src/boltz/main.py...will this help?
> import torch
> torch.set_float32_matmul_precision('medium')
> I'm rerunning with the full 379 residue connexins now and will report back with an update once it either finishes or fails.

Hi! I'm curious about your reason for using --recycling_steps 20 --diffusion_samples 10. Do the results turn out better than with the default parameters?

amelie-iska · 2024-12-18T22:50:41Z

Hi @zongmingchua
In general, you can expect that raising the number of recycling steps will improve prediction quality. Increasing the number of seeds/samples also increases your chances of getting a good prediction, so for larger, more complex systems I generally do not use the default settings. Another thing you might try is increasing the number of timesteps used in the diffusion process, which should also improve quality. All of these will increase run time, though, so keep that in mind.
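The most direct way to see whether extra recycling and samples help on a given system is a small sweep. This minimal sketch builds boltz predict command lines from the flags used in this thread. It assumes that your installed version exposes --sampling_steps (the diffusion timestep count mentioned above) and --out_dir, and that 3 recycling steps and 1 sample are the CLI defaults; confirm all three with boltz predict --help. build_command is a hypothetical helper, and each run gets its own output directory so results are not overwritten.

```python
import itertools
import shlex
from typing import List, Optional


def build_command(yaml_path: str, recycling_steps: int, diffusion_samples: int,
                  sampling_steps: Optional[int] = None,
                  out_dir: Optional[str] = None) -> List[str]:
    """Assemble a boltz predict argv list (hypothetical helper)."""
    cmd = ["boltz", "predict", yaml_path,
           "--recycling_steps", str(recycling_steps),
           "--diffusion_samples", str(diffusion_samples),
           "--use_msa_server"]
    if sampling_steps is not None:
        cmd += ["--sampling_steps", str(sampling_steps)]  # ASSUMED flag name
    if out_dir is not None:
        cmd += ["--out_dir", out_dir]  # ASSUMED flag name
    return cmd


if __name__ == "__main__":
    # Compare the assumed defaults (3 recycles, 1 sample) with the thread's settings.
    for r, d in itertools.product([3, 20], [1, 10]):
        cmd = build_command("examples/connexin-peptide.yaml", r, d,
                            out_dir=f"sweep/r{r}_d{d}")
        print(shlex.join(cmd))
        # To execute: import subprocess, then subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to review the grid before spending GPU hours; note that more diffusion samples also raise peak memory, which matters on systems already close to the OOM limit.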

amelie-iska mentioned this issue on Nov 30, 2024 in #15, "Problems Running on a 4xA100 (80 GB) node..." (Closed)
