
Ran out of memory #83

Open · xinyu-dev opened this issue Dec 3, 2024 · 5 comments

Comments

@xinyu-dev

I tested version 0.3.0 with different sequences and GPUs. In some cases I see a "ran out of memory, skipping batch" issue. Here are the details:

Command:

! boltz predict input/<specific_yaml_file> --out_dir output --devices 1 --output_format pdb --use_msa_server --num_workers 2

Input 1:
keytruda.yaml:

version: 1
sequences:
  - protein:
      id: A
      sequence: EIVLTQSPATLSLSPGERATLSCRASKGVSTSGYSYLHWYQQKPGQAPRLLIYLASYLESGVPARFSGSGSGTDFTLTISSLEPEDFAVYYCQHSRDLPLTFGGGTKVEIK
  - protein:
      id: B
      sequence: VQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS

Results:
  • AWS SageMaker g5.8xlarge (A10-24G), p3.2xlarge (V100-16G): out of memory
  • AWS SageMaker g6.8xlarge (L4-24G): good

Input 2:

adalimumab.yaml:

version: 1
sequences:
  - protein:
      id: A
      sequence: DIQMTQSPSSLSASVGDRVTITCRASQGIRNYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTISSLQPEDVATYYCQRYNRAPYTFGQGTKVEIK
  - protein:
      id: B
      sequence: EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLEWVSAITWNSGHIDYADSVEGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAKVSYLSTASSLDYWGQGTLVTVSS 

Results:
  • AWS SageMaker g5.8xlarge (A10-24G), p3.2xlarge (V100-16G), g6.8xlarge (L4-24G): out of memory
  • DGX A100-80G: good

Input 3:
example_multimer_from_your_repo:

version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: A
      sequence: MAHHHHHHVAVDAVSFTLLQDQLQSVLDTLSEREAGVVRLRFGLTDGQPRTLDEIGQVYGVTRERIRQIESKTMSKLRHPSRSQVLRDYLDGSSGSGTPEERLLRAIFGEKA
  - protein:
      id: B
      sequence: MRYAFAAEATTCNAFWRNVDMTVTALYEVPLGVCTQDPDRWTTTPDDEAKTLCRACPRRWLCARDAVESAGAEGLWAGVVIPESGRARAFALGQLRSLAERNGYPVRDHRVSAQSA

Results:
  • AWS SageMaker g5.8xlarge (A10-24G), p3.2xlarge (V100-16G), g6.8xlarge (L4-24G), and DGX A100-80G: all good

My feeling is that the main difference is GPU memory: going from 24 GB to 80 GB does resolve the OOM issue for some of the larger complexes. But the GPU model (e.g. A10-24G vs L4-24G) might lead to different outcomes as well, even at the same nominal memory size.
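
If it helps with debugging, here is a rough sketch of how one could log GPU memory while the same prediction runs, to see how close each card actually gets to its limit. This is just one way to do it; the gpu_mem.log filename is arbitrary and keytruda.yaml is simply the first input above.

# log GPU memory once per second in the background
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv,noheader -l 1 > gpu_mem.log &
MONITOR_PID=$!

# run the same prediction command as above
boltz predict input/keytruda.yaml --out_dir output --devices 1 --output_format pdb --use_msa_server --num_workers 2

# stop the logger and look at the largest memory.used value
kill $MONITOR_PID
sort -t',' -k2 -n gpu_mem.log | tail -n 1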

@YogBN

YogBN commented Dec 3, 2024

Same here, running into an OOM issue for a multimer on an RTX 4090 (24 GB).

@xinyu-dev
Author

I also suspect it has something to do with the data returned by the public Colab MMseqs2 server. Tamarind Bio's Boltz server seemingly runs on a single L4 GPU and has no issue with any of the sequences above that I tested. I don't know what MSA tool they use, though.

@RJWANGbioinfo

Same OOM issue here. I'm using version 0.3.0, which is supposed to resolve the memory issue. The prediction fails with OOM on an L4 GPU with 24 GB memory (g6.4xlarge).

It works when there are 2 input protein sequences, but fails with 3 input protein sequences. Each input sequence is ~110-150 aa.
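
For what it's worth, the total residue count across all chains seems to be what matters here rather than the number of chains alone. A quick way to check it for an input yaml, assuming each chain's sequence sits on a single 'sequence:' line as in the examples above (my_input.yaml is a placeholder), could be:

# sum sequence lengths over all chains in the yaml
grep 'sequence:' my_input.yaml | awk '{ total += length($2) } END { print total " residues in total" }'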

@jwohlwend
Owner

We just released v0.3.2, which should address some of these issues. You can update with pip install boltz -U. When testing, please remove any existing output folder for your input and run again! Please let us know.
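
For anyone following along, the update-and-rerun steps would look roughly like this (the output folder and input file names are just the ones used earlier in this thread):

pip install boltz -U    # upgrade to the latest release (v0.3.2 at the time of this comment)
rm -rf output           # remove the existing output folder for this input, as suggested
boltz predict input/keytruda.yaml --out_dir output --devices 1 --output_format pdb --use_msa_server --num_workers 2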

@jubosch

jubosch commented Dec 18, 2024

I just installed the current version via pip (2024-12-18) and I'm running into the same memory issue on an RTX 2080 Titan with 11 GB and an AMD Threadripper with 64 GB RAM. Protein A: 427 aa, protein B: 135 aa, protein C: 109 aa.

Using only B & C, I get 9.3 GB of GPU memory usage and it works fine. Is that amount of memory normal for this task?
