
Ran out of memory #83

Open · xinyu-dev opened this issue Dec 3, 2024 · 5 comments

Comments

@xinyu-dev

I tested version 0.3.0 with different sequences and GPUs. In some cases I see a "ran out of memory, skipping batch" issue. Here are the details:

Command:

! boltz predict input/<specific_yaml_file> --out_dir output --devices 1 --output_format pdb --use_msa_server --num_workers 2

Input 1:
keytruda.yaml:

version: 1
sequences:
  - protein:
      id: A
      sequence: EIVLTQSPATLSLSPGERATLSCRASKGVSTSGYSYLHWYQQKPGQAPRLLIYLASYLESGVPARFSGSGSGTDFTLTISSLEPEDFAVYYCQHSRDLPLTFGGGTKVEIK
  - protein:
      id: B
      sequence: VQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS

Results:
  • AWS SageMaker g5.8xlarge (A10-24G), p3.2xlarge (V100-16G): out of memory
  • AWS SageMaker g6.8xlarge (L4-24G): good

Input 2:

adalimumab.yaml:

version: 1
sequences:
  - protein:
      id: A
      sequence: DIQMTQSPSSLSASVGDRVTITCRASQGIRNYLAWYQQKPGKAPKLLIYAASTLQSGVPSRFSGSGSGTDFTLTISSLQPEDVATYYCQRYNRAPYTFGQGTKVEIK
  - protein:
      id: B
      sequence: EVQLVESGGGLVQPGRSLRLSCAASGFTFDDYAMHWVRQAPGKGLEWVSAITWNSGHIDYADSVEGRFTISRDNAKNSLYLQMNSLRAEDTAVYYCAKVSYLSTASSLDYWGQGTLVTVSS 

Results:
  • AWS SageMaker g5.8xlarge (A10-24G), p3.2xlarge (V100-16G), g6.8xlarge (L4-24G): out of memory
  • DGX A100-80G: good

Input 3:
example_multimer_from_your_repo:

version: 1  # Optional, defaults to 1
sequences:
  - protein:
      id: A
      sequence: MAHHHHHHVAVDAVSFTLLQDQLQSVLDTLSEREAGVVRLRFGLTDGQPRTLDEIGQVYGVTRERIRQIESKTMSKLRHPSRSQVLRDYLDGSSGSGTPEERLLRAIFGEKA
  - protein:
      id: B
      sequence: MRYAFAAEATTCNAFWRNVDMTVTALYEVPLGVCTQDPDRWTTTPDDEAKTLCRACPRRWLCARDAVESAGAEGLWAGVVIPESGRARAFALGQLRSLAERNGYPVRDHRVSAQSA

Results:
  • AWS SageMaker g5.8xlarge (A10-24G), p3.2xlarge (V100-16G), g6.8xlarge (L4-24G), and DGX A100-80G: all good

My feeling is that the main difference is GPU memory: going from 24 GB to 80 GB does resolve the OOM issue for some of the larger complexes. But the GPU model (e.g. A10-24G vs L4-24G) might lead to different outcomes as well, even at the same nominal memory size.
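
If it helps with debugging, here is a rough sketch of how one could log GPU memory while the same prediction runs, to see how close each card actually gets to its limit. This is just one way to do it; the gpu_mem.log filename is arbitrary and keytruda.yaml is simply the first input above.

# log GPU memory once per second in the background
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv,noheader -l 1 > gpu_mem.log &
MONITOR_PID=$!

# run the same prediction command as above
boltz predict input/keytruda.yaml --out_dir output --devices 1 --output_format pdb --use_msa_server --num_workers 2

# stop the logger and look at the largest memory.used value
kill $MONITOR_PID
sort -t',' -k2 -n gpu_mem.log | tail -n 1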

@YogBN

YogBN commented Dec 3, 2024

Same here, running into an OOM issue for a multimer on an RTX 4090 (24 GB).

@xinyu-dev
Author

I also suspect it has something to do with the data returned by the public Colab MMseqs2 server. Tamarind Bio's Boltz server seemingly runs on a single L4 GPU and has no issue with any of the sequences above that I tested. I don't know what MSA tool they use, though.

@RJWANGbioinfo

Same OOM issue here. I'm using version 0.3.0, which is supposed to resolve the memory issue. The prediction fails with OOM on an L4 GPU with 24 GB memory (g6.4xlarge).

It works when there are 2 input protein sequences, but fails with 3 input protein sequences. Each input sequence is ~110-150 aa.
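
For what it's worth, the total residue count across all chains seems to be what matters here rather than the number of chains alone. A quick way to check it for an input yaml, assuming each chain's sequence sits on a single 'sequence:' line as in the examples above (my_input.yaml is a placeholder), could be:

# sum sequence lengths over all chains in the yaml
grep 'sequence:' my_input.yaml | awk '{ total += length($2) } END { print total " residues in total" }'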

@jwohlwend
Owner

We just released v0.3.2, which should address some of these issues. You can update with pip install boltz -U. When testing, please remove any existing output folder for your input and run again! Please let us know.
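
For anyone following along, the update-and-rerun steps would look roughly like this (the output folder and input file names are just the ones used earlier in this thread):

pip install boltz -U    # upgrade to the latest release (v0.3.2 at the time of this comment)
rm -rf output           # remove the existing output folder for this input, as suggested
boltz predict input/keytruda.yaml --out_dir output --devices 1 --output_format pdb --use_msa_server --num_workers 2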

@jubosch

jubosch commented Dec 18, 2024

I just installed the current version via pip (2024-12-18) and I'm running into the same memory issue on an RTX 2080 Titan with 11 GB and an AMD Threadripper with 64 GB RAM. Protein A: 427 aa, protein B: 135 aa, protein C: 109 aa.

Using only B & C, I get 9.3 GB of GPU memory usage and it works fine. Is that amount of memory normal for this task?
