RuntimeError: CUDA error: uncorrectable ECC error encountered #40

Open
seongguheo opened this issue Dec 31, 2024 · 0 comments

Dear Authors,

Thank you for the wonderful application.

I tried to run the example as shown below and encountered a RuntimeError.
Could you please take a look and give me some guidance on how to resolve the error?

Thank you,
Luke

--------------
run_RF2.sh rcsb_pdb_7ZLR.fasta --pair -o 7ZLR
(/scratch/general/vast/u6062494/tools/RoseTTAFold2/RF2) [u6062494@grn009:examples]$ ../run_RF2.sh rcsb_pdb_7ZLR.fasta --pair -o 7ZLR
Running HHblits
-> Running command: /scratch/general/vast/u6062494/tools/RoseTTAFold2/input_prep/make_protein_msa.sh 7ZLR/rcsb_pdb_7ZLR_1.fa 7ZLR rcsb_pdb_7ZLR_1 8 64
Running HHblits
-> Running command: /scratch/general/vast/u6062494/tools/RoseTTAFold2/input_prep/make_protein_msa.sh 7ZLR/rcsb_pdb_7ZLR_2.fa 7ZLR rcsb_pdb_7ZLR_2 8 64
Running HHblits
-> Running command: /scratch/general/vast/u6062494/tools/RoseTTAFold2/input_prep/make_protein_msa.sh 7ZLR/rcsb_pdb_7ZLR_3.fa 7ZLR rcsb_pdb_7ZLR_3 8 64
Creating merged MSA
-> Running command: python /scratch/general/vast/u6062494/tools/RoseTTAFold2/input_prep/make_paired_MSA_simple.py 7ZLR/rcsb_pdb_7ZLR_1.msa0.a3m 7ZLR/rcsb_pdb_7ZLR_2.msa0.a3m 7ZLR/rcsb_pdb_7ZLR_3.msa0.a3m > 7ZLR/rcsb_pdb_7ZLR_1.rcsb_pdb_7ZLR_2.rcsb_pdb_7ZLR_3.a3m
Running RoseTTAFold2 to predict structures
-> Running command: python /scratch/general/vast/u6062494/tools/RoseTTAFold2/network/predict.py -inputs 7ZLR/rcsb_pdb_7ZLR_1.rcsb_pdb_7ZLR_2.rcsb_pdb_7ZLR_3.a3m -prefix 7ZLR/models/model -model /scratch/general/vast/u6062494/tools/RoseTTAFold2/network/weights/RF2_apr23.pt -db /scratch/general/vast/u6062494/tools/RoseTTAFold2/pdb100_2021Mar03/pdb100_2021Mar03 -symm C1
Running on GPU
Traceback (most recent call last):
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/network/predict.py", line 618, in <module>
    pred = Predictor(args.model, torch.device("cuda:0"))
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/network/predict.py", line 217, in __init__
    ).to(self.device)
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/RF2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
    return self._apply(convert)
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/RF2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/RF2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
    module._apply(fn)
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/RF2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
    param_applied = fn(param)
  File "/scratch/general/vast/u6062494/tools/RoseTTAFold2/RF2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: uncorrectable ECC error encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
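
The traceback shows the failure while the model weights are being moved to cuda:0, and the message refers to GPU memory (ECC) rather than to anything specific to RoseTTAFold2, so checking the ECC counters on the node may help narrow this down. Below is a minimal sketch of such a check. It assumes `nvidia-smi` is on PATH and that the driver supports the `ecc.errors.uncorrected.volatile.total` query field; the helper names (`parse_ecc_csv`, `gpus_with_uncorrected_ecc`, `query_ecc`) are hypothetical and not part of RoseTTAFold2 or PyTorch.

```python
import shutil
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
QUERY = "index,name,ecc.errors.uncorrected.volatile.total"


def parse_ecc_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into (index, name, count) tuples.

    count is None when the field is not an integer (e.g. "[N/A]" on GPUs without ECC).
    """
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Skip anything that does not look like the three requested fields.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        index, name, raw = parts
        rows.append((int(index), name, int(raw) if raw.isdigit() else None))
    return rows


def gpus_with_uncorrected_ecc(rows):
    """Return the indices of GPUs reporting at least one uncorrected ECC error."""
    return [index for index, _name, count in rows if count]


def query_ecc():
    """Run nvidia-smi and return parsed rows, or [] if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,  # a failing nvidia-smi just yields no parsable rows
    )
    return parse_ecc_csv(result.stdout)


if __name__ == "__main__":
    rows = query_ecc()
    for index, name, count in rows:
        print(f"GPU {index} ({name}): uncorrected volatile ECC errors = {count}")
    print("GPUs with uncorrected errors:", gpus_with_uncorrected_ecc(rows))
```

If a GPU on the node does report uncorrected errors, that would point to a hardware or node problem for the cluster administrators rather than to the RoseTTAFold2 installation; rerunning the same command on a different GPU node would be one way to confirm this.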
