I was trying to run inference with the model on a custom dataset. A single GPU was slow, so I tried using 2 GPUs by setting --parallel_num 2. However, I got this error:
/kaggle/Bayesian-Enhancement-Model
Cannot import dcn. Ignore this warning if dcn is not used. Otherwise install BasicSR with compiling dcn.
Cannot import selective_scan_cuda. This affects speed.
dataset yourdataset
2024-11-07 13:24:12,370 INFO: Network [Network] is created.
2024-11-07 13:24:13,785 INFO: Network [Network] is created.
Loaded weights from /kaggle/input/track1-traffic-vehicle-detection/BEM/CG_UNet_LOLv2Real/ckpt.pth
Loaded weights from /kaggle/input/track1-traffic-vehicle-detection/BEM/IE_UNet_LOLv2Real/ckpt.pth
0%| | 0/4612 [00:16<?, ?it/s]
Traceback (most recent call last):
File "/kaggle/Bayesian-Enhancement-Model/Enhancement/eval.py", line 224, in <module>
one_preds.append(cond_net(split)[-1])
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
output.reraise()
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
output = module(*input, **kwargs)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/kaggle/Bayesian-Enhancement-Model/basicsr/archs/UNet_arch.py", line 470, in forward
fea = subnet(fea)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/kaggle/Bayesian-Enhancement-Model/basicsr/archs/UNet_arch.py", line 350, in forward
fea = en_block(fea)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/kaggle/Bayesian-Enhancement-Model/basicsr/archs/UNet_arch.py", line 239, in forward
x = block(x)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/kaggle/Bayesian-Enhancement-Model/basicsr/vmamba/models/vmamba.py", line 1377, in forward
return self._forward(input)
File "/kaggle/Bayesian-Enhancement-Model/basicsr/vmamba/models/vmamba.py", line 1328, in _forwardv01
x = x + self.drop_path(self.op(self.norm(x)))
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/kaggle/Bayesian-Enhancement-Model/basicsr/vmamba/models/vmamba.py", line 61, in forward
x = nn.functional.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm)
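A common cause of this DataParallel failure is a weight stored as a plain tensor attribute rather than a registered nn.Parameter or buffer: plain attributes are invisible to Module.to() and to DataParallel's replicate step, so the replica on cuda:1 can end up calling layer_norm with a weight still living on cuda:0. Whether that is exactly what happens in vmamba.py's custom LayerNorm is an assumption; the sketch below just demonstrates the registration asymmetry on CPU, using a dtype cast as a stand-in for device movement (both follow the same module traversal):

```python
import torch
import torch.nn as nn

class BadNorm(nn.Module):
    """Hypothetical layer-norm wrapper whose weight is a plain attribute.

    Plain tensor attributes are not visited by Module.to()/.double()
    and are not replicated by nn.DataParallel, unlike nn.Parameter
    and registered buffers.
    """
    def __init__(self, dim):
        super().__init__()
        self.weight = torch.ones(dim)               # NOT registered
        self.bias = nn.Parameter(torch.zeros(dim))  # registered

    def forward(self, x):
        return nn.functional.layer_norm(
            x, (x.shape[-1],), self.weight.to(x.dtype), self.bias, 1e-5
        )

# No GPU needed to see the asymmetry: .double() converts only the
# registered parameter and leaves the plain attribute behind, just as
# a .to("cuda:1") replication would leave it on its original device.
m = BadNorm(4).double()
print(m.bias.dtype)    # torch.float64 (registered -> converted)
print(m.weight.dtype)  # torch.float32 (plain attribute -> untouched)
```

If this is the cause, registering the tensor (e.g. via self.register_buffer or nn.Parameter) would let DataParallel replicate it correctly, though as the reply below notes, the repository does not currently support multi-GPU inference anyway.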
Currently, our code does not support inference on multiple GPUs. The argument --parallel_num is not intended to specify the number of GPUs. However, you can set --parallel_num to 8 or higher to accelerate inference, provided your GPU has sufficient memory.
If you can tolerate some reduction in image quality, you can set --num_samples to 50 or a lower value, or include --deterministic (to enable the deterministic mode of BEM) in your script.
The implementation for inference acceleration, as mentioned in the paper, will be released soon.
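Until that lands, one way to use both GPUs without touching the model code is process-level parallelism: shard the image list and run one single-GPU eval.py process per shard, pinned with CUDA_VISIBLE_DEVICES. This is a generic sketch, not the repository's supported workflow; the trailing "..." stands for whatever dataset arguments eval.py actually takes, since its CLI is not shown here.

```python
import math

def shard(paths, num_shards):
    """Split a list of image paths into near-equal contiguous shards,
    one per GPU/process."""
    per = math.ceil(len(paths) / num_shards)
    return [paths[i * per:(i + 1) * per] for i in range(num_shards)]

# Illustrative file names; in practice this would be a directory listing.
images = [f"img_{i:04d}.png" for i in range(10)]
shards = shard(images, 2)

# Each shard is handed to its own single-GPU process, e.g.:
for gpu, part in enumerate(shards):
    print(f"CUDA_VISIBLE_DEVICES={gpu} python Enhancement/eval.py ..."
          f"  # {len(part)} images")
```

Each process then sees exactly one GPU (always cuda:0 from its own point of view), which sidesteps the cross-device error entirely.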
Hi, thank you for your work on this task!
This is my command:
Thanks a lot!