
Parallel problem #5

Open
Ly-Lynn opened this issue Nov 7, 2024 · 1 comment
Ly-Lynn commented Nov 7, 2024

Hi, thank you for your work on this task!

I was trying to run inference with the model on a custom dataset. Since a single GPU was slow, I tried using two GPUs by setting --parallel_num 2. However, I got this error:

/kaggle/Bayesian-Enhancement-Model
Cannot import dcn. Ignore this warning if dcn is not used. Otherwise install BasicSR with compiling dcn.
Cannot import selective_scan_cuda. This affects speed.
dataset yourdataset
2024-11-07 13:24:12,370 INFO: Network [Network] is created.
2024-11-07 13:24:13,785 INFO: Network [Network] is created.
Loaded weights from /kaggle/input/track1-traffic-vehicle-detection/BEM/CG_UNet_LOLv2Real/ckpt.pth
Loaded weights from /kaggle/input/track1-traffic-vehicle-detection/BEM/IE_UNet_LOLv2Real/ckpt.pth
  0%|                                                  | 0/4612 [00:16<?, ?it/s]
Traceback (most recent call last):
  File "/kaggle/Bayesian-Enhancement-Model/Enhancement/eval.py", line 224, in <module>
    one_preds.append(cond_net(split)[-1])
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/Bayesian-Enhancement-Model/basicsr/archs/UNet_arch.py", line 470, in forward
    fea = subnet(fea)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/Bayesian-Enhancement-Model/basicsr/archs/UNet_arch.py", line 350, in forward
    fea = en_block(fea)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/Bayesian-Enhancement-Model/basicsr/archs/UNet_arch.py", line 239, in forward
    x = block(x)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/Bayesian-Enhancement-Model/basicsr/vmamba/models/vmamba.py", line 1377, in forward
    return self._forward(input)
  File "/kaggle/Bayesian-Enhancement-Model/basicsr/vmamba/models/vmamba.py", line 1328, in _forwardv01
    x = x + self.drop_path(self.op(self.norm(x)))
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/kaggle/Bayesian-Enhancement-Model/basicsr/vmamba/models/vmamba.py", line 61, in forward
    x = nn.functional.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm)

This is my command:

%cd /kaggle/Bayesian-Enhancement-Model
!python Enhancement/eval.py --opt /kaggle/input/track1-traffic-vehicle-detection/BEM/CG_UNet_LOLv2Real/CG_UNet_LOLv2Real.yml --weights /kaggle/input/track1-traffic-vehicle-detection/BEM/CG_UNet_LOLv2Real/ckpt.pth \
--cond_opt /kaggle/input/track1-traffic-vehicle-detection/BEM/IE_UNet_LOLv2Real/IE_UNet_LOLv2Real.yml --cond_weights /kaggle/input/track1-traffic-vehicle-detection/BEM/IE_UNet_LOLv2Real/ckpt.pth \
--no_ref niqe \
--num_samples 100 \
--parallel_num 2 \
--input_dir /kaggle/Bayesian-Enhancement-Model/dataset

Thanks a lot!
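For context, the "Expected all tensors to be on the same device" failure under nn.DataParallel is typically caused by a tensor that lives on the default GPU while the module is replicated across devices. This is a minimal, hypothetical sketch (not the BEM code) of the difference between a registered buffer, which follows the module across devices, and a plain tensor attribute, which does not:

```python
import torch
import torch.nn as nn

class Scaled(nn.Module):
    """Toy layer illustrating why unregistered tensors break device moves."""
    def __init__(self, dim):
        super().__init__()
        # Registered buffer: tracked in state_dict, moved by .to(device),
        # and copied to every replica by nn.DataParallel.
        self.register_buffer("scale", torch.ones(dim))
        # Plain attribute: stays on whatever device it was created on,
        # a common source of cross-device errors like the one above.
        self.bad_scale = torch.ones(dim)

    def forward(self, x):
        return x * self.scale

m = Scaled(4)
print("scale" in m.state_dict())      # True: buffers are tracked
print("bad_scale" in m.state_dict())  # False: plain tensors are not
```

Whether this is the actual cause here depends on how the VMamba/LayerNorm weights are created, but it is the usual failure mode when a model that works on one GPU breaks only under DataParallel.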

Anonymous1563 (Owner) commented:
Thank you for your interest in our work.

Currently, our code does not support inference on multiple GPUs. The argument --parallel_num is not intended to specify the number of GPUs. However, you can set --parallel_num to 8 or higher to accelerate inference, provided your GPU has sufficient memory.

If you can tolerate some reduction in image quality, you can set --num_samples to 50 or a lower value, or include --deterministic (to enable the deterministic mode of BEM) in your script.

The implementation for inference acceleration, as mentioned in the paper, will be released soon.
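Putting these suggestions together, a faster single-GPU invocation of the asker's command might look like the following. The flag names come from this thread; the paths and values are the asker's and are illustrative only:

```shell
cd /kaggle/Bayesian-Enhancement-Model
python Enhancement/eval.py \
  --opt /kaggle/input/track1-traffic-vehicle-detection/BEM/CG_UNet_LOLv2Real/CG_UNet_LOLv2Real.yml \
  --weights /kaggle/input/track1-traffic-vehicle-detection/BEM/CG_UNet_LOLv2Real/ckpt.pth \
  --cond_opt /kaggle/input/track1-traffic-vehicle-detection/BEM/IE_UNet_LOLv2Real/IE_UNet_LOLv2Real.yml \
  --cond_weights /kaggle/input/track1-traffic-vehicle-detection/BEM/IE_UNet_LOLv2Real/ckpt.pth \
  --no_ref niqe \
  --parallel_num 8 \
  --num_samples 50 \
  --deterministic \
  --input_dir /kaggle/Bayesian-Enhancement-Model/dataset
```

Note that --parallel_num here batches work on a single GPU rather than spreading it across devices, so increase it only as far as GPU memory allows.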
