CUDA out of memory. #32

Open
Xiaxia1997 opened this issue Feb 15, 2023 · 5 comments
@Xiaxia1997

I am trying to run scannet/scene0059, but got a CUDA out of memory error. Here is the error message:

  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Vox-Fusion/src/tracking.py", line 97, in spin
    self.do_tracking(share_data, current_frame, kf_buffer)
  File "/Vox-Fusion/src/tracking.py", line 128, in do_tracking
    frame_pose, hit_mask = track_frame(
  File "/Vox-Fusion/src/variations/render_helpers.py", line 450, in track_frame
    final_outputs = render_rays(
  File "/Vox-Fusion/src/variations/render_helpers.py", line 223, in render_rays
    samples = ray_sample(intersections, step_size=step_size)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Vox-Fusion/src/variations/voxel_helpers.py", line 575, in ray_sample
    sampled_idx, sampled_depth, sampled_dists = inverse_cdf_sampling(
  File "/Vox-Fusion/src/variations/voxel_helpers.py", line 292, in forward
    noise = min_depth.new_zeros(*min_depth.size()[:-1], max_steps)
RuntimeError: CUDA out of memory. Tried to allocate 745.06 GiB (GPU 0; 23.70 GiB total capacity; 146.05 MiB already allocated; 11.93 GiB free; 176.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
^CTraceback (most recent call last):
  File "demo/run.py", line 23, in <module>
    slam.wait_child_processes()
  File "/Vox-Fusion/src/voxslam.py", line 62, in wait_child_processes
    p.join()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/popen_fork.py", line 47, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
Process Process-2:
Traceback (most recent call last):
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/Vox-Fusion/src/mapping.py", line 89, in spin
    if not kf_buffer.empty():
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/queues.py", line 123, in empty
    return not self._poll()
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 257, in poll
    return self._poll(timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
    r = wait([self], timeout)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/connection.py", line 925, in wait
    selector.register(obj, selectors.EVENT_READ)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/selectors.py", line 352, in register
    key = super().register(fileobj, events, data)
  File "/home/slam/.conda/envs/ngp_pl/lib/python3.8/selectors.py", line 235, in register
    if (not events) or (events & ~(EVENT_READ | EVENT_WRITE)):
KeyboardInterrupt
/home/slam/.conda/envs/ngp_pl/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
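
As a side note on the numbers above: a rough back-of-envelope check (float32 assumed; this is only arithmetic, not project code) shows why a 745.06 GiB request can come out of that last traceback frame, which allocates a buffer of shape `(*min_depth.size()[:-1], max_steps)`:

```python
# Rough arithmetic only (float32 assumed); nothing here is Vox-Fusion code.
bytes_requested = 745.06 * 1024**3   # ~8.0e11 bytes reported by the allocator
elements = bytes_requested / 4       # ~2.0e11 float32 values
# The failing line builds a tensor of shape (*min_depth.size()[:-1], max_steps),
# so num_rays * max_steps must be on the order of 2e11. That points to a
# runaway max_steps, not fragmentation that max_split_size_mb could fix.
print(f"requested elements: {elements:.2e}")
```
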
@JunyuanDeng

I got the same problem! Hoping for a reply.

@xingruiyang
Collaborator

This problem might need more intermediate results to diagnose. How do the predicted color and depth maps look? (They can be generated with the render_freq option.)

@JunyuanDeng

I don't have the predicted color and depth maps at the moment; I hope @Xiaxia1997 can provide more information.

What I can share is my own finding: I printed `*min_depth.size()[:-1], max_steps` and saw that `max_steps` is huge, around 8e8. Checking the source code, the problem might come from `max_distance` and `min_distance` here.
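
For anyone who wants to reproduce that observation, here is a minimal sketch of a guard that could be called in `voxel_helpers.py` right before the failing allocation; the function name and the cap value are made up for illustration and are not part of Vox-Fusion:

```python
import torch

def check_noise_alloc(min_depth: torch.Tensor, max_steps: int, cap: int = 4096) -> None:
    """Report the size of the pending `noise` buffer and fail fast if max_steps exploded.

    Intended to be called right before
    `noise = min_depth.new_zeros(*min_depth.size()[:-1], max_steps)`.
    The cap is an arbitrary illustrative ceiling, not a Vox-Fusion setting.
    """
    num_rays = min_depth.size()[:-1].numel()
    est_gib = num_rays * max_steps * min_depth.element_size() / 1024**3
    print(f"[inverse_cdf_sampling] rays={num_rays} max_steps={max_steps} "
          f"noise buffer ~{est_gib:.1f} GiB")
    assert max_steps < cap, "max_steps exploded; check the min/max intersection distances feeding it"
```
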

@JunyuanDeng

JunyuanDeng commented Feb 20, 2023

I encounter this error each time there is a loop during tracking. It seems the rays intersect very far voxels, which makes the max distance very large.
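
One possible mitigation for that symptom, sketched under the assumption that the per-voxel entry/exit depths are available as tensors before sampling (the helper and its `far` default are hypothetical, not Vox-Fusion API):

```python
import torch

def clamp_intersection_depths(min_depth: torch.Tensor,
                              max_depth: torch.Tensor,
                              near: float = 0.0,
                              far: float = 10.0):
    """Clamp per-voxel entry/exit depths to [near, far] so that distant voxels
    pulled in by a loop cannot inflate the sampled ray length; `far` would be
    chosen to match the sensor's usable range."""
    min_depth = min_depth.clamp(min=near, max=far)
    max_depth = max_depth.clamp(min=near, max=far)
    return min_depth, max_depth
```
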

@jarvishou829

I wonder how this problem can be solved. I see the max_depth value in the config; maybe voxels that exceed max_depth should be ignored?
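
A sketch of that suggestion, assuming an NSVF-style sampler where an index of -1 marks a missed intersection; the helper name and signature are illustrative only, not Vox-Fusion API:

```python
import torch

def drop_far_voxels(sampled_idx: torch.Tensor,
                    min_depth: torch.Tensor,
                    max_depth: torch.Tensor,
                    depth_limit: float):
    """Invalidate voxel hits whose entry depth already exceeds the configured
    max_depth so they contribute no samples (and hence no steps)."""
    too_far = min_depth > depth_limit
    sampled_idx = sampled_idx.masked_fill(too_far, -1)       # treat as "no hit"
    max_depth = torch.where(too_far, min_depth, max_depth)   # zero-length interval
    return sampled_idx, min_depth, max_depth
```
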
