The training process was killed when loading data batch #45

Open
RyanPham19092002 opened this issue Aug 5, 2024 · 5 comments

@RyanPham19092002 commented Aug 5, 2024

Hi! Thanks for your great work!
When I tried to train on PandaSet following your guide, the model couldn't load the full data batch and the process was killed, as shown in the screenshot below, even though I reduced the number of scenes in PandaSet (I used only one scene) and I am training on 4x V100 GPUs. (Also, can your model be trained on multiple GPUs?)
[screenshot: training process killed while loading data]
How can I fix this?
Thanks.
P.S.: I also ran the NeuRAD tiny version, but it was still killed, as shown below.
[screenshot: NeuRAD tiny run also killed]
I don't know why. Could the reason be that my GPU is a V100 with 32 GB, so the model cannot train?

@TheScientist1900

I have the same issue. Can anyone help me?

@georghess (Owner)

Hi, it's a bit hard to figure out exactly what is causing the crash. My best guess is that your system runs out of RAM. We cache the data in memory, but it shouldn't really be that heavy to run.

Are you running this in a container, or in a conda environment? Could you maybe track the memory consumption while training?
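If it helps, here is a minimal sketch for logging memory use while training. It is not part of neurad-studio; it assumes `psutil` is installed (`pip install psutil`), and the helper name is mine. Pass it the PID of the training process (e.g. from `ps`), or simply run `watch -n 5 free -h` in another terminal instead.

```python
# Minimal sketch: watch RAM usage of the training process and its dataloader workers.
# Assumes psutil is installed; not part of neurad-studio.
import sys
import time

import psutil


def log_memory(pid: int, interval_s: float = 5.0) -> None:
    """Print resident memory of `pid` (plus child worker processes) and system totals."""
    proc = psutil.Process(pid)
    while True:
        rss = proc.memory_info().rss
        for child in proc.children(recursive=True):
            try:
                rss += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # a worker exited between listing and reading
        vm = psutil.virtual_memory()
        print(
            f"train process (+workers): {rss / 1024**3:.1f} GiB | "
            f"system: {vm.used / 1024**3:.1f}/{vm.total / 1024**3:.1f} GiB"
        )
        time.sleep(interval_s)


if __name__ == "__main__":
    log_memory(int(sys.argv[1]))
```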

@bob020416

Hi, how did you solve this issue? I have a 4060 Ti with 16 GB and I cannot load the data... it is killed every time. Is there a recommended amount of RAM?

@georghess (Owner)

Have you tried reducing the number of workers and/or queue length? https://github.com/georghess/neurad-studio/blob/main/nerfstudio/data/datamanagers/image_lidar_datamanager.py#L83-L85
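For reference, lowering those settings would look roughly like the sketch below. The class and field names are assumptions based on the usual nerfstudio parallel data manager config; the linked lines show the exact names in the current code.

```python
# Sketch: reduce dataloading parallelism to lower peak RAM usage.
# Field/class names are assumptions; verify against the linked lines in your checkout.
from nerfstudio.data.datamanagers.image_lidar_datamanager import ImageLidarDataManagerConfig

datamanager_config = ImageLidarDataManagerConfig(
    num_processes=1,  # fewer worker processes -> fewer in-memory copies of cached data
    queue_size=1,     # fewer pre-fetched batches waiting in the inter-process queue
)
```

With the nerfstudio-style CLI the same fields can typically also be overridden without editing code, e.g. via `--pipeline.datamanager.num-processes` / `--pipeline.datamanager.queue-size`, assuming the data manager sits at the usual `pipeline.datamanager` path; check `ns-train neurad --help` for the exact flag names.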

@bob020416

> Have you tried reducing the number of workers and/or queue length? https://github.com/georghess/neurad-studio/blob/main/nerfstudio/data/datamanagers/image_lidar_datamanager.py#L83-L85

Hi, I found that data loading takes approximately 50 GB of RAM. I tried lowering the number of workers, but it still didn't work, so I used another computer to solve this. Thank you.
