CUDA: Out of Memory #47
Does it work for you if the resolution of the output video is 960x960, 1280x704, 704x1280, 960x704, or 704x960? Officially, only these five resolutions are supported. Not sure what would happen for 640x352. |
I initially tried the original resolution (1280x704), and this error also shows up. |
Is your input image the same resolution as your output video? |
Even after uninstalling and reinstalling everything, I am unable to run 14B diffusion inference on 48 GB, though I can run 13B autoregressive model inference. May I ask whether this setup can run 14B diffusion inference? Thank you |
I was using the default example image from the assets. I also tried the text2video model, but it has the same issue. Aside from that, I also tried printing the qkv shape in the attention module, which shows [56320, 1, 32, 128]. Is that the correct size? An attention map across 56320 items seems too large. |
How about disabling the prompt upsampler and running Pixtral 12B externally to generate the upsampled prompt? This can save some VRAM. I needed to do that to run the 7B v2w diffusion model. |
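Concretely, that workflow might look like the sketch below. Flag names such as `--disable_prompt_upsampler` and `--prompt` are assumptions based on the repo's README examples; verify against your script's `--help`:

```bash
# Sketch: skip the built-in Pixtral upsampler and feed a prompt that was
# already upsampled externally. Flag names are assumed, not verified.
PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --disable_prompt_upsampler \
    --prompt "<your externally upsampled prompt>" \
    --video_save_name output \
    --offload_guardrail_models
```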
Sure, I can try that as well. In the meantime, the error log suggests the attention operation requires 180+ GB of GPU memory, which seems too much to fix with VRAM optimization alone. Do you have any idea about that? |
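A quick back-of-the-envelope check (a sketch, assuming the qkv shape [56320, 1, 32, 128] printed above and 2-byte bf16 activations) shows where a number that size comes from when the full attention matrix is materialized:

```python
# Unfused attention materializes the complete [heads, seq, seq] score matrix.
seq_len = 56320     # tokens per video, from the printed qkv shape
num_heads = 32
bytes_per_elem = 2  # bf16

score_matrix_bytes = seq_len * seq_len * num_heads * bytes_per_elem
print(f"{score_matrix_bytes / 2**30:.2f} GiB")  # -> 189.06 GiB
```

This matches the 189.06 GiB allocation reported later in this thread, which points at an unfused attention fallback rather than something fixable by offloading.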
I also hit the same problem running the command below. What are your system specifications? Can someone recommend what to upgrade?

```
root@e6ac6c334b7e:/workspace# CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) python cosmos1/models/autoregressive/inference/base.py --input_type=video --input_image_or_video_path=cosmos1/models/autoregressive/assets/v1p0/input.mp4 --video_save_name=Cosmos-1.0-Autoregressive-4B --ar_model_dir=Cosmos-1.0-Autoregressive-4B --top_p=0.8 --temperature=1.0 --offload_guardrail_models --offload_diffusion_decoder --offload_ar_model --offload_tokenizer
[01-23 23:32:27|INFO|cosmos1/models/autoregressive/inference/base.py:91:main] Run with image or video path: input.mp4
[01-23 23:32:27|INFO|cosmos1/models/autoregressive/inference/world_generation_pipeline.py:414:generate] Run generation
[01-23 23:32:27|INFO|cosmos1/models/autoregressive/inference/world_generation_pipeline.py:313:_run_model_with_offload] Using input size of 9 frames
Traceback (most recent call last):
File "/workspace/cosmos1/models/autoregressive/inference/base.py", line 116, in <module>
main(args)
File "/workspace/cosmos1/models/autoregressive/inference/base.py", line 92, in main
out_vid = pipeline.generate(
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 415, in generate
out_videos_cur_batch, indices_tensor_cur_batch = self._run_model_with_offload(
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 321, in _run_model_with_offload
out_videos_cur_batch, indices_tensor_cur_batch = self.generate_partial_tokens_from_data_batch(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 534, in generate_partial_tokens_from_data_batch
video_decoded, indices_tensor = self.generate_video_from_tokens(
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 612, in generate_video_from_tokens
self._load_network()
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 257, in _load_network
self.model.load_ar_model(tokenizer_config=self.inference_config.tokenizer_config)
File "/workspace/cosmos1/models/autoregressive/model.py", line 150, in load_ar_model
self.model = model.to(precision).to("cuda")
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1340, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 927, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1326, in convert
return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 7.76 GiB of which 28.31 MiB is free. Process 2914489 has 7.64 GiB memory in use. Of the allocated memory 7.45 GiB is allocated by PyTorch, and 81.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
|
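As the message itself suggests, expandable segments can help when the failure is caused by allocator fragmentation rather than raw capacity; it is worth trying, though it cannot recover memory an 8 GiB card simply does not have. A sketch using the same flags as the command above:

```bash
# Mitigates allocator fragmentation only; does not shrink the model's footprint.
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) \
python cosmos1/models/autoregressive/inference/base.py \
    --input_type=video \
    --input_image_or_video_path=cosmos1/models/autoregressive/assets/v1p0/input.mp4 \
    --ar_model_dir=Cosmos-1.0-Autoregressive-4B \
    --offload_guardrail_models --offload_diffusion_decoder \
    --offload_ar_model --offload_tokenizer
```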
I also hit the same error.
Does anyone have an idea how to fix it? |
Are you serious? Your GPU has 48 GB of memory. Can you execute `nvidia-smi`? |
Unfortunately yes, I've checked. |
Do you have 8 GPUs installed? Mine shows only 1 GPU at 27% utilization. I am running the command on a bare-metal server, not in a container. What are your motherboard and CPU specs? How did you install 8 GPUs?

```
nvidia-smi
Mon Jan 27 23:08:43 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 Off | 00000000:01:00.0 Off | N/A |
| 51% 44C P2 62W / 130W | 7358MiB / 8192MiB | 27% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 501676 G ...seed-version=20250123-050148.481000 17MiB |
| 0 N/A N/A 2703529 C+G /home/hil/workspace/tf/bin/python3 899MiB |
| 0 N/A N/A 2703781 C ...rs/cuda_v12_avx/ollama_llama_server 5244MiB |
| 0 N/A N/A 2922330 C+G /home/hil/workspace/tf/bin/python3 893MiB |
| 0 N/A N/A 3153630 G /usr/lib/xorg/Xorg 191MiB |
| 0 N/A N/A 3154000 G /usr/bin/gnome-shell 13MiB |
| 0 N/A N/A 3154863 G /usr/bin/baobab 15MiB |
| 0 N/A N/A 3155021 G /usr/bin/gnome-system-monitor 12MiB |
| 0 N/A N/A 3157694 G /usr/libexec/xdg-desktop-portal-gnome 8MiB |
| 0 N/A N/A 3158260 G /usr/bin/nautilus 34MiB |
+-----------------------------------------------------------------------------------------+
```
|
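Note also that the listing above shows the 8 GiB card already mostly occupied by unrelated processes (the ollama server alone holds 5244 MiB, plus two Python processes and the desktop session); stopping those frees several GiB before any Cosmos tuning. A minimal sketch to confirm what PyTorch actually sees as free:

```python
import torch

# Returns (free, total) device memory in bytes, as reported by the CUDA driver.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
```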
Simply put, why did the program try to allocate 189.06 GiB? I've tried the model-offloading example on low-memory GPUs, but the result was the same. |
Hi guys! I have the same issue as yours:

```
[01-30 17:19:21|INFO|cosmos1/utils/misc.py:106:set_random_seed] Using random seed 1.
/opt/conda/envs/cosmos/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/serialization.py:1006: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:314:generate] Run with prompt: A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect.
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:315:generate] Run with negative prompt: The video captures a series of frames showing ugly scenes, static with no motion, motion blur, over-saturation, shaky footage, low resolution, grainy texture, pixelated images, poorly lit areas, underexposed and overexposed scenes, poor color balance, washed out colors, choppy sequences, jerky movements, low frame rate, artifacting, color banding, unnatural transitions, outdated special effects, fake elements, unconvincing visuals, poorly edited content, jump cuts, visual noise, and flickering. Overall, the video is of poor quality.
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:316:generate] Run with prompt upsampler: True
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:318:generate] Run guardrail on prompt
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:25<00:00, 28.45s/it]
[01-30 17:27:14|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:323:generate] Pass guardrail on prompt
[01-30 17:27:14|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:329:generate] Run prompt upsampler on prompt
/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
warnings.warn(
[01-30 17:31:58|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:172:_run_prompt_upsampler_on_prompt] Upsampled prompt: In a sprawling, meticulously organized warehouse, a sleek humanoid robot stands sentinel amidst towering shelves brimming with neatly stacked cardboard boxes. The robot's metallic body, adorned with intricate joints and a glowing blue chest light, radiates an aura of advanced technology, its design a harmonious blend of functionality and futuristic elegance. The camera captures the robot in a static, frontal shot, emphasizing its poised stance and the subtle interplay of light that dances across its surface, highlighting the precision of its construction. Behind, the shelves stretch into the distance, their organized rows creating a striking backdrop that underscores the industrial setting. The floor, lined with wooden pallets, adds a rustic touch to the scene, while the shallow depth of field artfully blurs the background, drawing the viewer's gaze to the robot's commanding presence. This cinematic tableau invites contemplation of the intersection between human ingenuity and robotic innovation, evoking a sense of awe and wonder.
[01-30 17:31:58|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:331:generate] Run guardrail on upsampled prompt
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:31<00:00, 30.45s/it]
[01-30 17:33:52|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:336:generate] Pass guardrail on upsampled prompt
[01-30 17:33:52|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:342:generate] Run text embedding on prompt
[01-30 17:33:53|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:350:generate] Finish text embedding on prompt
[01-30 17:33:53|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:353:generate] Run generation
[WARNING | DotProductAttention]: flash-attn may provide important feature support or performance improvement. Please install flash-attn >= 2.1.1, <= 2.6.3.
Traceback (most recent call last):
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/text2world.py", line 160, in <module>
demo(args)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/text2world.py", line 127, in demo
generated_output = pipeline.generate(current_prompt, cfg.negative_prompt, cfg.word_limit_to_skip_upsampler)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/world_generation_pipeline.py", line 354, in generate
video = self._run_model_with_offload(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/world_generation_pipeline.py", line 274, in _run_model_with_offload
sample = self._run_model(prompt_embedding, negative_prompt_embedding)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/world_generation_pipeline.py", line 241, in _run_model
sample = generate_world_from_text(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/inference_utils.py", line 433, in generate_world_from_text
sample = model.generate_samples_from_batch(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/model/model_t2w.py", line 277, in generate_samples_from_batch
samples = self.sampler(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 150, in forward
return self._forward_impl(float64_x0_fn, x_sigma_max, sampler_cfg).to(in_dtype)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 180, in _forward_impl
denoised_output = differential_equation_solver(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 280, in sample_fn
x_at_eps, _ = fori_loop(0, num_step, step_fn, [input_xT_B_StateShape, None])
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 207, in fori_loop
val = body_fun(i, val)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 265, in step_fn
x0_pred_B_StateShape = x0_fn(input_x_B_StateShape, sigma_cur_0 * ones_B)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 132, in float64_x0_fn
return x0_fn(x_B_StateShape.to(in_dtype), t_B.to(in_dtype)).to(torch.float64)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/model/model_t2w.py", line 180, in x0_fn
cond_x0 = self.denoise(noise_x, sigma, condition).x0
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/model/model_t2w.py", line 214, in denoise
net_output = self.net(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/networks/general_dit.py", line 499, in forward
x = block(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/blocks.py", line 537, in forward
x = block(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/blocks.py", line 450, in forward
x = x + gate_1_1_1_B_D * self.block(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/blocks.py", line 322, in forward
x_THW_B_D = self.attn(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/attention.py", line 305, in forward
return self.cal_attn(q, k, v, mask)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/attention.py", line 283, in cal_attn
out = self.attn_op(q, k, v, core_attention_bias_type="no_bias", core_attention_bias=None) # [B, Mq, H, V]
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 8306, in forward
return self.unfused_attention(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 4841, in forward
matmul_result = torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 189.06 GiB. GPU
```

Does anyone have an idea what's going wrong? |
Hi guys! I am coming back with my solution! :) Just install the flash-attn package:

```
pip install flash-attn==2.6.3
```

It works for me. I hope it can solve your problems as well. |
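This lines up with the warning logged earlier ("Please install flash-attn >= 2.1.1, <= 2.6.3") and with the traceback, which shows transformer_engine falling back to its unfused attention path: that path allocates the full 189.06 GiB score matrix, while the flash-attn kernel computes attention in tiles and never materializes it. A quick post-install sanity check (the import name is flash_attn):

```bash
python -c "import flash_attn; print(flash_attn.__version__)"  # expect 2.6.3
```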
Hi team,
Fantastic work! I'm trying the video2world example on a Quadro RTX 8000 GPU, and the following error log shows up:
I'm running the official script from the docker environment:
```
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=9 PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/video2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Video2World \
    --input_image_or_video_path cosmos1/models/diffusion/assets/v1p0/video2world_input0.jpg \
    --num_input_frames 1 \
    --video_save_name Cosmos-1.0-Diffusion-7B-Video2World_memory_efficient \
    --height 352 \
    --width 640 \
    --offload_tokenizer \
    --offload_diffusion_transformer \
    --offload_text_encoder_model \
    --offload_prompt_upsampler \
    --offload_guardrail_models
```
Do you have any idea how to fix it? Thanks in advance!