CUDA: Out of Memory #47
Does it work for you if the resolution of the output video is 960x960, 1280x704, 704x1280, 960x704, or 704x960? Officially, only these five resolutions are supported. Not sure what would happen for 640x352. |
I initially tried the original resolution (1280x704), and this error also shows up. |
Is your input image the same resolution as your output video? |
Even after uninstalling and reinstalling everything, I am unable to run 14B diffusion inference on 48 GB, though I can run 13B autoregressive model inference. May I ask whether this setup can run 14B diffusion inference? Thank you |
I was using the default example image from the assets. I also tried the text2video model, but it has the same issue. Aside from that, I also tried printing the qkv shape in the attention module, which shows [56320, 1, 32, 128]. Is that the correct size? An attention map across 56320 items seems too large. |
How about disabling the prompt upsampler and running Pixtral 12B externally to generate the upsampled prompt? This can save some VRAM. I needed to do that to run the 7B v2w diffusion model. |
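Concretely, that workflow might look like the sketch below. Flag names such as `--disable_prompt_upsampler` and `--prompt` are assumptions based on the repo's README examples; verify against your script's `--help`:

```bash
# Sketch: skip the built-in Pixtral upsampler and feed a prompt that was
# already upsampled externally. Flag names are assumed, not verified.
PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --disable_prompt_upsampler \
    --prompt "<your externally upsampled prompt>" \
    --video_save_name output \
    --offload_guardrail_models
```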
Sure, I can try that as well. In the meantime, the error log suggests the attention operation requires 180+ GB of GPU memory, which seems too much to fix with VRAM optimization alone. Do you have any idea about that? |
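A quick back-of-the-envelope check (a sketch, assuming the qkv shape [56320, 1, 32, 128] printed above and 2-byte bf16 activations) shows where a number that size comes from when the full attention matrix is materialized:

```python
# Unfused attention materializes the complete [heads, seq, seq] score matrix.
seq_len = 56320     # tokens per video, from the printed qkv shape
num_heads = 32
bytes_per_elem = 2  # bf16

score_matrix_bytes = seq_len * seq_len * num_heads * bytes_per_elem
print(f"{score_matrix_bytes / 2**30:.2f} GiB")  # -> 189.06 GiB
```

This matches the 189.06 GiB allocation reported later in this thread, which points at an unfused attention fallback rather than something fixable by offloading.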
I also hit the same problem running the command below. What are your system specifications? Can someone recommend what to upgrade?

```
root@e6ac6c334b7e:/workspace# CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) python cosmos1/models/autoregressive/inference/base.py --input_type=video --input_image_or_video_path=cosmos1/models/autoregressive/assets/v1p0/input.mp4 --video_save_name=Cosmos-1.0-Autoregressive-4B --ar_model_dir=Cosmos-1.0-Autoregressive-4B --top_p=0.8 --temperature=1.0 --offload_guardrail_models --offload_diffusion_decoder --offload_ar_model --offload_tokenizer
[01-23 23:32:27|INFO|cosmos1/models/autoregressive/inference/base.py:91:main] Run with image or video path: input.mp4
[01-23 23:32:27|INFO|cosmos1/models/autoregressive/inference/world_generation_pipeline.py:414:generate] Run generation
[01-23 23:32:27|INFO|cosmos1/models/autoregressive/inference/world_generation_pipeline.py:313:_run_model_with_offload] Using input size of 9 frames
Traceback (most recent call last):
File "/workspace/cosmos1/models/autoregressive/inference/base.py", line 116, in <module>
main(args)
File "/workspace/cosmos1/models/autoregressive/inference/base.py", line 92, in main
out_vid = pipeline.generate(
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 415, in generate
out_videos_cur_batch, indices_tensor_cur_batch = self._run_model_with_offload(
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 321, in _run_model_with_offload
out_videos_cur_batch, indices_tensor_cur_batch = self.generate_partial_tokens_from_data_batch(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 534, in generate_partial_tokens_from_data_batch
video_decoded, indices_tensor = self.generate_video_from_tokens(
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 612, in generate_video_from_tokens
self._load_network()
File "/workspace/cosmos1/models/autoregressive/inference/world_generation_pipeline.py", line 257, in _load_network
self.model.load_ar_model(tokenizer_config=self.inference_config.tokenizer_config)
File "/workspace/cosmos1/models/autoregressive/model.py", line 150, in load_ar_model
self.model = model.to(precision).to("cuda")
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1340, in to
return self._apply(convert)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 900, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 927, in _apply
param_applied = fn(param)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1326, in convert
return t.to(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 7.76 GiB of which 28.31 MiB is free. Process 2914489 has 7.64 GiB memory in use. Of the allocated memory 7.45 GiB is allocated by PyTorch, and 81.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
|
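As the message itself suggests, expandable segments can help when the failure is caused by allocator fragmentation rather than raw capacity; it is worth trying, though it cannot recover memory an 8 GiB card simply does not have. A sketch using the same flags as the command above:

```bash
# Mitigates allocator fragmentation only; does not shrink the model's footprint.
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
CUDA_VISIBLE_DEVICES=0 PYTHONPATH=$(pwd) \
python cosmos1/models/autoregressive/inference/base.py \
    --input_type=video \
    --input_image_or_video_path=cosmos1/models/autoregressive/assets/v1p0/input.mp4 \
    --ar_model_dir=Cosmos-1.0-Autoregressive-4B \
    --offload_guardrail_models --offload_diffusion_decoder \
    --offload_ar_model --offload_tokenizer
```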
I also hit the same error.
Does anyone have an idea how to fix it? |
Are you serious? Your GPU has 48 GB of memory. Can you execute `nvidia-smi`? |
Unfortunately yes, I've checked. |
Do you have 8 GPUs installed? Mine shows only 1 GPU at 27% utilization. I am running the command on a bare-metal server, not in a container. What are your motherboard and CPU specs? How did you install 8 GPUs?

```
nvidia-smi
Mon Jan 27 23:08:43 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3050 Off | 00000000:01:00.0 Off | N/A |
| 51% 44C P2 62W / 130W | 7358MiB / 8192MiB | 27% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 501676 G ...seed-version=20250123-050148.481000 17MiB |
| 0 N/A N/A 2703529 C+G /home/hil/workspace/tf/bin/python3 899MiB |
| 0 N/A N/A 2703781 C ...rs/cuda_v12_avx/ollama_llama_server 5244MiB |
| 0 N/A N/A 2922330 C+G /home/hil/workspace/tf/bin/python3 893MiB |
| 0 N/A N/A 3153630 G /usr/lib/xorg/Xorg 191MiB |
| 0 N/A N/A 3154000 G /usr/bin/gnome-shell 13MiB |
| 0 N/A N/A 3154863 G /usr/bin/baobab 15MiB |
| 0 N/A N/A 3155021 G /usr/bin/gnome-system-monitor 12MiB |
| 0 N/A N/A 3157694 G /usr/libexec/xdg-desktop-portal-gnome 8MiB |
| 0 N/A N/A 3158260 G /usr/bin/nautilus 34MiB |
+-----------------------------------------------------------------------------------------+
```
|
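Note also that the listing above shows the 8 GiB card already mostly occupied by unrelated processes (the ollama server alone holds 5244 MiB, plus two Python processes and the desktop session); stopping those frees several GiB before any Cosmos tuning. A minimal sketch to confirm what PyTorch actually sees as free:

```python
import torch

# Returns (free, total) device memory in bytes, as reported by the CUDA driver.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
```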
Simply put, why did the program try to allocate 189.06 GiB? I've tried the model-offloading example on low-memory GPUs, but the result was the same. |
Hi guys! I have the same issue as yours:

```
[01-30 17:19:21|INFO|cosmos1/utils/misc.py:106:set_random_seed] Using random seed 1.
/opt/conda/envs/cosmos/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/serialization.py:1006: UserWarning: 'torch.load' received a zip file that looks like a TorchScript archive dispatching to 'torch.jit.load' (call 'torch.jit.load' directly to silence this warning)
warnings.warn("'torch.load' received a zip file that looks like a TorchScript archive"
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:314:generate] Run with prompt: A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect.
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:315:generate] Run with negative prompt: The video captures a series of frames showing ugly scenes, static with no motion, motion blur, over-saturation, shaky footage, low resolution, grainy texture, pixelated images, poorly lit areas, underexposed and overexposed scenes, poor color balance, washed out colors, choppy sequences, jerky movements, low frame rate, artifacting, color banding, unnatural transitions, outdated special effects, fake elements, unconvincing visuals, poorly edited content, jump cuts, visual noise, and flickering. Overall, the video is of poor quality.
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:316:generate] Run with prompt upsampler: True
[01-30 17:25:27|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:318:generate] Run guardrail on prompt
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:25<00:00, 28.45s/it]
[01-30 17:27:14|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:323:generate] Pass guardrail on prompt
[01-30 17:27:14|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:329:generate] Run prompt upsampler on prompt
/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
warnings.warn(
[01-30 17:31:58|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:172:_run_prompt_upsampler_on_prompt] Upsampled prompt: In a sprawling, meticulously organized warehouse, a sleek humanoid robot stands sentinel amidst towering shelves brimming with neatly stacked cardboard boxes. The robot's metallic body, adorned with intricate joints and a glowing blue chest light, radiates an aura of advanced technology, its design a harmonious blend of functionality and futuristic elegance. The camera captures the robot in a static, frontal shot, emphasizing its poised stance and the subtle interplay of light that dances across its surface, highlighting the precision of its construction. Behind, the shelves stretch into the distance, their organized rows creating a striking backdrop that underscores the industrial setting. The floor, lined with wooden pallets, adds a rustic touch to the scene, while the shallow depth of field artfully blurs the background, drawing the viewer's gaze to the robot's commanding presence. This cinematic tableau invites contemplation of the intersection between human ingenuity and robotic innovation, evoking a sense of awe and wonder.
[01-30 17:31:58|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:331:generate] Run guardrail on upsampled prompt
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:31<00:00, 30.45s/it]
[01-30 17:33:52|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:336:generate] Pass guardrail on upsampled prompt
[01-30 17:33:52|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:342:generate] Run text embedding on prompt
[01-30 17:33:53|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:350:generate] Finish text embedding on prompt
[01-30 17:33:53|INFO|cosmos1/models/diffusion/inference/world_generation_pipeline.py:353:generate] Run generation
[WARNING | DotProductAttention]: flash-attn may provide important feature support or performance improvement. Please install flash-attn >= 2.1.1, <= 2.6.3.
Traceback (most recent call last):
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/text2world.py", line 160, in <module>
demo(args)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/text2world.py", line 127, in demo
generated_output = pipeline.generate(current_prompt, cfg.negative_prompt, cfg.word_limit_to_skip_upsampler)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/world_generation_pipeline.py", line 354, in generate
video = self._run_model_with_offload(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/world_generation_pipeline.py", line 274, in _run_model_with_offload
sample = self._run_model(prompt_embedding, negative_prompt_embedding)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/world_generation_pipeline.py", line 241, in _run_model
sample = generate_world_from_text(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/inference/inference_utils.py", line 433, in generate_world_from_text
sample = model.generate_samples_from_batch(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/model/model_t2w.py", line 277, in generate_samples_from_batch
samples = self.sampler(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 150, in forward
return self._forward_impl(float64_x0_fn, x_sigma_max, sampler_cfg).to(in_dtype)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 180, in _forward_impl
denoised_output = differential_equation_solver(
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 280, in sample_fn
x_at_eps, _ = fori_loop(0, num_step, step_fn, [input_xT_B_StateShape, None])
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 207, in fori_loop
val = body_fun(i, val)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 265, in step_fn
x0_pred_B_StateShape = x0_fn(input_x_B_StateShape, sigma_cur_0 * ones_B)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/diffusion/modules/res_sampler.py", line 132, in float64_x0_fn
return x0_fn(x_B_StateShape.to(in_dtype), t_B.to(in_dtype)).to(torch.float64)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/model/model_t2w.py", line 180, in x0_fn
cond_x0 = self.denoise(noise_x, sigma, condition).x0
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/model/model_t2w.py", line 214, in denoise
net_output = self.net(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/networks/general_dit.py", line 499, in forward
x = block(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/blocks.py", line 537, in forward
x = block(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/blocks.py", line 450, in forward
x = x + gate_1_1_1_B_D * self.block(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/blocks.py", line 322, in forward
x_THW_B_D = self.attn(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/attention.py", line 305, in forward
return self.cal_attn(q, k, v, mask)
File "/workspace/open_source_prj/Cosmos-main/cosmos1/models/diffusion/module/attention.py", line 283, in cal_attn
out = self.attn_op(q, k, v, core_attention_bias_type="no_bias", core_attention_bias=None) # [B, Mq, H, V]
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 8306, in forward
return self.unfused_attention(
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/envs/cosmos/lib/python3.10/site-packages/transformer_engine/pytorch/attention.py", line 4841, in forward
matmul_result = torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 189.06 GiB. GPU
```

Does anyone have an idea what's going wrong? |
Hi guys! I am coming back with my solution! :) Just install the flash-attn package:

```
pip install flash-attn==2.6.3
```

It works for me. I hope it can solve your problems as well. |
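This lines up with the warning logged earlier ("Please install flash-attn >= 2.1.1, <= 2.6.3") and with the traceback, which shows transformer_engine falling back to its unfused attention path: that path allocates the full 189.06 GiB score matrix, while the flash-attn kernel computes attention in tiles and never materializes it. A quick post-install sanity check (the import name is flash_attn):

```bash
python -c "import flash_attn; print(flash_attn.__version__)"  # expect 2.6.3
```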
Hi team,
Fantastic work! I'm trying the video2world example on a Quadro RTX 8000 GPU, and the following error log shows up:
I'm running the official script from the docker environment:
```
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=9 PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/video2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Video2World \
    --input_image_or_video_path cosmos1/models/diffusion/assets/v1p0/video2world_input0.jpg \
    --num_input_frames 1 \
    --video_save_name Cosmos-1.0-Diffusion-7B-Video2World_memory_efficient \
    --height 352 \
    --width 640 \
    --offload_tokenizer \
    --offload_diffusion_transformer \
    --offload_text_encoder_model \
    --offload_prompt_upsampler \
    --offload_guardrail_models
```
Do you have any idea how to fix it? Thanks in advance!