Colab L4 24GB VRAM - OOM #539

Open
2 tasks
sdes21 opened this issue Nov 22, 2024 · 2 comments

sdes21 commented Nov 22, 2024

System Info

Google Colab, L4 GPU, 24 GB VRAM

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

Updated to use the latest diffusers.

Using float16.

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="transformer", torch_dtype=torch.float16)
text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="text_encoder", torch_dtype=torch.float16)
vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="vae", torch_dtype=torch.float16)

OutOfMemoryError                          Traceback (most recent call last)
in <cell line: 1>()
----> 1 video = pipe(
      2     prompt=prompt,
      3     image=image,
      4     num_videos_per_prompt=1,
      5     num_inference_steps=40,

17 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    718             self.groups,
    719         )
--> 720         return F.conv3d(
    721             input, weight, bias, self.stride, self.padding, self.dilation, self.groups
    722         )

OutOfMemoryError: CUDA out of memory. Tried to allocate 1.38 GiB. GPU 0 has a total capacity of 22.17 GiB of which 842.88 MiB is free. Process 24787 has 21.34 GiB memory in use. Of the allocated memory 21.08 GiB is allocated by PyTorch, and 29.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
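The numbers in this message can be checked with a little arithmetic. The following is a minimal sketch, not part of the original report; the helper names `to_mib` and `summarize_oom` are hypothetical. It converts the quoted sizes to MiB and compares the shortfall with the memory that is reserved but unallocated, which is the only part the `expandable_segments` hint could plausibly recover.

```python
import re

# Numeric part of the error message quoted above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.38 GiB. "
    "GPU 0 has a total capacity of 22.17 GiB of which 842.88 MiB is free. "
    "Of the allocated memory 21.08 GiB is allocated by PyTorch, "
    "and 29.37 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}
SIZE = r"([\d.]+) (MiB|GiB)"


def to_mib(value: str, unit: str) -> float:
    """Convert a size such as ('1.38', 'GiB') to MiB."""
    return float(value) * UNIT_TO_MIB[unit]


def summarize_oom(message: str) -> dict:
    """Extract the requested, free, and reserved-but-unallocated sizes (hypothetical helper)."""
    requested = re.search(r"Tried to allocate " + SIZE, message)
    free = re.search(r"of which " + SIZE + r" is free", message)
    unallocated = re.search(SIZE + r" is reserved by PyTorch but unallocated", message)

    requested_mib = to_mib(*requested.groups())
    free_mib = to_mib(*free.groups())
    unallocated_mib = to_mib(*unallocated.groups())
    shortfall_mib = requested_mib - free_mib

    return {
        "requested_mib": requested_mib,
        "free_mib": free_mib,
        "unallocated_mib": unallocated_mib,
        "shortfall_mib": shortfall_mib,
        # True only if reclaiming fragmented memory alone could satisfy the request.
        "fragmentation_could_cover_shortfall": unallocated_mib >= shortfall_mib,
    }


summary = summarize_oom(OOM_MESSAGE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these numbers the request is about 570 MiB larger than the free memory, while only about 29 MiB is reserved but unallocated, so fragmentation alone does not seem to explain the failure; the peak usage itself probably has to come down.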

Expected behavior

I followed the steps but always run out of memory (OOM).

I had pipe.enable_sequential_cpu_offload() enabled.


sdes21 commented Nov 22, 2024

This is my Colab code:

!pip install git+https://github.com/huggingface/diffusers
!pip install --upgrade transformers hf_transfer accelerate diffusers imageio-ffmpeg
import os
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video, load_image
from transformers import T5EncoderModel

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="transformer", torch_dtype=torch.float16)
text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="text_encoder", torch_dtype=torch.float16)
vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX1.5-5B-I2V", subfolder="vae", torch_dtype=torch.float16)

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V",
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.float16,
)

pipe.enable_model_cpu_offload()

prompt = "A cat sitting on a couch playing guitar. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image("/content/a.png")

video = pipe(
    prompt=prompt,
    image=image,
    num_videos_per_prompt=1,
    num_inference_steps=40,
    num_frames=81,
    guidance_scale=6,
    generator=torch.Generator(device="cuda").manual_seed(42),
).frames[0]

zRzRzRzRzRzRzR (Member) commented

Try the loading method used in our cli_demo; otherwise peak VRAM usage will exceed 24 GB.
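For readers without the repository at hand, here is a minimal sketch of what a lower-memory loading path might look like with diffusers. It is not copied from cli_demo and may differ from it in detail. The function name `load_low_memory_pipeline` is hypothetical, and the switch to bfloat16 is an assumption (the report used float16). The posted Colab script calls `enable_model_cpu_offload()`; this sketch uses `enable_sequential_cpu_offload()` instead and turns on VAE slicing and tiling, since the traceback ends in `F.conv3d`, which suggests the failing allocation came from a 3D convolution such as those in the VAE.

```python
MODEL_ID = "THUDM/CogVideoX1.5-5B-I2V"  # same checkpoint as in the report


def load_low_memory_pipeline(model_id: str = MODEL_ID):
    """Build the image-to-video pipeline with memory-saving options enabled (sketch)."""
    # Imported lazily so that defining this function does not require a GPU runtime.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline

    # Assumption: bfloat16 instead of the float16 used in the report.
    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
    )

    # Move submodules to the GPU only while they run.
    # Do not also call pipe.to("cuda"); offloading manages device placement.
    pipe.enable_sequential_cpu_offload()

    # Process the VAE input in slices and tiles to lower peak memory during encode/decode.
    pipe.vae.enable_slicing()
    pipe.vae.enable_tiling()
    return pipe


# Usage on a GPU runtime, reusing the prompt and image from the report:
#
#   from diffusers.utils import load_image
#   pipe = load_low_memory_pipeline()
#   video = pipe(
#       prompt=prompt,
#       image=load_image("/content/a.png"),
#       num_inference_steps=40,
#       num_frames=81,
#       guidance_scale=6,
#   ).frames[0]
```

Sequential offload trades speed for memory, so generation is likely to be noticeably slower than with `enable_model_cpu_offload()`.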

zRzRzRzRzRzRzR self-assigned this on Nov 22, 2024