Question about model training. #5
Comments
Hi, same question.
Hi, same question here too, and I really appreciate your video!
Hello! Given my current priorities, I don't think I'll be coding the training script anytime soon. Feel free to contribute it yourself... that's the spirit of open source and pull requests.
I actually tried to set up a training script, but I seem to run out of RAM, lol. Do you know what resources I should expect to need for training this? I tried to set up the model similarly to this forward pass:

```python
def forward(self, images, captions, tokenizer, strength=0.8):
    batch_size = len(captions)
    latents_shape = (batch_size, 4, self.LATENTS_HEIGHT, self.LATENTS_WIDTH)
    generator = torch.Generator(device=self.device)

    # Encode the captions with CLIP to get the conditioning context
    tokens = tokenizer(captions, padding="max_length", max_length=77, return_tensors="pt", truncation=True).input_ids.to(self.device)
    # tokens = torch.tensor(tokens, dtype=torch.long, device=self.device)
    context = self.clip(tokens)

    sampler = DDPMSampler(generator)
    sampler.set_inference_timesteps(self.n_inference_steps)

    if images is not None:
        # img2img: encode the images to latents and noise them according to the strength
        encoder_noise = torch.randn(latents_shape, generator=generator, device=self.device)
        latents = self.encoder(images, encoder_noise)
        sampler.set_strength(strength)
        latents = sampler.add_noise(latents, sampler.timesteps[0])
    else:
        # txt2img: start from pure noise
        latents = torch.randn(latents_shape, generator=generator, device=self.device)

    # Full denoising loop, exactly as at inference time
    timesteps = sampler.timesteps
    for timestep in timesteps:
        time_embedding = self.get_time_embedding(timestep).to(self.device)
        model_input = latents
        model_output = self.diffusion(model_input, context, time_embedding)
        latents = sampler.step(timestep, latents, model_output)

    return self.rescale(self.decoder(latents), (-1, 1), (0, 1), clamp=True), context
```

But even feeding just one image per batch, shape [1, 3, 512, 512], this forward runs out of memory.
Hello, did you work on this training script any further? I really need this for my project, and I would really appreciate it if you could provide the training script.
I think you could follow this course by Jeremy Howard. It uses only Python built-in functions to train stable diffusion models. You can skip any section that seems familiar.
Hi, I have the same problem actually. Could you please share your scripts so that we could try to use them for training?
I ended up giving up, since adjusting the model architecture (reducing the number of parameters) means I can't use the pretrained weights. I would recommend looking into LoRA fine-tuning instead, since that reduces the training cost by a lot.
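In case it helps, the core idea of LoRA is small enough to sketch here: keep the pretrained weight frozen and learn only a low-rank update on top of it. This is just an illustration of the concept (not this repo's code, and not the API of any existing LoRA library):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a pretrained nn.Linear and learn only a low-rank update: W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_a.T) @ self.lora_b.T
```

If you wrap only the attention projection layers inside the UNet like this, the trainable parameters shrink to a small fraction of the full model, which is why both the memory use and the training cost drop so much.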
Could you please still share what you did that led to the out-of-memory error? I will try to train from scratch and reduce the parameters myself.
I watched all your videos and followed along; it took about 5 days 😀. It was a lot of fun, and I appreciate you!
Now I wonder how to train this model.
I also watched another video of yours, “How diffusion models work - explanation and code!”.
That is also a very useful and great video, thank you again!!
That video was about how to train the UNet (the diffusion model) for latent denoising.
But we have four major models here:
the VAE encoder, the VAE decoder, the UNet, and CLIP.
If we want to train the UNet (the diffusion model) as in the diffusion-model-training video,
do we freeze the other models and train only the UNet?
However, I don't understand well how the learning objective is actually defined.
For example, if we want to create an image B with a specific style from an image A, like A image -> styled B image,
where should I feed image A (or random noise) as the input and the styled image B as the output, respectively?
Inference will look like this, but I don't know what it should look like in the training phase.
A (or random) -> VAE-encode -> [z, clip-emb, time-emb -> unet -> z] * loop -> VAE-decode -> B
It is also unclear to me whether the CLIP embedding should just be left blank, be random, or be a specific text prompt.
Or should I feed image A into the CLIP embedding?
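My current guess for one training example in this style case looks roughly like the sketch below, but I am not sure it is right. `add_noise_and_return_eps` and `style_prompt_tokens` are just placeholders I made up, not functions from this repo:

```python
import torch
import torch.nn.functional as F

def style_example_loss(image_B, style_prompt_tokens, t,
                       vae_encoder, clip, unet,
                       add_noise_and_return_eps, get_time_embedding):
    # The *styled* image B is what gets encoded; the style itself enters as a text prompt.
    with torch.no_grad():                          # VAE and CLIP frozen
        encoder_noise = torch.randn(1, 4, 64, 64, device=image_B.device)
        z = vae_encoder(image_B, encoder_noise)
        context = clip(style_prompt_tokens)        # e.g. tokens for "a photo in the style of A"
    z_t, eps = add_noise_and_return_eps(z, t)      # forward diffusion on B's latents at timestep t
    pred = unet(z_t, context, get_time_embedding(t).to(image_B.device))
    return F.mse_loss(pred, eps)                   # only the UNet would receive gradients
```

Is that roughly right, or should image A (or an embedding of it) enter the loss somewhere?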
I searched YouTube for how people train stable diffusion models, and most videos use DreamBooth.
That again looks very high-level, like Hugging Face.
I would like to know the exact concept and what happens under the hood.
Thanks to your videos and code I could understand the stable diffusion DDPM model, but now I want to extend that to the training concept.
Thank you for the amazing work!
Happy new year!