
Doesn't converge when I train with my own data #67

Open
zf-666 opened this issue Aug 13, 2024 · 10 comments

Comments

zf-666 commented Aug 13, 2024

The loss keeps fluctuating. I wish I could see a plot of what a correct loss curve should look like.

CharlesGong12 commented Aug 16, 2024

Me too. It confuses me a lot. Have you solved it? @zf-666 Or could the authors help us please? @juxuan27

@SmileTAT

How did you build your dataset?

MrWH123 commented Aug 20, 2024

Hi @juxuan27 @yuanhangio, thanks for such great work!
Could you please share some details on training BrushNet_sdxl, such as how many epochs, how long it takes, and how many GPUs were used?
I used just one zip package from BrushData to understand the training details, but I find the loss still fluctuates even after 11000+ steps (one zip has 10000 images; with batch size 4, that is around 4 epochs).

I guess the fluctuating loss is related to the random timestep sampled during training, but how can I tell when the model has converged if the loss gives so little guidance?

(screenshots of training loss curves)

A similar issue is described in #35, but I didn't find an explanation of the loss there.
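Since the raw loss is noisy precisely because each step samples a random timestep, one common way to judge convergence (a sketch, not the authors' recipe; all names here are illustrative) is to track an exponentially smoothed loss and, separately, to average losses within timestep buckets, so you compare like with like:

```python
from collections import defaultdict

class DiffusionLossTracker:
    """Track diffusion-training loss with an EMA and per-timestep buckets.

    The raw loss fluctuates because each step samples a random timestep t,
    and the MSE target is much harder for some t than others. Watching a
    smoothed average, and comparing losses only within the same t-bucket,
    gives a clearer convergence signal than the raw curve.
    """

    def __init__(self, num_buckets=10, max_timestep=1000, ema_decay=0.99):
        self.num_buckets = num_buckets
        self.max_timestep = max_timestep
        self.ema_decay = ema_decay
        self.ema = None                      # smoothed overall loss
        self.bucket_sums = defaultdict(float)
        self.bucket_counts = defaultdict(int)

    def update(self, timestep, loss):
        # EMA of the overall loss: initialize on first sample.
        if self.ema is None:
            self.ema = loss
        else:
            self.ema = self.ema_decay * self.ema + (1 - self.ema_decay) * loss
        # Assign this step's loss to its timestep bucket.
        b = min(timestep * self.num_buckets // self.max_timestep,
                self.num_buckets - 1)
        self.bucket_sums[b] += loss
        self.bucket_counts[b] += 1

    def bucket_means(self):
        """Mean loss per timestep bucket; these should flatten out as training converges."""
        return {b: self.bucket_sums[b] / self.bucket_counts[b]
                for b in sorted(self.bucket_counts)}
```

If the EMA and the per-bucket means plateau, the model has likely converged even though the raw per-step loss still jumps around.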

zf-666 (author) commented Aug 20, 2024

I fixed the problem by using fp16 and the fp16 VAE, but another problem arose: my dataset is on the dark side, yet the generated results, while fitting the distribution, are always on the light side.
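For reference, the fp16 setup described above might look like the launch command below, assuming train_brushnet.py follows the usual diffusers training-script conventions (`--mixed_precision`, `--pretrained_vae_model_name_or_path`) and using the community fp16-fixed SDXL VAE. The flag names are assumptions, so verify them against `python train_brushnet.py --help`:

```shell
# Hypothetical launch flags -- verify against `python train_brushnet.py --help`.
accelerate launch train_brushnet.py \
  --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
  --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix \
  --mixed_precision fp16 \
  --resolution 1024 \
  --train_batch_size 4
```

The fp16-fixed VAE matters for SDXL because the stock VAE is known to produce NaNs in half precision.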

@xduzhangjiayu

Replying to @MrWH123's question above about BrushNet_sdxl training details and loss fluctuation:

Hi, which resolution of images did you use for training? Only 1024x1024 or random resolutions? Thanks in advance for the reply!

MrWH123 commented Aug 26, 2024

Replying to @xduzhangjiayu's question above ("which resolution of images did you use for training? Only 1024x1024 or random resolutions?"):

1024x1024 for SDXL

MrWH123 commented Aug 26, 2024

Replying to @zf-666's comment above about fixing convergence with fp16 and the fp16 VAE:

Could you share your training hyper-parameters and loss curve?

@shaoyandea

@yuanhangio @juxuan27 There are only about 5-10 images in my own dataset; can BrushNet converge with that? How many images should I prepare at minimum?

@xduzhangjiayu

I use BrushData as my dataset, but some samples in the .tar files are missing the "width" field, so the training process fails. Does anyone know how to modify train_brushnet.py to skip such samples and continue training?
Thanks so much!
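One generic way to handle this (a sketch, not part of the repo; `REQUIRED_KEYS` and the function names are my assumptions, adjust them to whatever fields train_brushnet.py actually reads from each sample) is to filter out incomplete samples before they reach the training loop:

```python
REQUIRED_KEYS = ("width", "height", "image", "caption")  # adjust to the script's keys

def has_required_keys(sample, required=REQUIRED_KEYS):
    """Return True only if every required field is present and non-empty."""
    return all(sample.get(k) not in (None, "") for k in required)

def filter_incomplete(samples, required=REQUIRED_KEYS):
    """Yield only complete samples, logging and skipping broken ones."""
    for sample in samples:
        if has_required_keys(sample, required):
            yield sample
        else:
            missing = [k for k in required if sample.get(k) in (None, "")]
            print(f"Skipping sample {sample.get('__key__', '?')}: missing {missing}")
```

If the script builds its loader with the webdataset library, the predicate can be plugged into the pipeline (e.g. `wds.WebDataset(urls).select(has_required_keys)`); otherwise wrapping the sample iterator with `filter_incomplete` achieves the same effect.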

@xduzhangjiayu

Replying to the exchange above about the fp16 fix and the dark-to-light distribution shift:

Hi, did you solve it? Could you please share some result images? Thanks!
