
Doesn't converge when I train with my own data #67

Open
zf-666 opened this issue Aug 13, 2024 · 10 comments

Comments

zf-666 commented Aug 13, 2024

The loss keeps fluctuating. I wish I could see a plot of what a correct loss curve should look like.

CharlesGong12 commented Aug 16, 2024

Me too. It confuses me a lot. Have you solved it? @zf-666 Or could the authors help us please? @juxuan27

@SmileTAT

How did you build your dataset?

MrWH123 commented Aug 20, 2024

Hi @juxuan27 @yuanhangio, thanks for such great work!
Could you please share some details on training BrushNet_sdxl, such as how many epochs, how long it takes, and how many GPUs were used?
I used just one zip package from BrushData to understand the training details, but I find the loss still fluctuates even after 11000+ steps (one zip has 10000 images; with batch size 4, that is around 4 epochs).

I guess the fluctuating loss is related to the random timestep sampled during training, but how can I tell when the model has converged if the loss gives so little guidance?

(screenshots of training loss curves)

A similar issue is described in #35, but I didn't find an explanation of the loss there.
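Since the raw loss is noisy precisely because each step samples a random timestep, one common way to judge convergence (a sketch, not the authors' recipe; all names here are illustrative) is to track an exponentially smoothed loss and, separately, to average losses within timestep buckets, so you compare like with like:

```python
from collections import defaultdict

class DiffusionLossTracker:
    """Track diffusion-training loss with an EMA and per-timestep buckets.

    The raw loss fluctuates because each step samples a random timestep t,
    and the MSE target is much harder for some t than others. Watching a
    smoothed average, and comparing losses only within the same t-bucket,
    gives a clearer convergence signal than the raw curve.
    """

    def __init__(self, num_buckets=10, max_timestep=1000, ema_decay=0.99):
        self.num_buckets = num_buckets
        self.max_timestep = max_timestep
        self.ema_decay = ema_decay
        self.ema = None                      # smoothed overall loss
        self.bucket_sums = defaultdict(float)
        self.bucket_counts = defaultdict(int)

    def update(self, timestep, loss):
        # EMA of the overall loss: initialize on first sample.
        if self.ema is None:
            self.ema = loss
        else:
            self.ema = self.ema_decay * self.ema + (1 - self.ema_decay) * loss
        # Assign this step's loss to its timestep bucket.
        b = min(timestep * self.num_buckets // self.max_timestep,
                self.num_buckets - 1)
        self.bucket_sums[b] += loss
        self.bucket_counts[b] += 1

    def bucket_means(self):
        """Mean loss per timestep bucket; these should flatten out as training converges."""
        return {b: self.bucket_sums[b] / self.bucket_counts[b]
                for b in sorted(self.bucket_counts)}
```

If the EMA and the per-bucket means plateau, the model has likely converged even though the raw per-step loss still jumps around.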

zf-666 (author) commented Aug 20, 2024

I fixed the problem by using fp16 and the fp16 VAE, but another problem arose: my dataset is on the dark side, yet the generated results, while fitting the distribution, are always on the light side.
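For reference, the fp16 setup described above might look like the launch command below, assuming train_brushnet.py follows the usual diffusers training-script conventions (`--mixed_precision`, `--pretrained_vae_model_name_or_path`) and using the community fp16-fixed SDXL VAE. The flag names are assumptions, so verify them against `python train_brushnet.py --help`:

```shell
# Hypothetical launch flags -- verify against `python train_brushnet.py --help`.
accelerate launch train_brushnet.py \
  --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 \
  --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix \
  --mixed_precision fp16 \
  --resolution 1024 \
  --train_batch_size 4
```

The fp16-fixed VAE matters for SDXL because the stock VAE is known to produce NaNs in half precision.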

@xduzhangjiayu

Replying to @MrWH123's question above about BrushNet_sdxl training details and loss fluctuation:

Hi, which resolution of images did you use for training? Only 1024x1024 or random resolutions? Thanks in advance for the reply!

MrWH123 commented Aug 26, 2024

Replying to @xduzhangjiayu's question above ("which resolution of images did you use for training? Only 1024x1024 or random resolutions?"):

1024x1024 for SDXL

MrWH123 commented Aug 26, 2024

Replying to @zf-666's comment above about fixing convergence with fp16 and the fp16 VAE:

Could you share your training hyper-parameters and loss curve?

@shaoyandea

@yuanhangio @juxuan27 There are only about 5-10 images in my own dataset; can BrushNet converge with that? How many images should I prepare at minimum?

@xduzhangjiayu

I use BrushData as my dataset, but some samples in the .tar files are missing the "width" field, so the training process fails. Does anyone know how to modify train_brushnet.py to skip such samples and continue training?
Thanks so much!
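One generic way to handle this (a sketch, not part of the repo; `REQUIRED_KEYS` and the function names are my assumptions, adjust them to whatever fields train_brushnet.py actually reads from each sample) is to filter out incomplete samples before they reach the training loop:

```python
REQUIRED_KEYS = ("width", "height", "image", "caption")  # adjust to the script's keys

def has_required_keys(sample, required=REQUIRED_KEYS):
    """Return True only if every required field is present and non-empty."""
    return all(sample.get(k) not in (None, "") for k in required)

def filter_incomplete(samples, required=REQUIRED_KEYS):
    """Yield only complete samples, logging and skipping broken ones."""
    for sample in samples:
        if has_required_keys(sample, required):
            yield sample
        else:
            missing = [k for k in required if sample.get(k) in (None, "")]
            print(f"Skipping sample {sample.get('__key__', '?')}: missing {missing}")
```

If the script builds its loader with the webdataset library, the predicate can be plugged into the pipeline (e.g. `wds.WebDataset(urls).select(has_required_keys)`); otherwise wrapping the sample iterator with `filter_incomplete` achieves the same effect.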

@xduzhangjiayu

Replying to the exchange above about the fp16 fix and the dark-to-light distribution shift:

Hi, did you solve it? Could you please share some result images? Thanks!
