Fix samples, LoRA training. Add system prompt, use_flash_attn #15
Conversation
Thanks for your contribution. It seems that the training loss part works well.
I will get a shift version up a little later today. I think we can use the same system as Flux, but I was having issues with the reversed timesteps. It took a while to get it even "working", so we can iterate on better approaches.
Here they use "uniform" as the default snr mode, and then apply a shift after that. As mentioned, I will try to get the "flux_shift"-like version working for Lumina 2. AI Toolkit has that distinction as well.
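Roughly what I mean by the flux-style shift, as a sketch only: "uniform" sampling first, then a resolution-dependent shift. The helper names and default values below are illustrative assumptions, not the exact code in this PR or in AI Toolkit.

```python
import math

import torch


def time_shift(mu: float, t: torch.Tensor) -> torch.Tensor:
    # Flux-style shift: remaps uniform t in (0, 1] toward the noisier end.
    return math.exp(mu) / (math.exp(mu) + (1 / t - 1))


def sample_shifted_timesteps(
    batch_size: int,
    image_seq_len: int,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
) -> torch.Tensor:
    # "uniform" sampling first, then a shift whose strength grows with resolution
    # (image_seq_len is the number of latent patches), mirroring the Flux approach.
    t = torch.rand(batch_size).clamp_min(1e-5)
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    mu = image_seq_len * m + (base_shift - m * base_seq_len)
    return time_shift(mu, t)
```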
Running a test now, but we are using the "noise scheduler" (https://github.com/sdbds/sd-scripts/pull/15/files#diff-ca13b8da6dab6243438086787e990ba4a5ed0661b56f9a382dc06c42bc54912eR225-R226), which handles the shift. Will post the updated code after the test is complete.
It looks good. I'll check again whether the sampling part can be optimized.
Thank you, I'll think about what I might ask. I am not as familiar with the timestep scheduling aspect and am mostly inferring it from other codebases. For example, much of it comes from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_lumina2.py#L1357-L1399. It relies on the scheduler to do the timestep shifting, but I'm only assuming it's doing what it should.
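Concretely, my understanding of the scheduler-based shifting in that example is roughly the following minimal sketch; the shift value and latent shape are placeholders, not what this PR configures.

```python
import torch
from diffusers import FlowMatchEulerDiscreteScheduler

# The scheduler bakes the shift into its sigma table at construction time,
# so the training loop only has to index into it.
scheduler = FlowMatchEulerDiscreteScheduler(num_train_timesteps=1000, shift=6.0)

bsz = 4
u = torch.rand(bsz)  # "uniform" timestep sampling
indices = (u * scheduler.config.num_train_timesteps).long()
sigmas = scheduler.sigmas[indices]        # already-shifted sigma per sample
timesteps = scheduler.timesteps[indices]  # discrete timesteps passed to the model

# Flow-matching interpolation between clean latents and noise,
# as in the diffusers DreamBooth example linked above.
latents = torch.randn(bsz, 16, 64, 64)  # placeholder latent shape
noise = torch.randn_like(latents)
s = sigmas.view(-1, 1, 1, 1)
noisy_latents = (1.0 - s) * latents + s * noise
```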
- `FlowMatchEulerDiscreteScheduler` code from upstream.
- `--use_flash_attn` to use flash attention. If it is not installed, an error is shown. Defaults to SDPA attention.
- `--system_prompt`. A system prompt can also be set in `dataset_config` and in dataset subsets. Currently, no system prompt is set by default.
- `--samples_batch_size`. Generates samples in a batch; defaults to `training_batch_size` or 1. Might be a good or bad idea for general use, but it seems practical: sampling was using roughly half the memory of training, so better utilizing the GPU for samples seems like a good idea. Samples are batched based on their common latent properties (seed, width, height, CFG); see the sketch below.

In my opinion these changes make it good to go. Should be all working!
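For the sample batching, the idea is roughly as follows; this is a hypothetical sketch, and the field names and grouping key are illustrative rather than the exact code in this PR.

```python
from collections import defaultdict


def group_sample_prompts(prompts: list[dict], samples_batch_size: int) -> list[list[dict]]:
    # Only prompts that share the latent-relevant settings (seed, width, height, CFG)
    # can be generated together, so group by those first.
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for p in prompts:
        key = (p.get("width", 1024), p.get("height", 1024), p.get("seed"), p.get("scale", 4.0))
        groups[key].append(p)

    # Then split each group into chunks of at most samples_batch_size.
    batches: list[list[dict]] = []
    for group in groups.values():
        for i in range(0, len(group), samples_batch_size):
            batches.append(group[i : i + samples_batch_size])
    return batches
```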
Some example system prompts from Lumina 2
Notes: