Fix samples, LoRA training. Add system prompt, use_flash_attn #15

rockerBOO · 2025-02-23T06:33:03Z

Fixed samples. Samples are now working as expected. Many need more functionality to support "shift" values.
Fixed training code. Lumina uses a reverse timestep (1000 = noise, 0 = clean).
Fixed batch size. Tokens now are padded to max length.
Updated FlowMatchEulerDiscreteScheduler code from upstream.
Add --use_flash_attn to use flash attention. If not installed will show an error. Default to SDPA attention.
Add --system_prompt. System prompt can be used in dataset_config, and dataset subsets. Current does not set a system prompt by default.
Add --samples_batch_size. Do samples in a batch, defaults to training_batch_size or 1. Might be a good or bad idea for general use but seems practical since sampling I was getting 2x less memory usage so better utilizing the GPU for samples seems like a good idea. Samples are batched based on their common latent properties like seed, width, height, CFG.

In my opinion these changes make it good to go. Should be all working!

Some example system prompts from Lumina 2

if args.system_type == "align":
    system_prompt = "You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts. <Prompt Start> "  
elif args.system_type == "base":
    system_prompt = "You are an assistant designed to generate high-quality images based on user prompts. <Prompt Start> " 
elif args.system_type == "aesthetics":
    system_prompt = "You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. <Prompt Start> " 
elif args.system_type == "real":
    system_prompt = "You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. <Prompt Start> "
elif args.system_type == "4grid":
    system_prompt = "You are an assistant designed to generate four high-quality images with highest degree of aesthetics arranged in 2x2 grids based on user prompts. <Prompt Start> "

Notes:

Might want to support a full range of timeshifts. During training they use "uniform" but timeshift might be better.
Image sampling could be better but not sure what is not working correctly.

sdbds · 2025-02-23T17:47:29Z

Thanks for your contribution, It seems that the training loss part works well.
Part about shift value:
I've talked to Lumina's team and they recommend it
Training uses adapter shift(flux_shift) and inference with fixed number that shift value = 6

rockerBOO · 2025-02-23T18:47:22Z

I will get a shift version up a little later today. I think we can use the same system as the flux but I was having issues with the reverse timesteps. Took awhile to get it even "working" so we can iterate on better approaches.

rockerBOO · 2025-02-23T18:50:21Z

Here they use "uniform" as default snr, and then do a shift after that. Like mentioned I will try to get the "flux_shift" like version for lumina 2. AI Toolkit has that distinction as well.

rockerBOO · 2025-02-23T19:09:17Z

Running a test now but we are using the "noise scheduler" https://github.com/sdbds/sd-scripts/pull/15/files#diff-ca13b8da6dab6243438086787e990ba4a5ed0661b56f9a382dc06c42bc54912eR225-R226 which has the discrete_flow_shift argument. This should apply the timestep shift appropriately. I am changing the default to 6.0 there.

Will post up the updated code after the test is complete.

rockerBOO · 2025-02-23T23:02:45Z

Using 6.0 shift:

Shift 6	Shift 3.0	Base Model

There could be more things to be done to make it proper but I think it works as expected with the shift. I do not know enough to be able to figure out the other details.

sdbds · 2025-02-24T03:24:50Z

It looks good, I'll check again if the sampling part can be optimized.
If you have any questions, I can also ask their team.

rockerBOO · 2025-02-24T05:58:19Z

Thank you, I'll think about what I might ask. I am not as familiar with the timestep scheduling aspect and mostly inferring it from other codebases. For example it's a lot from https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora_lumina2.py#L1357-L1399. It relies on the scheduler to do the timestep shifting, but I'm only assuming it'd doing what it should.

rockerBOO added 2 commits February 23, 2025 01:29

Fix samples, LoRA training. Add system prompt, use_flash_attn

025cca6

Remove non-used code

6d7bec8

Fix system prompt in datasets

42a8015

Set default discrete_flow_shift to 6.0. Remove default system prompt.

ba725a8

rockerBOO added 2 commits February 23, 2025 20:19

Add sample batch size for Lumina

48e7da2

Fix typo

2c94d17

sdbds merged commit 653621d into sdbds:lumina Feb 24, 2025
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix samples, LoRA training. Add system prompt, use_flash_attn #15

Fix samples, LoRA training. Add system prompt, use_flash_attn #15

rockerBOO commented Feb 23, 2025 •

edited

Loading

sdbds commented Feb 23, 2025 •

edited

Loading

rockerBOO commented Feb 23, 2025

rockerBOO commented Feb 23, 2025

rockerBOO commented Feb 23, 2025

rockerBOO commented Feb 23, 2025 •

edited

Loading

sdbds commented Feb 24, 2025 •

edited

Loading

rockerBOO commented Feb 24, 2025

Fix samples, LoRA training. Add system prompt, use_flash_attn #15

Fix samples, LoRA training. Add system prompt, use_flash_attn #15

Conversation

rockerBOO commented Feb 23, 2025 • edited Loading

sdbds commented Feb 23, 2025 • edited Loading

rockerBOO commented Feb 23, 2025

rockerBOO commented Feb 23, 2025

rockerBOO commented Feb 23, 2025

rockerBOO commented Feb 23, 2025 • edited Loading

sdbds commented Feb 24, 2025 • edited Loading

rockerBOO commented Feb 24, 2025

rockerBOO commented Feb 23, 2025 •

edited

Loading

sdbds commented Feb 23, 2025 •

edited

Loading

rockerBOO commented Feb 23, 2025 •

edited

Loading

sdbds commented Feb 24, 2025 •

edited

Loading