Support Lumina-image-2.0 #1927
base: sd3
Conversation
I got this set up locally. I know it's not ready for anything, but I want to get it working. Let me know if you want to work together on this. I can help with some of the model loading parts, which is where I got stuck after poking at it. If you have progressed past this, I can help wherever else, or just with testing. Thanks.
Thank you. The framework is basically set up at the moment, but there is still some room for improvement in the caching strategy. I think I will discuss with @kohya-ss whether to continue using the previous method.
Does that mean I can download your fork and test it now?
It's still not quite working, but I'm working through some issues at the moment, mostly with model loading; I'll see what else is needed after that. It is fairly barebones, so I wouldn't expect it to be in a working state just yet.
Lumina 2 and Gemma 2 model loading
# Conflicts:
#	library/lumina_models.py
Lumina cache checkpointing
After multiple updates, the project can now run under limited conditions:
Samples attention
Regarding the strategy, I would like you to proceed as is; I would like to refactor it together with the other architectures later.
The script seems to assume that the model file is .safetensors, but I could only find .pth: https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/tree/main
I would appreciate it if you could tell me where the .safetensors is.
This is the version repackaged by Comfy-Org: https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged
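For anyone who only has the original .pth checkpoint, a minimal conversion sketch along these lines should produce a .safetensors file the script can load. The file names below are placeholders, not the actual names in either repository:

```python
# Minimal sketch, not part of this PR: convert a .pth state dict to .safetensors.
# "lumina_model.pth" / "lumina_model.safetensors" are placeholder names.
import torch
from safetensors.torch import save_file

# Load the .pth state dict on CPU (weights_only avoids unpickling arbitrary objects).
state_dict = torch.load("lumina_model.pth", map_location="cpu", weights_only=True)
# Some checkpoints nest the weights under a wrapper key such as "model".
if "model" in state_dict and isinstance(state_dict["model"], dict):
    state_dict = state_dict["model"]
# safetensors requires contiguous tensors.
save_file({k: v.contiguous() for k, v in state_dict.items()}, "lumina_model.safetensors")
```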
Fix samples, LoRA training. Add system prompt, use_flash_attn
Update:
1. Implement the cfg_trunc calculation directly from timesteps, without needing the number of inference steps.
2. Deprecate and remove the guidance_scale parameter, because it is used in inference, not training.
3. Add inference command-line arguments --ct for cfg_trunc_ratio and --rc for renorm_cfg to control CFG truncation and renormalization during inference.
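For readers unfamiliar with these flags, here is a rough sketch of what cfg_trunc_ratio and renorm_cfg could do at sampling time. This is an assumption about the behavior, not the PR's actual code; the function name, the argument names, and the timestep convention (t in [0, 1] with 1 = pure noise) are made up for illustration:

```python
import torch

def apply_cfg(pred_cond, pred_uncond, t, guidance_scale,
              cfg_trunc_ratio=1.0, renorm_cfg=0.0):
    # CFG truncation: decide from the current timestep t alone (no dependence on
    # the total number of inference steps) whether to skip guidance and return
    # the conditional prediction directly.
    if t < 1.0 - cfg_trunc_ratio:
        return pred_cond

    # Standard classifier-free guidance.
    pred = pred_uncond + guidance_scale * (pred_cond - pred_uncond)

    # CFG renormalization: cap the norm of the guided prediction at renorm_cfg
    # times the norm of the conditional prediction.
    if renorm_cfg > 0.0:
        dims = tuple(range(1, pred.ndim))
        cond_norm = torch.linalg.vector_norm(pred_cond, dim=dims, keepdim=True)
        pred_norm = torch.linalg.vector_norm(pred, dim=dims, keepdim=True)
        pred = pred * torch.clamp(renorm_cfg * cond_norm / pred_norm.clamp_min(1e-12), max=1.0)
    return pred
```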
Trying to test this... Can you post some working arguments? I'm getting an error.

venv/bin/accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/lumina_train_network.py --pretrained_model_name_or_path "/home/envy/models/lumina2/lumina_2_model_bf16.safetensors" --ae "/home/envy/models/lumina2/ae.safetensors" --gemma2 "/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors" --cache_latents_to_disk --save_model_as safetensors --xformers --persistent_data_loader_workers --network_train_unet_only --max_data_loader_n_workers 4 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --resolution 768,768 --network_module lycoris.kohya --network_dim 8 --optimizer_type Adamw8bit --learning_rate 1e-3 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram --sample_sampler "euler" --output_dir /home/envy/models/lora --timestep_sampling sigmoid --model_prediction_type raw --loss_type l2 --train_batch_size 8 --network_alpha 8 --caption_extension txt --network_args "algo=loha" "preset=attn-only" "factor=8" "decompose_both=True" "full_matrix=True" --max_train_epochs 15 --network_train_unet_only --save_every_n_epochs 1 --output_name LuminaAnime1 --train_data_dir /home/envy/training_data/KitsuneAnime1/ --min_snr_gamma 5.0 --optimizer_args "weight_decay=0.0" --use_flash_attn

2025-02-24 17:07:15 INFO highvram is enabled / highvramが有効です train_util.py:4319
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 2282194.82it/s] |
LyCORIS requires additional changes; you can use LoRA first.
Getting the same issue despite paring my arguments down further and switching to networks.lora:

venv/bin/accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/lumina_train_network.py --pretrained_model_name_or_path "/home/envy/models/lumina2/lumina_2_model_bf16.safetensors" --ae "/home/envy/models/lumina2/ae.safetensors" --gemma2 "/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors" --cache_latents_to_disk --save_model_as safetensors --xformers --persistent_data_loader_workers --max_data_loader_n_workers 4 --seed 42 --mixed_precision bf16 --save_precision bf16 --resolution 768,768 --network_module networks.lora --network_dim 8 --optimizer_type Adamw8bit --learning_rate 1e-3 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram --sample_sampler "euler" --output_dir /home/envy/models/lora --train_batch_size 8 --network_alpha 8 --caption_extension txt --max_train_epochs 15 --save_every_n_epochs 1 --output_name LuminaAnime1 --train_data_dir /home/envy/training_data/KitsuneAnime1/ --min_snr_gamma 5.0 --use_flash_attn --system_prompt "You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. "
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 2265536.47it/s] |
Need to use …
Thanks, it's working now!
The flash-attn issue will be resolved soon; I found that the current Windows wheel skips the backward part...
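If anyone wants to check whether their installed flash-attn wheel supports the backward pass (the symptom described above), a small probe like the following should surface it. This is only an assumption about how the problem would manifest; it needs a CUDA device and fp16/bf16 tensors:

```python
# Quick probe: run a tiny forward + backward through flash_attn to see whether
# the installed wheel ships the backward kernels.
import torch
from flash_attn import flash_attn_func

# Shape convention for flash_attn_func: (batch, seqlen, nheads, headdim).
q = torch.randn(1, 16, 4, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=False)
try:
    out.sum().backward()
    print("flash-attn backward OK")
except Exception as e:  # a wheel without backward support is assumed to fail here
    print(f"flash-attn backward unavailable: {e}")
```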
I've run into a bug that happens occasionally while it's caching tokens with batch sizes above 1. It's infrequent, only about 1% of the time.

2025-02-25 07:52:14 INFO caching Text Encoder outputs... train_util.py:1336
The above exception was the direct cause of the following exception:
Traceback (most recent call last):

(For the record, I'm running it with a batch size of 1 until everything is cached, then I'll cancel out and switch back to 16, so theoretically I have a workaround.)
Still in preparation.
After checking: the sampler and VAE follow Flux, and the text encoder part uses Google's Gemma 2.
CC @kohya-ss
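For context on the Gemma 2 text-encoder part, here is a minimal sketch of pulling hidden states from Gemma 2 through Hugging Face transformers. The checkpoint name and the use of the last hidden state are assumptions for illustration; the training script discussed above loads the encoder from a local .safetensors file via --gemma2 instead:

```python
# Minimal sketch, not this repo's loader: Gemma 2 hidden states as text-encoder output.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
text_encoder = AutoModel.from_pretrained("google/gemma-2-2b", torch_dtype=torch.bfloat16)

prompts = ["a kitsune girl standing in a snowy forest"]
tokens = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    # (batch, seq_len, hidden_dim) conditioning for the diffusion transformer.
    hidden = text_encoder(**tokens).last_hidden_state
```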