Support Lumina-image-2.0 #1927
base: sd3
Conversation
I got this set up locally. I know it's not ready for anything, but I want to get it working. Let me know if you want to work together on this. I can help with some of the model loading parts, which is where I got stuck after poking at it. If you have progressed past this, I can help wherever else, or just with testing. Thanks.
Thank you. The framework is basically set up at the moment, but there is still some room for improvement in the caching strategy. I think I will discuss with @kohya-ss whether to continue using the previous method.
Does that mean I can download your fork and test it now?
It's still not quite working, but I'm working through some issues at the moment, mostly with model loading; I'll see what else is needed after that. It is fairly barebones, so I wouldn't expect it to be in a working state just yet.
Lumina 2 and Gemma 2 model loading
# Conflicts:
#	library/lumina_models.py
Lumina cache checkpointing
After multiple updates, the project can now run under limited conditions:
Samples attention
Regarding the strategy, I would like you to proceed as is; I would like to refactor it together with the other architectures later.
The script seems to assume that the model file is .safetensors, but I could only find .pth: https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/tree/main
I would appreciate it if you could tell me where the .safetensors is.
This is the version repackaged by Comfy-Org: https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged
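For anyone who only has the original .pth checkpoint, a minimal conversion sketch along these lines should produce a .safetensors file the script can load. The file names below are placeholders, not the actual names in either repository:

```python
# Minimal sketch, not part of this PR: convert a .pth state dict to .safetensors.
# "lumina_model.pth" / "lumina_model.safetensors" are placeholder names.
import torch
from safetensors.torch import save_file

# Load the .pth state dict on CPU (weights_only avoids unpickling arbitrary objects).
state_dict = torch.load("lumina_model.pth", map_location="cpu", weights_only=True)
# Some checkpoints nest the weights under a wrapper key such as "model".
if "model" in state_dict and isinstance(state_dict["model"], dict):
    state_dict = state_dict["model"]
# safetensors requires contiguous tensors.
save_file({k: v.contiguous() for k, v in state_dict.items()}, "lumina_model.safetensors")
```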
Fix samples, LoRA training. Add system prompt, use_flash_attn
Update:
1. Implement the cfg_trunc calculation directly from timesteps, without needing the number of inference steps.
2. Deprecate and remove the guidance_scale parameter, because it is used in inference, not training.
3. Add inference command-line arguments --ct for cfg_trunc_ratio and --rc for renorm_cfg to control CFG truncation and renormalization during inference.
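For readers unfamiliar with these flags, here is a rough sketch of what cfg_trunc_ratio and renorm_cfg could do at sampling time. This is an assumption about the behavior, not the PR's actual code; the function name, the argument names, and the timestep convention (t in [0, 1] with 1 = pure noise) are made up for illustration:

```python
import torch

def apply_cfg(pred_cond, pred_uncond, t, guidance_scale,
              cfg_trunc_ratio=1.0, renorm_cfg=0.0):
    # CFG truncation: decide from the current timestep t alone (no dependence on
    # the total number of inference steps) whether to skip guidance and return
    # the conditional prediction directly.
    if t < 1.0 - cfg_trunc_ratio:
        return pred_cond

    # Standard classifier-free guidance.
    pred = pred_uncond + guidance_scale * (pred_cond - pred_uncond)

    # CFG renormalization: cap the norm of the guided prediction at renorm_cfg
    # times the norm of the conditional prediction.
    if renorm_cfg > 0.0:
        dims = tuple(range(1, pred.ndim))
        cond_norm = torch.linalg.vector_norm(pred_cond, dim=dims, keepdim=True)
        pred_norm = torch.linalg.vector_norm(pred, dim=dims, keepdim=True)
        pred = pred * torch.clamp(renorm_cfg * cond_norm / pred_norm.clamp_min(1e-12), max=1.0)
    return pred
```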
Trying to test this... Can you post some working arguments? I'm getting an error.

venv/bin/accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/lumina_train_network.py --pretrained_model_name_or_path "/home/envy/models/lumina2/lumina_2_model_bf16.safetensors" --ae "/home/envy/models/lumina2/ae.safetensors" --gemma2 "/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors" --cache_latents_to_disk --save_model_as safetensors --xformers --persistent_data_loader_workers --network_train_unet_only --max_data_loader_n_workers 4 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --resolution 768,768 --network_module lycoris.kohya --network_dim 8 --optimizer_type Adamw8bit --learning_rate 1e-3 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram --sample_sampler "euler" --output_dir /home/envy/models/lora --timestep_sampling sigmoid --model_prediction_type raw --loss_type l2 --train_batch_size 8 --network_alpha 8 --caption_extension txt --network_args "algo=loha" "preset=attn-only" "factor=8" "decompose_both=True" "full_matrix=True" --max_train_epochs 15 --network_train_unet_only --save_every_n_epochs 1 --output_name LuminaAnime1 --train_data_dir /home/envy/training_data/KitsuneAnime1/ --min_snr_gamma 5.0 --optimizer_args "weight_decay=0.0" --use_flash_attn

2025-02-24 17:07:15 INFO highvram is enabled / highvramが有効です train_util.py:4319
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 2282194.82it/s] |
LyCORIS requires additional changes; you can use LoRA first.
Getting the same issue despite paring my arguments down further and switching to networks.lora:

venv/bin/accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/lumina_train_network.py --pretrained_model_name_or_path "/home/envy/models/lumina2/lumina_2_model_bf16.safetensors" --ae "/home/envy/models/lumina2/ae.safetensors" --gemma2 "/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors" --cache_latents_to_disk --save_model_as safetensors --xformers --persistent_data_loader_workers --max_data_loader_n_workers 4 --seed 42 --mixed_precision bf16 --save_precision bf16 --resolution 768,768 --network_module networks.lora --network_dim 8 --optimizer_type Adamw8bit --learning_rate 1e-3 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram --sample_sampler "euler" --output_dir /home/envy/models/lora --train_batch_size 8 --network_alpha 8 --caption_extension txt --max_train_epochs 15 --save_every_n_epochs 1 --output_name LuminaAnime1 --train_data_dir /home/envy/training_data/KitsuneAnime1/ --min_snr_gamma 5.0 --use_flash_attn --system_prompt "You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. "
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 2265536.47it/s] |
Need to use …
Thanks, it's working now!
The flash-attn issue will be resolved soon; I found that the current Windows wheel skips the backward part...
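If anyone wants to check whether their installed flash-attn wheel supports the backward pass (the symptom described above), a small probe like the following should surface it. This is only an assumption about how the problem would manifest; it needs a CUDA device and fp16/bf16 tensors:

```python
# Quick probe: run a tiny forward + backward through flash_attn to see whether
# the installed wheel ships the backward kernels.
import torch
from flash_attn import flash_attn_func

# Shape convention for flash_attn_func: (batch, seqlen, nheads, headdim).
q = torch.randn(1, 16, 4, 64, device="cuda", dtype=torch.bfloat16, requires_grad=True)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=False)
try:
    out.sum().backward()
    print("flash-attn backward OK")
except Exception as e:  # a wheel without backward support is assumed to fail here
    print(f"flash-attn backward unavailable: {e}")
```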
I've run into a bug that happens occasionally while it's caching tokens with batch sizes above 1. It's infrequent, only about 1% of the time.

2025-02-25 07:52:14 INFO caching Text Encoder outputs... train_util.py:1336
The above exception was the direct cause of the following exception:
Traceback (most recent call last):

(For the record, I'm running it with a batch size of 1 until everything is cached, then I'll cancel out and switch back to 16, so theoretically I have a workaround.)
Still in preparation.
After checking: the sampler and VAE follow Flux, and the text encoder part uses Google's Gemma 2.
CC @kohya-ss
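For context on the Gemma 2 text-encoder part, here is a minimal sketch of pulling hidden states from Gemma 2 through Hugging Face transformers. The checkpoint name and the use of the last hidden state are assumptions for illustration; the training script discussed above loads the encoder from a local .safetensors file via --gemma2 instead:

```python
# Minimal sketch, not this repo's loader: Gemma 2 hidden states as text-encoder output.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")
text_encoder = AutoModel.from_pretrained("google/gemma-2-2b", torch_dtype=torch.bfloat16)

prompts = ["a kitsune girl standing in a snowy forest"]
tokens = tokenizer(prompts, padding=True, return_tensors="pt")
with torch.no_grad():
    # (batch, seq_len, hidden_dim) conditioning for the diffusion transformer.
    hidden = text_encoder(**tokens).last_hidden_state
```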