
Support Lumina-image-2.0 #1927

Open

sdbds wants to merge 26 commits into base: sd3

Conversation

sdbds
Contributor

@sdbds sdbds commented Feb 12, 2025

Still in preparation.

After checking: it uses a Flux-style sampler and VAE, and the text encoder part uses Google's Gemma 2.

@kohya-ss CC

@sdbds sdbds marked this pull request as draft February 12, 2025 08:32
@sdbds sdbds mentioned this pull request Feb 12, 2025
@rockerBOO
Contributor

I got this set up locally. I know it's not ready for anything, but I want to get it working. Let me know if you want to work together on this. I can help with some of the model loading parts, which is where I got stuck after poking at it. If you have progressed past this, I can help elsewhere, or just with testing.

Thanks.

@sdbds
Contributor Author

sdbds commented Feb 15, 2025

I got this set up locally. I know it's not ready for anything, but I want to get it working. Let me know if you want to work together on this. I can help with some of the model loading parts, which is where I got stuck after poking at it. If you have progressed past this, I can help elsewhere, or just with testing.

Thanks.

Thank you. The framework is basically set up at the moment, but there is still some room for improvement in the caching strategy.

I think I can discuss with @kohya-ss whether to continue using the previous method.

#1924 (comment)

@sdbds sdbds marked this pull request as ready for review February 15, 2025 09:12
@envy-ai

envy-ai commented Feb 15, 2025

Thank you. The framework is basically set up at the moment, but there is still some room for improvement in the caching strategy.

Does that mean I can download your fork and test it now?

@rockerBOO
Contributor

It's still not quite working, but I'm working through some issues at the moment, mostly with model loading; I'll see what else is needed after that. It is fairly barebones, so I wouldn't expect it to be in a working state just yet.

@sdbds
Contributor Author

sdbds commented Feb 17, 2025

After multiple updates, the project can now run under limited conditions:

  1. flash_attn on Windows produces NaN, so it must be run in a Linux environment.
    Switching to an SDPA or xformers backend will be considered later.
  2. The position ID calculation for token sequences is not padded to the max length, which forces batch_size = 1.
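For point 2, padding the token sequences to a common length is the usual fix for batch sizes above 1. A minimal sketch of what that padding could look like (the helper and its signature are illustrative, not code from this PR):

```python
import torch

def pad_token_batch(token_ids_list, pad_token_id=0):
    """Right-pad variable-length token id lists to the batch max length.

    Returns padded ids, an attention mask, and per-sequence position ids,
    so position ids stay aligned with the unpadded tokens.
    """
    max_len = max(len(ids) for ids in token_ids_list)
    batch = len(token_ids_list)
    padded = torch.full((batch, max_len), pad_token_id, dtype=torch.long)
    mask = torch.zeros((batch, max_len), dtype=torch.long)
    pos_ids = torch.zeros((batch, max_len), dtype=torch.long)
    for i, ids in enumerate(token_ids_list):
        n = len(ids)
        padded[i, :n] = torch.tensor(ids, dtype=torch.long)
        mask[i, :n] = 1
        pos_ids[i, :n] = torch.arange(n)
    return padded, mask, pos_ids
```

With the mask passed to attention and the position ids kept per-sequence, padded entries never influence the real tokens.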

@kohya-ss
Owner

Regarding strategy, I would like you to proceed as is. I would like to refactor it together with other architectures later.

The script seems to assume that the model file is .safetensors, but I could only find .pth: https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/tree/main

I would appreciate it if you could tell me where .safetensors is.

@rockerBOO
Contributor

I converted their consolidated.00-of-01.pth here https://huggingface.co/rockerBOO/lumina-image-2/blob/main/lumina-image-2.safetensors

@sdbds
Contributor Author

sdbds commented Feb 20, 2025

Regarding strategy, I would like you to proceed as is. I would like to refactor it together with other architectures later.

The script seems to assume that the model file is .safetensors, but I could only find .pth: https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0/tree/main

I would appreciate it if you could tell me where .safetensors is.

https://huggingface.co/Comfy-Org/Lumina_Image_2.0_Repackaged

This is the version repackaged by Comfy-Org.

@rockerBOO
Contributor

Probably ready for testing now: training is working and sample images are working. The timestep scheduling should be accurate, but I'm still checking whether it behaves as desired.

Training Base Model
[sample images: ComfyUI_00073_, ComfyUI_00074_]

[Weights & Biases screenshot: crimson-donkey-23, women-lumina-kohya-lora loss curves]

Pseudo-Huber loss works pretty well but is not required.
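Pseudo-Huber loss is quadratic near zero and linear for large residuals, which makes it more robust to outliers than plain L2. A sketch (the constant c is illustrative; the trainer's actual value or schedule may differ):

```python
import torch

def pseudo_huber_loss(pred: torch.Tensor, target: torch.Tensor, c: float = 0.03) -> torch.Tensor:
    """Pseudo-Huber loss: behaves like L2 for small residuals, L1 for large ones.

    The constant c sets the crossover scale between the two regimes.
    """
    return (torch.sqrt((pred - target) ** 2 + c**2) - c).mean()
```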

Lumina 2 specific parameters

  • --use_flash_attn, which uses Flash Attention 2 if you have it installed (flash_attn).
  • Added --samples_batch_size, which buckets and batches sample generation to better utilize your GPU. It could probably be used in other trainers too, but it is good to test for any edge cases.
  • Added --system_prompt. A system prompt can also be set in dataset_config and in dataset subsets. Currently no system prompt is set by default. The system prompt is prepended to the captions.
if args.system_type == "align":
    system_prompt = "You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts. <Prompt Start> "  
elif args.system_type == "base":
    system_prompt = "You are an assistant designed to generate high-quality images based on user prompts. <Prompt Start> " 
elif args.system_type == "aesthetics":
    system_prompt = "You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. <Prompt Start> " 
elif args.system_type == "real":
    system_prompt = "You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts. <Prompt Start> "
elif args.system_type == "4grid":
    system_prompt = "You are an assistant designed to generate four high-quality images with highest degree of aesthetics arranged in 2x2 grids based on user prompts. <Prompt Start> "  

Supports all the newer features such as fp8, caching to disk (offloading the Gemma 2 model), latent caching, and gradient checkpointing. With all of these enabled, you should be able to train within a tight 6GB of VRAM (roughly 5.9GB in my testing).
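The offloading seen in the logs (moving the Gemma 2 encoder to GPU only while caching its outputs, then back to CPU) follows a simple pattern; a generic sketch, with an illustrative helper name:

```python
import contextlib
import torch

@contextlib.contextmanager
def model_on_device(model: torch.nn.Module, device: str):
    """Temporarily move a model to `device`, restoring its original device after.

    Mirrors the offloading pattern in the training logs; the helper name is
    illustrative, not from the PR.
    """
    original = next(model.parameters()).device
    model.to(device)
    try:
        yield model
    finally:
        model.to(original)
```

On a CUDA machine this would wrap the caching pass, e.g. `with model_on_device(text_encoder, "cuda"): ...`, so the encoder's VRAM is freed once caching finishes.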

@sdbds
Contributor Author

sdbds commented Feb 24, 2025

Update:

1. Implement the cfg_trunc calculation directly from timesteps, without needing the inference step count.

2. Deprecate and remove the guidance_scale parameter, because it is used in inference, not training.

3. Add inference command-line arguments --ct for cfg_trunc_ratio and --rc for renorm_cfg to control CFG truncation and renormalization during inference.
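As a sketch of what --ct and --rc control: CFG truncation skips guidance past a point in the timestep schedule, and renormalization rescales the guided prediction back to the conditional prediction's norm. The threshold direction and renorm formula below are assumptions modeled on Lumina's reference sampler, not the exact PR code:

```python
import torch

def apply_cfg(cond, uncond, guidance_scale, timestep, cfg_trunc_ratio, renorm_cfg):
    """Classifier-free guidance with truncation and optional renormalization.

    `timestep` is assumed normalized to [0, 1]; below the truncation threshold,
    guidance is skipped and the conditional prediction is returned directly.
    """
    if timestep < cfg_trunc_ratio:
        return cond
    guided = uncond + guidance_scale * (cond - uncond)
    if renorm_cfg:
        # Rescale so the guided prediction keeps the conditional prediction's norm.
        cond_norm = cond.flatten(1).norm(dim=1, keepdim=True)
        guided_norm = guided.flatten(1).norm(dim=1, keepdim=True).clamp(min=1e-6)
        scale = (cond_norm / guided_norm).view(-1, *([1] * (cond.dim() - 1)))
        guided = guided * scale
    return guided
```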

@envy-ai

envy-ai commented Feb 24, 2025

Trying to test this... Can you post some working arguments? I'm getting an error.

venv/bin/accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/lumina_train_network.py --pretrained_model_name_or_path "/home/envy/models/lumina2/lumina_2_model_bf16.safetensors" --ae "/home/envy/models/lumina2/ae.safetensors" --gemma2 "/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors" --cache_latents_to_disk --save_model_as safetensors --xformers --persistent_data_loader_workers --network_train_unet_only --max_data_loader_n_workers 4 --seed 42 --gradient_checkpointing --mixed_precision bf16 --save_precision bf16 --resolution 768,768 --network_module lycoris.kohya --network_dim 8 --optimizer_type Adamw8bit --learning_rate 1e-3 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram --sample_sampler "euler" --output_dir /home/envy/models/lora --timestep_sampling sigmoid --model_prediction_type raw --loss_type l2 --train_batch_size 8 --network_alpha 8 --caption_extension txt --network_args "algo=loha" "preset=attn-only" "factor=8" "decompose_both=True" "full_matrix=True" --max_train_epochs 15 --network_train_unet_only --save_every_n_epochs 1 --output_name LuminaAnime1 --train_data_dir /home/envy/training_data/KitsuneAnime1/ --min_snr_gamma 5.0 --optimizer_args "weight_decay=0.0" --use_flash_attn

2025-02-24 17:07:15 INFO highvram is enabled / highvramが有効です train_util.py:4319
WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / train_util.py:4336
cache_latents_to_diskが有効なため、cache_latentsを有効にします
2025-02-24 17:07:16 INFO Using DreamBooth method. train_network.py:458
INFO prepare images. train_util.py:2072
INFO get image size from name of cache files train_util.py:1961
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 292258.47it/s]
INFO set image size from cache files: 74/74 train_util.py:1991
INFO found directory /home/envy/training_data/KitsuneAnime1/1_images contains 74 image files train_util.py:2019
read caption: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 39258.60it/s]
INFO 74 train images with repeats. train_util.py:2117
INFO 0 reg images with repeats. train_util.py:2121
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2126
INFO [Dataset 0] config_util.py:581
batch_size: 8
resolution: (768, 768)
enable_bucket: False
system_prompt:

                           [Subset 0 of Dataset 0]                                                                                            
                             image_dir: "/home/envy/training_data/KitsuneAnime1/1_images"                                                     
                             image_count: 74                                                                                                  
                             num_repeats: 1                                                                                                   
                             shuffle_caption: False                                                                                           
                             keep_tokens: 0                                                                                                   
                             caption_dropout_rate: 0.0                                                                                        
                             caption_dropout_every_n_epochs: 0                                                                                
                             caption_tag_dropout_rate: 0.0                                                                                    
                             caption_prefix: None                                                                                             
                             caption_suffix: None                                                                                             
                             color_aug: False                                                                                                 
                             flip_aug: False                                                                                                  
                             face_crop_aug_range: None                                                                                        
                             random_crop: False                                                                                               
                             token_warmup_min: 1,                                                                                             
                             token_warmup_step: 0,                                                                                            
                             alpha_mask: False                                                                                                
                             custom_attributes: {}                                                                                            
                             system_prompt:                                                                                                   
                             is_reg: False                                                                                                    
                             class_tokens: images                                                                                             
                             caption_extension: .txt                                                                                          
                                                                                                                                              
                                                                                                                                              
                INFO     [Prepare dataset 0]                                                                                config_util.py:593
                INFO     loading image sizes.                                                                                train_util.py:986

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 2282194.82it/s]
INFO prepare dataset train_util.py:1011
INFO preparing accelerator train_network.py:521
accelerator device: cuda
INFO Building Lumina lumina_util.py:44
INFO Loading state dict from /home/envy/models/lumina2/lumina_2_model_bf16.safetensors lumina_util.py:48
INFO Loaded Lumina: lumina_util.py:51
INFO Building Gemma2 lumina_util.py:105
INFO Loading state dict from /home/envy/models/lumina2/gemma_2_2b_fp16.safetensors lumina_util.py:145
2025-02-24 17:07:20 INFO Loaded Gemma2: _IncompatibleKeys(missing_keys=[], unexpected_keys=['spiece_model']) lumina_util.py:155
INFO Building AutoEncoder lumina_util.py:73
INFO Loading state dict from /home/envy/models/lumina2/ae.safetensors lumina_util.py:78
INFO Loaded AE: lumina_util.py:81
import network module: lycoris.kohya
2025-02-24 17:07:21 INFO [Dataset 0] train_util.py:2610
INFO caching latents with caching strategy. train_util.py:1111
INFO caching latents... train_util.py:1160
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 11259.06it/s]
INFO move vae and unet to cpu to save memory lumina_train_network.py:128
INFO move text encoders to gpu lumina_train_network.py:136
INFO [Dataset 0] train_util.py:2632
INFO caching Text Encoder outputs with caching strategy. train_util.py:1294
INFO checking cache validity... train_util.py:1305
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 14330.23it/s]
INFO no Text Encoder outputs to cache train_util.py:1332
INFO move Gemma 2 back to cpu lumina_train_network.py:187
2025-02-24 17:07:22 INFO move vae and unet back to original device lumina_train_network.py:192
2025-02-24 17:07:22|[LyCORIS]-INFO: Bypass mode is enabled
2025-02-24 17:07:22|[LyCORIS]-INFO: Full matrix mode for LoKr is enabled
2025-02-24 17:07:22|[LyCORIS]-INFO: Using rank adaptation algo: loha
2025-02-24 17:07:22|[LyCORIS]-INFO: Disable conv layer
2025-02-24 17:07:22|[LyCORIS]-INFO: Use Dropout value: 0.0
2025-02-24 17:07:22|[LyCORIS]-INFO: Create LyCORIS Module
2025-02-24 17:07:22|[LyCORIS]-INFO: create LyCORIS for Text Encoder: 0 modules.
2025-02-24 17:07:22|[LyCORIS]-INFO: Create LyCORIS Module
2025-02-24 17:07:22|[LyCORIS]-INFO: create LyCORIS for U-Net: 0 modules.
2025-02-24 17:07:22|[LyCORIS]-INFO: module type table: {}
2025-02-24 17:07:22|[LyCORIS]-INFO: enable LyCORIS for U-Net
Lumina: Gradient checkpointing enabled. CPU offload: False
prepare optimizer, data loader etc.
INFO use 8-bit AdamW optimizer | {'weight_decay': 0.0} train_util.py:4801
Traceback (most recent call last):
File "/home/envy/kohya_ss/sd-scripts/lumina_train_network.py", line 406, in
trainer.train(args)
File "/home/envy/kohya_ss/sd-scripts/train_network.py", line 684, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "/home/envy/kohya_ss/sd-scripts/library/train_util.py", line 4803, in get_optimizer
optimizer = optimizer_class(trainable_params, lr=lr, **optimizer_kwargs)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/adamw.py", line 114, in init
super().init(
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 427, in init
super().init(params, defaults, optim_bits, is_paged)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 125, in init
super().init(params, defaults)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 366, in init
raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
Traceback (most recent call last):
File "/home/envy/kohya_ss/venv/bin/accelerate", line 8, in
sys.exit(main())
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/envy/kohya_ss/venv/bin/python3.10', 'sd-scripts/lumina_train_network.py', '--pretrained_model_name_or_path', '/home/envy/models/lumina2/lumina_2_model_bf16.safetensors', '--ae', '/home/envy/models/lumina2/ae.safetensors', '--gemma2', '/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--xformers', '--persistent_data_loader_workers', '--network_train_unet_only', '--max_data_loader_n_workers', '4', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--resolution', '768,768', '--network_module', 'lycoris.kohya', '--network_dim', '8', '--optimizer_type', 'Adamw8bit', '--learning_rate', '1e-3', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--highvram', '--sample_sampler', 'euler', '--output_dir', '/home/envy/models/lora', '--timestep_sampling', 'sigmoid', '--model_prediction_type', 'raw', '--loss_type', 'l2', '--train_batch_size', '8', '--network_alpha', '8', '--caption_extension', 'txt', '--network_args', 'algo=loha', 'preset=attn-only', 'factor=8', 'decompose_both=True', 'full_matrix=True', '--max_train_epochs', '15', '--network_train_unet_only', '--save_every_n_epochs', '1', '--output_name', 'LuminaAnime1', '--train_data_dir', '/home/envy/training_data/KitsuneAnime1/', '--min_snr_gamma', '5.0', '--optimizer_args', 'weight_decay=0.0', '--use_flash_attn']' returned non-zero exit status 1.
(kohya) [envy@envy kohya_ss]$

@sdbds
Contributor Author

sdbds commented Feb 24, 2025

Trying to test this... Can you post some working arguments? I'm getting an error.

[full command and error log quoted above]

LyCORIS requires additional changes; you can use LoRA first.

@envy-ai

envy-ai commented Feb 24, 2025

Getting the same issue despite paring my arguments down further and switching to networks.lora:

venv/bin/accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 sd-scripts/lumina_train_network.py --pretrained_model_name_or_path "/home/envy/models/lumina2/lumina_2_model_bf16.safetensors" --ae "/home/envy/models/lumina2/ae.safetensors" --gemma2 "/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors" --cache_latents_to_disk --save_model_as safetensors --xformers --persistent_data_loader_workers --max_data_loader_n_workers 4 --seed 42 --mixed_precision bf16 --save_precision bf16 --resolution 768,768 --network_module networks.lora --network_dim 8 --optimizer_type Adamw8bit --learning_rate 1e-3 --cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram --sample_sampler "euler" --output_dir /home/envy/models/lora --train_batch_size 8 --network_alpha 8 --caption_extension txt --max_train_epochs 15 --save_every_n_epochs 1 --output_name LuminaAnime1 --train_data_dir /home/envy/training_data/KitsuneAnime1/ --min_snr_gamma 5.0 --use_flash_attn --system_prompt "You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. "

)
2025-02-24 17:59:51 INFO highvram is enabled / highvramが有効です train_util.py:4319
WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / train_util.py:4336
cache_latents_to_diskが有効なため、cache_latentsを有効にします
2025-02-24 17:59:52 INFO Using DreamBooth method. train_network.py:458
INFO prepare images. train_util.py:2072
INFO get image size from name of cache files train_util.py:1961
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 305791.62it/s]
INFO set image size from cache files: 74/74 train_util.py:1991
INFO found directory /home/envy/training_data/KitsuneAnime1/1_images contains 74 image files train_util.py:2019
read caption: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 41600.12it/s]
INFO 74 train images with repeats. train_util.py:2117
INFO 0 reg images with repeats. train_util.py:2121
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:2126
INFO [Dataset 0] config_util.py:581
batch_size: 8
resolution: (768, 768)
enable_bucket: False
system_prompt: You are an assistant designed to generate high-quality images with highest degree
of aesthetics based on user prompts.

                           [Subset 0 of Dataset 0]                                                                                            
                             image_dir: "/home/envy/training_data/KitsuneAnime1/1_images"                                                     
                             image_count: 74                                                                                                  
                             num_repeats: 1                                                                                                   
                             shuffle_caption: False                                                                                           
                             keep_tokens: 0                                                                                                   
                             caption_dropout_rate: 0.0                                                                                        
                             caption_dropout_every_n_epochs: 0                                                                                
                             caption_tag_dropout_rate: 0.0                                                                                    
                             caption_prefix: None                                                                                             
                             caption_suffix: None                                                                                             
                             color_aug: False                                                                                                 
                             flip_aug: False                                                                                                  
                             face_crop_aug_range: None                                                                                        
                             random_crop: False                                                                                               
                             token_warmup_min: 1,                                                                                             
                             token_warmup_step: 0,                                                                                            
                             alpha_mask: False                                                                                                
                             custom_attributes: {}                                                                                            
                             system_prompt: You are an assistant designed to generate high-quality images with highest                        
                         degree of aesthetics based on user prompts. <Prompt Start>                                                           
                             is_reg: False                                                                                                    
                             class_tokens: images                                                                                             
                             caption_extension: .txt                                                                                          
                                                                                                                                              
                                                                                                                                              
                INFO     [Prepare dataset 0]                                                                                config_util.py:593
                INFO     loading image sizes.                                                                                train_util.py:986

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 2265536.47it/s]
INFO prepare dataset train_util.py:1011
INFO preparing accelerator train_network.py:521
accelerator device: cuda
INFO Building Lumina lumina_util.py:44
INFO Loading state dict from /home/envy/models/lumina2/lumina_2_model_bf16.safetensors lumina_util.py:48
INFO Loaded Lumina: lumina_util.py:51
INFO Building Gemma2 lumina_util.py:105
INFO Loading state dict from /home/envy/models/lumina2/gemma_2_2b_fp16.safetensors lumina_util.py:145
2025-02-24 17:59:56 INFO Loaded Gemma2: _IncompatibleKeys(missing_keys=[], unexpected_keys=['spiece_model']) lumina_util.py:155
INFO Building AutoEncoder lumina_util.py:73
INFO Loading state dict from /home/envy/models/lumina2/ae.safetensors lumina_util.py:78
INFO Loaded AE: lumina_util.py:81
import network module: networks.lora
2025-02-24 17:59:57 INFO [Dataset 0] train_util.py:2610
INFO caching latents with caching strategy. train_util.py:1111
INFO caching latents... train_util.py:1160
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 10819.48it/s]
INFO move vae and unet to cpu to save memory lumina_train_network.py:128
INFO move text encoders to gpu lumina_train_network.py:136
INFO [Dataset 0] train_util.py:2632
INFO caching Text Encoder outputs with caching strategy. train_util.py:1294
INFO checking cache validity... train_util.py:1305
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 74/74 [00:00<00:00, 12957.81it/s]
INFO no Text Encoder outputs to cache train_util.py:1332
2025-02-24 17:59:58 INFO move vae and unet back to original device lumina_train_network.py:192
INFO create LoRA network. base dim (rank): 8, alpha: 8.0 lora.py:935
INFO neuron dropout: p=None, rank dropout: p=None, module dropout: p=None lora.py:936
INFO create LoRA for Text Encoder: lora.py:1030
INFO create LoRA for Text Encoder: 0 modules. lora.py:1035
INFO create LoRA for U-Net: 0 modules. lora.py:1043
INFO enable LoRA for text encoder: 0 modules lora.py:1084
INFO enable LoRA for U-Net: 0 modules lora.py:1089
prepare optimizer, data loader etc.
INFO use 8-bit AdamW optimizer | {} train_util.py:4801
Traceback (most recent call last):
File "/home/envy/kohya_ss/sd-scripts/lumina_train_network.py", line 406, in
trainer.train(args)
File "/home/envy/kohya_ss/sd-scripts/train_network.py", line 684, in train
optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
File "/home/envy/kohya_ss/sd-scripts/library/train_util.py", line 4803, in get_optimizer
optimizer = optimizer_class(trainable_params, lr=lr, **optimizer_kwargs)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/adamw.py", line 114, in init
super().init(
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 427, in init
super().init(params, defaults, optim_bits, is_paged)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 125, in init
super().init(params, defaults)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 366, in init
raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
Traceback (most recent call last):
File "/home/envy/kohya_ss/venv/bin/accelerate", line 8, in
sys.exit(main())
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/envy/kohya_ss/venv/bin/python3.10', 'sd-scripts/lumina_train_network.py', '--pretrained_model_name_or_path', '/home/envy/models/lumina2/lumina_2_model_bf16.safetensors', '--ae', '/home/envy/models/lumina2/ae.safetensors', '--gemma2', '/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--xformers', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '4', '--seed', '42', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--resolution', '768,768', '--network_module', 'networks.lora', '--network_dim', '8', '--optimizer_type', 'Adamw8bit', '--learning_rate', '1e-3', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--highvram', '--sample_sampler', 'euler', '--output_dir', '/home/envy/models/lora', '--train_batch_size', '8', '--network_alpha', '8', '--caption_extension', 'txt', '--max_train_epochs', '15', '--save_every_n_epochs', '1', '--output_name', 'LuminaAnime1', '--train_data_dir', '/home/envy/training_data/KitsuneAnime1/', '--min_snr_gamma', '5.0', '--use_flash_attn', '--system_prompt', 'You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. ']' returned non-zero exit status 1.
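The log above already hints at the root cause: the "create LoRA for U-Net: 0 modules" lines show the network attached to nothing, so the optimizer later receives an empty parameter list. A defensive check like the following (an illustrative sketch, not the actual sd-scripts code; the function name and message are assumptions) would surface the misconfiguration earlier:

```python
# Illustrative guard (not the actual sd-scripts code): fail early with a clear
# message when the chosen network module attaches to zero layers, instead of
# letting the optimizer raise "got an empty parameter list" much later.
def collect_trainable_params(modules):
    params = [
        p
        for m in modules
        for p in m.parameters()
        if getattr(p, "requires_grad", False)
    ]
    if not params:
        raise ValueError(
            "network matched 0 modules; check that --network_module matches the "
            "model architecture (networks.lora targets SD-style layer names, "
            "which do not exist in Lumina)"
        )
    return params
```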

@rockerBOO
Contributor

rockerBOO commented Feb 24, 2025

Need to use --network_module networks.lora_lumina. Also, your system prompt should end with <Prompt Start>, which is a special token. Edit: it seems it is in there but was hidden in the post, my bad. Maybe we could add it in the code instead of relying on the user to add it...
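Appending the special token automatically could look something like this minimal sketch (the function name and constant are hypothetical, not the actual sd-scripts implementation):

```python
# Hypothetical sketch: append the Lumina special token to a user-supplied
# system prompt when it is missing, so the user does not have to remember it.
PROMPT_START_TOKEN = "<Prompt Start>"

def normalize_system_prompt(system_prompt: str) -> str:
    """Ensure the system prompt ends with the special token (names are illustrative)."""
    prompt = system_prompt.strip()
    if not prompt.endswith(PROMPT_START_TOKEN):
        prompt = f"{prompt} {PROMPT_START_TOKEN}"
    return prompt
```

A normalization step like this would make the --system_prompt flag forgiving of a missing or trailing-space-separated token.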

@envy-ai

envy-ai commented Feb 24, 2025

Thanks, it's working now!

@sdbds
Contributor Author

sdbds commented Feb 25, 2025

The flash-attn issue will be resolved soon; I found that the current Windows wheel skips the backward part...
I had to spend a whole day manually compiling one.

@envy-ai

envy-ai commented Feb 25, 2025

I've run into a bug that happens occasionally while it's caching tokens with batch sizes above 1. It's infrequent, only like 1% of the time.

2025-02-25 07:52:14 INFO caching Text Encoder outputs... train_util.py:1336
20%|█████████████████████▎ | 1523/7488 [04:53<19:09, 5.19it/s]
Traceback (most recent call last):
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 762, in convert_to_tensors
tensor = as_tensor(value)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 724, in as_tensor
return torch.tensor(value)
ValueError: expected sequence of length 256 at dim 1 (got 262)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/envy/kohya_ss/sd-scripts/lumina_train_network.py", line 406, in
trainer.train(args)
File "/home/envy/kohya_ss/sd-scripts/train_network.py", line 580, in train
self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
File "/home/envy/kohya_ss/sd-scripts/lumina_train_network.py", line 147, in cache_text_encoder_outputs_if_needed
dataset.new_cache_text_encoder_outputs(text_encoders, accelerator)
File "/home/envy/kohya_ss/sd-scripts/library/train_util.py", line 2633, in new_cache_text_encoder_outputs
dataset.new_cache_text_encoder_outputs(models, accelerator)
File "/home/envy/kohya_ss/sd-scripts/library/train_util.py", line 1339, in new_cache_text_encoder_outputs
caching_strategy.cache_batch_outputs(tokenize_strategy, models, text_encoding_strategy, batch)
File "/home/envy/kohya_ss/sd-scripts/library/strategy_lumina.py", line 234, in cache_batch_outputs
tokens = tokenize_strategy.tokenize(captions)
File "/home/envy/kohya_ss/sd-scripts/library/strategy_lumina.py", line 52, in tokenize
encodings = self.tokenizer(
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3055, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3142, in _call_one
return self.batch_encode_plus(
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3338, in batch_encode_plus
return self._batch_encode_plus(
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 576, in _batch_encode_plus
return BatchEncoding(sanitized_tokens, sanitized_encodings, tensor_type=return_tensors)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 227, in init
self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 778, in convert_to_tensors
raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (input_ids in this case) have excessive nesting (inputs type list where type int is expected).
Traceback (most recent call last):
File "/home/envy/kohya_ss/venv/bin/accelerate", line 8, in
sys.exit(main())
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
simple_launcher(args)
File "/home/envy/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/envy/kohya_ss/venv/bin/python3.10', 'sd-scripts/lumina_train_network.py', '--pretrained_model_name_or_path', '/home/envy/models/lumina2/lumina_2_model_bf16.safetensors', '--ae', '/home/envy/models/lumina2/ae.safetensors', '--gemma2', '/home/envy/models/lumina2/gemma_2_2b_fp16.safetensors', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--xformers', '--persistent_data_loader_workers', '--network_train_unet_only', '--max_data_loader_n_workers', '4', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--resolution', '1024', '--network_module', 'networks.lora_lumina', '--network_dim', '256', '--optimizer_type', 'AdamW', '--learning_rate', '0.0002', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--highvram', '--sample_sampler', 'euler', '--output_dir', '/home/envy/models/lora', '--timestep_sampling', 'sigmoid', '--model_prediction_type', 'raw', '--loss_type', 'huber', '--train_batch_size', '16', '--network_alpha', '32', '--caption_extension', 'txt', '--max_train_epochs', '2', '--save_every_n_steps', '500', '--output_name', 'LuminaAesthetic', '--train_data_dir', '/home/envy/training_data', '--min_snr_gamma', '5.0', '--use_flash_attn', '--system_prompt', 'You are an assistant designed to generate high-quality images with highest degree of aesthetics based on user prompts. ', '--huber_c', '0.1', '--huber_scale', '0.25', '--huber_schedule', 'exponential']' returned non-zero exit status 1.

(For the record, I'm running it with a batch size of 1 until it caches it all, then I'm going to cancel out and switch back to 16, so theoretically I have a work-around.)
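The traceback above shows one caption in the batch tokenizing to 262 ids against a 256-token limit, so the tokenizer cannot stack the rows into a single tensor. The length normalization it is asking for can be sketched in pure Python (function name and pad id are assumptions for illustration):

```python
# Illustrative sketch of what enabling truncation and padding does before
# the tokenizer builds a batched tensor: every row gets the same length.
def pad_token_batch(batch_ids, max_length=256, pad_id=0):
    clipped = [ids[:max_length] for ids in batch_ids]  # truncate over-long captions
    width = max(len(ids) for ids in clipped)           # pad the rest to the longest row
    return [ids + [pad_id] * (width - len(ids)) for ids in clipped]
```

With the Hugging Face tokenizer itself, the equivalent would be calling it with truncation=True and a padding strategy so that every row has the same length before return_tensors="pt" builds the tensor, which matches the workaround the error message suggests.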
