Very low avg loss on LoRA training / Trying to understand TensorBoard logs #237

Kalerindel asked this question in Q&A · Unanswered
I was wondering if someone could please have a look at my settings and logs, as I'm having trouble interpreting my training results.

Default settings/setup were as follows:
CLI command

```
--num_cpu_threads_per_process 8 train_network.py \
  --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
  --train_data_dir=/training_data/testproject_v1 \
  --logging_dir=/output/lora/testproject_v1/log/ \
  --output_dir=/output/lora/testproject_v1 \
  --output_name=testproject_v1 \
  --caption_extension=.txt \
  --unet_lr=0.0001 \
  --text_encoder_lr=0.00005 \
  --max_train_epochs=1 \
  --network_dim=128 \
  --network_alpha=128 \
  --resolution=512,512 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --save_every_n_epochs=1 \
  --enable_bucket \
  --bucket_reso_steps=64 \
  --random_crop \
  --optimizer_type=AdamW8bit \
  --xformers \
  --mixed_precision=fp16 \
  --save_precision=fp16 \
  --save_model_as=safetensors \
  --clip_skip=1 \
  --lr_scheduler=cosine_with_restarts \
  --seed=1234 \
  --network_module=networks.lora
```
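One setting worth flagging when comparing loss curves against guides (an editorial aside, not something from the thread): as I understand kohya's networks.lora, the learned update is scaled by network_alpha / network_dim, so alpha effectively acts as a learning-rate multiplier. With the command above that scale works out to 1.0, while many guides train with alpha = dim / 2, which can produce visibly different curves at the same learning rate. A minimal sketch of the arithmetic, using only the values from the command:

```python
# LoRA adds its learned update scaled by alpha / dim, so alpha effectively
# rescales the learning rate of the LoRA weights.
network_dim = 128
network_alpha = 128
scale = network_alpha / network_dim  # 1.0 here; 0.5 with the common alpha=dim/2
print(f"effective LoRA scale: {scale}")
```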
[Graph: first 50 steps, showing a sharp drop]

[Graph: full run]

[Graph: run #15 at 1500 steps]
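For reference, the numbers behind graphs like these can be read straight out of the event files that land in --logging_dir, which makes runs easier to compare than eyeballing the UI. A minimal sketch using TensorBoard's EventAccumulator; the scalar tag name is an assumption (sd-scripts versions log different tags), so the sketch prints the available tags first:

```python
# Minimal sketch: read loss scalars out of a TensorBoard event file.
# Assumes the `tensorboard` package is installed; the scalar tag
# ("loss/average") is a guess -- check the printed tags to see what
# your sd-scripts version actually logs.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

log_dir = "/output/lora/testproject_v1/log/"  # --logging_dir from the command above
acc = EventAccumulator(log_dir)
acc.Reload()

print(acc.Tags()["scalars"])  # list the scalar tags that were actually logged

events = acc.Scalars("loss/average")  # assumed tag name; adjust to your logs
for e in events[:10]:
    print(e.step, e.value)
```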

Notes from the tests:

In YT videos and misc guides I keep seeing people land just under 0.2 avg loss, and their logs make sense for identifying the rapid-learn phase turning into churn and then into frying. I'm struggling to make sense of my own logs, since the margin between a decent result and something completely unusable seems to be such a small number. For instance:

I was trying to take run #9 or #12 and push the avg loss higher to see what the model looks like, but I can't figure out how to do that without reducing steps/repeats so training stops earlier, which seems to go against all the guides/recommendations. Is there a way to reduce the drop after the rapid-learning phase, or am I going about this the wrong way?

I've tried reinstalling, and I've noticed the same kind of results across different datasets.
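One caveat when comparing avg-loss numbers against the ~0.2 figures in guides: TensorBoard's UI applies an exponential-moving-average smoothing slider, so the curve in a screenshot depends on that setting as much as on the run itself. A minimal sketch of the same kind of smoothing, useful for putting two runs on equal footing (the 0.9 weight is a typical slider value, not something from this thread):

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average, similar to TensorBoard's smoothing slider.

    weight=0 reproduces the raw curve; values close to 1 flatten it, which
    can make two runs with very different raw losses look deceptively alike.
    """
    smoothed = []
    last = values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# Example: compare a raw loss trace against its smoothed version.
raw = [0.35, 0.30, 0.12, 0.05, 0.04, 0.05, 0.03]
print(ema_smooth(raw))
```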
Thanks!
Replies: 2 comments

- I like your approach. Did you find some answers?

- I've been reading up a lot, though I've been having limited success on SDXL. I did find this YT video particularly useful: https://www.youtube.com/watch?v=wJX4bBtDr9Y