
Feature to improve training quality via detection of out-of-tolerance latent mean/std values #2010


Open
wants to merge 1 commit into base: sd3

Conversation

@araleza commented Mar 27, 2025

This is a new feature, currently only for Flux LoRA training (although it could later be applied to full fine-tuning as well). It analyses the latents of the training images and checks that their mean (average) and standard deviation values are near 0.0 and 1.0 respectively.

It was requested as a feature here: std/mean detection code.

Flux is a diffusion model that starts from Gaussian noise with a mean of 0.0 and a standard deviation of 1.0. Training images whose latents share that same distribution may therefore train better, since the network does not have to learn to correct the mean and std as well, and current diffusion models don't seem to be good at that.
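As a rough sketch of how those per-image statistics can be measured (this is not the exact code in this PR; it assumes a diffusers-style AutoencoderKL whose config exposes shift_factor and scaling_factor, as Flux's VAE config does):

```python
import torch

@torch.no_grad()
def latent_mean_std(vae, image_tensor):
    # image_tensor: (1, 3, H, W), scaled to the range the VAE expects (typically [-1, 1])
    latents = vae.encode(image_tensor).latent_dist.sample()
    # Apply the same shift/scale the training code uses; these factors are model-specific.
    latents = (latents - vae.config.shift_factor) * vae.config.scaling_factor
    return latents.mean().item(), latents.std().item()
```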

The detection tolerances can be set via --latent_threshold_warn_levels=mean,std_max, e.g. --latent_threshold_warn_levels=0.15,1.40. The test can be disabled entirely with --latent_threshold_warn_levels=disable.

If any images do not pass the threshold test, a warning message like this is printed:

[screenshot: warning message listing images with out-of-tolerance latent mean/std values]

The std_max value sets the upper limit for the standard deviation. A lower limit is also derived as 1.0 / std_max; for example, a std_max value of 1.40 gives a lower threshold of around 0.714.
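For illustration, the check itself boils down to something like this (the names here are illustrative, not necessarily those used in the PR):

```python
def check_latent_thresholds(mean, std, mean_tol=0.15, std_max=1.40):
    """Return a list of warning strings for out-of-tolerance latent statistics."""
    warnings = []
    if abs(mean) > mean_tol:
        warnings.append(f"latent mean {mean:+.3f} is outside +/-{mean_tol}")
    std_min = 1.0 / std_max  # e.g. std_max=1.40 -> std_min of about 0.714
    if std > std_max or std < std_min:
        warnings.append(f"latent std {std:.3f} is outside [{std_min:.3f}, {std_max}]")
    return warnings
```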

Sometimes it's not obvious why the mean and std values are not near 0 and 1. In that case, the --latent_threshold_visualizer parameter can be passed, which shows the latent average values in a window. (This has been tested on Ubuntu Linux; could someone please try it on Windows? It should probably work.)

[screenshots: latent visualizer windows showing per-image latent average values]
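A minimal sketch of producing a similar view with matplotlib (illustrative only, not the PR's implementation; latents is assumed to be a NumPy array of shape (C, H, W)):

```python
import matplotlib.pyplot as plt

def show_latent_means(latents, title="Latent channel means"):
    # Average over the channel axis to get a 2D map of latent values
    mean_map = latents.mean(axis=0)
    plt.imshow(mean_map, cmap="seismic", vmin=-2.0, vmax=2.0)
    plt.colorbar(label="latent value")
    plt.title(title)
    plt.show()
```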

Various changes can move the latent mean/std towards 0,1, for example (a sketch of a simple brightness/contrast adjustment follows this list):

  • Adjusting image brightness/contrast
  • Adjusting image gamma curve
  • Adjusting image saturation
  • Drawing lighter areas in black backgrounds / darker areas in white backgrounds
  • Replacing backgrounds with inpainting
  • Redrawing parts of the image with img2img
  • Downsizing the image by 2x to remove noise / sharpness
  • Performing a 0.5-radius Gaussian blur (in GIMP) on most of the image

Or even:

  • Deleting the image from the training set
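As an example of the first two items in the list above, a brightness/contrast/gamma tweak can be scripted with Pillow; the factors below are placeholders to tune per image, not values taken from this PR:

```python
from PIL import Image, ImageEnhance

def adjust_image(path, out_path, brightness=1.05, contrast=0.95, gamma=1.0):
    img = Image.open(path).convert("RGB")
    img = ImageEnhance.Brightness(img).enhance(brightness)  # >1.0 brightens, <1.0 darkens
    img = ImageEnhance.Contrast(img).enhance(contrast)      # <1.0 reduces contrast
    if gamma != 1.0:
        # Simple per-channel gamma curve
        img = img.point(lambda v: int(255 * (v / 255) ** gamma))
    img.save(out_path)
```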

One thing I've found in the small number of days I've been training with images closer to mean/std 0,1 is that I've had to raise the alpha value of my training and also reduce the LR. I think that might be because the 'gravity well' of the model remaining close to base-model quality is stronger when it is not being disrupted by training images that fall outside the 0,1 distribution.

Edit: I need to check that the test for args.latent_threshold_warn_levels in train_network.py doesn't break e.g. SDXL training, which won't have that option.
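One way to keep that test safe for scripts that never define the option is to look it up defensively, along these lines (an illustrative sketch, not the PR's code):

```python
def parse_latent_warn_levels(args):
    # Scripts that never add the option (e.g. SDXL training) simply get None here
    # instead of raising AttributeError.
    value = getattr(args, "latent_threshold_warn_levels", None)
    if not value or value == "disable":
        return None
    mean_tol, std_max = (float(x) for x in value.split(","))
    return mean_tol, std_max
```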
