[bug]: SDXL-based v-pred models are treated as epsilon prediction - resulting in noisy images #7495
Closed
Labels
bug
Is there an existing issue for this problem?
Operating system
Windows
GPU vendor
Nvidia (CUDA)
GPU model
RTX 3080 12GB
GPU VRAM
12GB
Version number
5.5
Browser
Invoke Community Edition v5.5 launcher / MS Edge 131.0.2903.99 (Official build) (64-bit)
Python dependencies
accelerate==1.0.1
compel==2.0.2
cuda==12.4
diffusers==0.31.0
numpy==1.26.3
opencv==4.9.0.80
onnx==1.16.1
pillow==10.2.0
python==3.11.11
torch==2.4.1+cu124
torchvision==0.19.1+cu124
transformers==4.46.3
xformers==Not Installed
What happened
SDXL-based v_prediction models are automatically assigned the epsilon prediction type, even when the vpred and zsnr state_dict keys are present in the model. Manually changing the prediction type to v_prediction is not respected either. As a result, these models produce unusably noisy outputs.
What you expected to happen
v_prediction-based models should have the correct prediction type detected from the state_dict keys within the model metadata. If detection fails because of missing keys or for any other reason, the user's manually selected prediction type under model settings should be respected, resulting in normal-quality outputs.
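To illustrate the behavior I would expect, here is a minimal sketch in plain Python. It is not Invoke's actual code: the function name `resolve_prediction_settings`, the returned dictionary layout, and the exact marker key spellings (`v_pred`/`vpred`, `ztsnr`/`zsnr`) are assumptions made for illustration only. The idea is to auto-detect from the checkpoint's keys and let a manual selection take priority whenever one is set.

```python
from typing import Iterable, Optional

# Assumed marker key names (hypothetical): the exact spelling inside a
# given checkpoint may differ from what is listed here.
V_PRED_KEYS = {"v_pred", "vpred"}
ZSNR_KEYS = {"ztsnr", "zsnr"}


def resolve_prediction_settings(
    state_dict_keys: Iterable[str],
    user_prediction_type: Optional[str] = None,
) -> dict:
    """Pick the prediction type from checkpoint keys, honoring a manual override."""
    keys = set(state_dict_keys)

    # Auto-detection: a v-pred marker key switches the default away from epsilon.
    detected = "v_prediction" if keys & V_PRED_KEYS else "epsilon"

    # A manual choice in model settings takes priority over auto-detection.
    prediction_type = user_prediction_type or detected

    return {
        "prediction_type": prediction_type,
        "zero_terminal_snr": bool(keys & ZSNR_KEYS),
    }
```

With this logic, a checkpoint carrying the marker keys resolves to `v_prediction` on its own, and a checkpoint without them still resolves to `v_prediction` when the user selects it manually.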
How to reproduce the problem
Additional context
I also attempted converting the model to Diffusers format, both before and after manually setting the prediction type, with no change in results. Additionally, an option to enable zsnr does not seem to exist in Invoke-AI, though that seems to be a missing feature rather than a bug.
Discord username
bwm_nubby