Missing Scaling factor #33

Open
sunly92 opened this issue Nov 23, 2024 · 1 comment
sunly92 commented Nov 23, 2024

Hi,

This is a really good repo for learning Stable Diffusion from scratch. However, I noticed that the scaling factor that should be applied to the latent $z$ before the U-Net is missing. It is meant to bring the latents to roughly unit variance, which can facilitate training. A detailed discussion can be found at:
huggingface/diffusers#437
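
For reference, the rescaling in question can be written roughly as follows. This is only a sketch: $\hat{\mu}$, $\hat{\sigma}$, and $N$ are notation introduced here for the empirical mean, the empirical standard deviation, and the number of latent values in the batch used for the estimate. They are not symbols taken from the repo.

$$
\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}\left(z_i - \hat{\mu}\right)^2,
\qquad
\tilde{z} = \frac{z}{\hat{\sigma}},
\qquad
\operatorname{Var}(\tilde{z}) \approx 1
$$

The U-Net would then be trained on $\tilde{z}$, and a sampled latent would be multiplied by $\hat{\sigma}$ again before decoding.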

Cheers,
Liyan

explainingai-code (Owner) commented Nov 23, 2024

Hello @sunly92 ,

Thank you :)
You are right that the scaling factor is not present, but the authors use this scaling only for the VAE, not the VQVAE.
You can see the scale_by_std parameter defined in this config but not here.
From the paper: "Note that the VQ-regularized space has a variance close to 1, such that it does not have to be rescaled."
Since this repo only provides code for VQVAE training, this scaling was not required.

Interestingly, when I computed the latent std for both datasets, it came out close to 1 (not just for the VQVAE but also for the VAE), so latent scaling would not have made any difference even for the VAE. That is why, even in other repos where I have also trained a VAE, I decided to skip it rather than add the logic to compute the std for every batch item and do the scaling.
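
As a rough illustration of the check described above, here is a minimal sketch in plain Python (standard library only, so it runs without PyTorch). The helper names `latent_scale_factor` and `scale_latents` and the synthetic Gaussian "latents" are assumptions made for this example; they are not code from this repo or from the original authors. It measures the std over all latent values, derives a factor of 1 / std, and shows that the factor is close to 1 (a near no-op) when the latents already have roughly unit variance.

```python
import random
import statistics


def latent_scale_factor(latents):
    """Return 1 / std computed over every value in a batch of latents.

    `latents` is a list of flat lists of floats, standing in for encoder
    outputs. Population std is used here; a framework's default std may be
    the unbiased estimate, which differs only slightly for large batches.
    """
    flat = [v for z in latents for v in z]
    return 1.0 / statistics.pstdev(flat)


def scale_latents(latents, factor):
    """Multiply every latent value by `factor`."""
    return [[v * factor for v in z] for z in latents]


random.seed(0)

# Hypothetical encoder outputs whose std is far from 1 (about 5).
wide_latents = [[random.gauss(0.0, 5.0) for _ in range(16)] for _ in range(64)]
wide_factor = latent_scale_factor(wide_latents)
scaled = scale_latents(wide_latents, wide_factor)
scaled_std = statistics.pstdev([v for z in scaled for v in z])

# Hypothetical encoder outputs whose std is already about 1.
unit_latents = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(64)]
unit_factor = latent_scale_factor(unit_latents)

print(f"wide: factor={wide_factor:.3f}, std after scaling={scaled_std:.3f}")
print(f"unit: factor={unit_factor:.3f}")
```

In the second case the factor is close to 1, so multiplying by it changes the latents very little, which matches the reasoning for skipping the step when the measured std is already near 1.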

Hope this helps.
