Adding U-Net for diffusion model #33

Open
hongkai-dai opened this issue Apr 17, 2023 · 6 comments
hongkai-dai (Collaborator)

It seems that most diffusion papers use a U-Net (or a U-Net with FiLM conditioning for conditional inputs) instead of an MLP for the diffusion model. We can consider adding our own U-Net.
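
For concreteness, FiLM conditioning just predicts a per-channel scale and shift for intermediate features from the conditioning vector. A minimal PyTorch sketch of the idea (class and shape conventions are illustrative, not from this repo):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation: scale and shift feature maps
    using parameters predicted from a conditioning vector."""

    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # Predict a per-channel scale (gamma) and shift (beta) from the condition.
        self.proj = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length); cond: (batch, cond_dim)
        gamma, beta = self.proj(cond).chunk(2, dim=-1)
        # Broadcast the (batch, channels) modulation over the length axis.
        return gamma.unsqueeze(-1) * x + beta.unsqueeze(-1)
```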

hongkai-dai self-assigned this Apr 17, 2023

hjsuh94 (Owner) commented Apr 17, 2023

My impression is that a U-Net will only be useful when we deal with images. For vector data, I'm not sure how much inductive bias it will provide.

But we should definitely add it for pixel-domain examples!

hongkai-dai (Collaborator, Author)

Sounds good! I was checking Janner's code and saw that they use a U-Net over sequences of state/action pairs: https://github.com/jannerm/diffuser/blob/main/diffuser/models/temporal.py.
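
Roughly, their temporal U-Net applies 1-D convolutional residual blocks along the trajectory axis, conditioned on the diffusion-time embedding. A simplified sketch of one such block (not their exact code; names and defaults are illustrative):

```python
import torch
import torch.nn as nn

class TemporalResBlock(nn.Module):
    """1-D conv residual block over the time axis, conditioned on a
    diffusion-time embedding (simplified from the style of jannerm/diffuser)."""

    def __init__(self, in_ch: int, out_ch: int, time_dim: int, kernel: int = 5):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel, padding=kernel // 2)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel, padding=kernel // 2)
        self.time_proj = nn.Linear(time_dim, out_ch)
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.Mish()

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, horizon); t_emb: (batch, time_dim)
        h = self.act(self.conv1(x))
        # Inject the diffusion-time embedding, broadcast over the horizon.
        h = h + self.time_proj(t_emb).unsqueeze(-1)
        h = self.act(self.conv2(h))
        return h + self.skip(x)
```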

Do you mind if I add a preliminary U-Net implementation as an exercise? I am having trouble fitting a good score function with my MLP on the cart-pole example, so I am trying to debug what is happening. One candidate fix is to switch to a different network structure.

hjsuh94 (Owner) commented Apr 17, 2023

That sounds good to me! I think data stabilization is a good test to see if the score function was trained correctly.

I have also noticed that the score function is a bit fickle to train compared to standard regression.

hongkai-dai (Collaborator, Author)

Sorry, what do you mean by data stabilization? Currently I test the learned score function by running Langevin dynamics, zₜ₊₁ = zₜ + (ε/2)·s_θ(zₜ) + √ε·noise, and checking whether, after many Langevin steps (like 1000), z looks like a sample from the training data distribution. Is that what you mean?
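
In code, the test loop looks roughly like this (a minimal sketch; `score_fn`, the step size `eps`, and the step count are placeholders):

```python
import torch

def langevin_sample(score_fn, z0: torch.Tensor, eps: float = 1e-3,
                    num_steps: int = 1000) -> torch.Tensor:
    """Unadjusted Langevin dynamics:
    z_{t+1} = z_t + (eps / 2) * s_theta(z_t) + sqrt(eps) * noise."""
    z = z0.clone()
    for _ in range(num_steps):
        noise = torch.randn_like(z)
        z = z + 0.5 * eps * score_fn(z) + (eps ** 0.5) * noise
    # If the score is well trained, z should look like a draw from the data distribution.
    return z
```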

hjsuh94 (Owner) commented Apr 17, 2023

That's exactly right, although I've been simply using standard gradient descent!

hongkai-dai (Collaborator, Author)

Got it, thanks! I will try the version zₜ₊₁ = zₜ + (ε/2)·s_θ(zₜ) without the noise term; since s_θ approximates ∇ log p(z), I think that corresponds to the standard gradient descent (on −log p)?
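
In code, that would just drop the noise term from the loop above (same illustrative placeholders):

```python
def score_ascent(score_fn, z0, eps=1e-3, num_steps=1000):
    # Deterministic steps along the score: gradient ascent on the learned
    # log-density, i.e. gradient descent on -log p(z).
    z = z0.clone()
    for _ in range(num_steps):
        z = z + 0.5 * eps * score_fn(z)
    return z
```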
