Suggestion: Modify the Unsampler node to generate the noised samples #2
Dr.Lt.Data pointed out that their "Inspire" node pack contains a node that outputs progress latents: essentially a batch of latents captured from the sampling process. It ought to be relatively easy to adapt this approach into a new Unsampler node that returns a batch of progressively noised latents for our purposes here in DemoFusion land.
Please see my latest commit. I added a Batch Unsampler node, which works just like the Unsampler node from ComfyUI Noise, except that at each unsampling step it collects the intermediate latent. These are concatenated into a batch and returned at the node's output. You can then send them to VAEDecode and ImagePreview to see all the intermediate latents. Once we have all the intermediate latents, we can start on the next step of DemoFusion, which is to progressively de-noise these latents step by step, mixing in a bit of the re-noised latents (z-prime in the paper) at each step.
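A minimal sketch of that collect-and-concatenate idea (`unsample_step` is a hypothetical stand-in for one step of the actual node's noising loop, not the node's real code):

```python
import torch

def unsample_to_batch(x, unsample_step, num_steps):
    # x: [B, C, H, W] starting latent; unsample_step(x, i) is a hypothetical
    # stand-in that returns the latent after noising step i.
    intermediates = []
    for i in range(num_steps):
        x = unsample_step(x, i)
        intermediates.append(x)
    # Concatenate along the batch dimension: [num_steps * B, C, H, W].
    return torch.cat(intermediates, dim=0)
```

The resulting batch can be fed straight into VAEDecode to preview every intermediate.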
So my vision here is to have a few different nodes that combine to implement DemoFusion:

Something like that, anyhow.
Hi, your Unsampler works like a charm. I made a little workflow to visualize it, and it's clearly noising the latents step by step. As you pointed out, the noise generation happens here: https://github.com/deroberon/demofusion-comfyui/blob/74559c79da6e4353747e674525865c314d6e2efd/pipeline_demofusion_sdxl.py#L1004C13-L1004C13 where the latents are calculated and updated to generate x_{t-1}. So the first thing we have to do, I guess, is to add a latent input to the DemoFusion node, pass the latents generated by Batch Unsampler into DemoFusionSDXLPipeline, and then try to modify this part to use the latents generated by the node, right?
I have created (but not pushed) a Batch KSampler node, which takes the batch of latents that have been unsampled and aims to de-noise them, following the DemoFusion approach generally but adding more Comfy-like flexibility. For starters, all it does is iterate through the latents and sample each from i to steps. In other words, the first, noisiest latent (z_prime(T)) gets de-noised from 0 to steps; the second (z_prime(T-1)) gets de-noised from 1 to steps, and so on. The batch sampler node adds a reverse switch so that the latents at the input can be flipped, giving us the noisiest one first and proceeding from there. Once I get batch sampling working, I can add in the scaled mixing from the paper, using their cosine decay function to gradually blend in less and less of the z_prime latent at each time step. We then need to figure out how to do the sliding-window stuff at each step, which I think will be the hard part.
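A sketch of that loop, including the cosine-decay blending (`denoise_fn` is a hypothetical stand-in for a KSampler call, and `alpha` and the exact decay formula are assumptions, not the node's actual code):

```python
import math
import torch

def batch_denoise(latents, steps, denoise_fn, z_prime=None, alpha=3.0, reverse=False):
    # latents: list of [B, C, H, W] tensors, one per noise level.
    # denoise_fn(z, start_step, steps): hypothetical stand-in for a KSampler
    # call that de-noises z from start_step to steps.
    # z_prime: optional list of re-noised latents to blend in at each step.
    if reverse:
        latents = latents[::-1]  # put the noisiest latent first
    out = []
    for i, z in enumerate(latents):
        if z_prime is not None:
            # DemoFusion-style cosine decay: blend in less and less of
            # z_prime as de-noising proceeds (alpha is an assumed exponent).
            c = (0.5 * (1.0 + math.cos(math.pi * i / max(steps - 1, 1)))) ** alpha
            z = c * z_prime[i] + (1.0 - c) * z
        out.append(denoise_fn(z, i, steps))
    return out
```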
BTW, I also re-read the original diffusion paper and realized that you can add noise to a latent with simple math rather than calling out to a sampler. I'll try that as well, because it will be far faster than running the KSampler against a reversed set of sigmas.
Status update - sorry, no commit just yet. I am just about finished applying the blending of z_prime during de-noising after the upscale. Just making it fast. |
I noticed that the Unsampler wasn't really doing what we want. "Unsampling" is a VERY straightforward process: you are just adding Gaussian noise to the original image in accordance with the sigma schedule, which you can get from the model. Fast, efficient torch code can take a 4D latent batch, x, along with a sigmas tensor obtained from the model, and apply the sigmas to the latent to generate a batch of progressively noised latents following that noise schedule. You don't have to send the latent through a sampler at all, and there's no point using Comfy's sampler code for this purpose. Unsampling doesn't rely on the prompt at all; it's the same for all LDM models. I don't know whether LCM works the same way, but I suspect it does not, so be warned that this approach may not work with LCM.
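A minimal sketch of that math, assuming the k-diffusion-style parameterization where a latent at noise level sigma is x + sigma * eps (the function and argument names here are mine, not Comfy's):

```python
import torch

def batch_noise_latents(x, sigmas, shared_noise=True):
    # x: [B, C, H, W] clean latent batch; sigmas: 1D tensor of noise levels.
    # Produces one noised copy of x per sigma: x_t = x + sigma_t * eps.
    eps = torch.randn_like(x)
    out = []
    for sigma in sigmas:
        if not shared_noise:
            eps = torch.randn_like(x)  # fresh noise per level, if preferred
        out.append(x + sigma * eps)
    return torch.cat(out, dim=0)  # [T * B, C, H, W]
```

With `shared_noise=True` the batch traces one consistent noising trajectory; with `False` each level gets independent noise.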
OMG!! It's so amazing what you did in a few days!! Your workflow in the example folder demonstrates the usage of the Sampler and Unsampler perfectly. I've also tried it with SDXL models, and it kind of works, though the roughness in the intermediate steps is amplified. I also created another workflow in the example folder that compares applying the KSampler against just a light denoise, so we can see the impact of the different techniques relative to simply upscaling the latents and denoising them.
BTW, it's an amazing image! |
Thanks for the many compliments. I can now appreciate why the paper says the first technique alone is insufficient: it produces grainy output. This is why they then do their "dilated sampling". But the paper isn't super clear about how dilated sampling works. They talk about getting a series of "global" latent representations by de-noising a Gaussian-blurred version of several parts of the latent, which according to Figure 3 appear to be overlapping. I guess that's what is meant by "dilated". The number of global samples is set to s^2, where s is the scale factor (2, 3, 4, etc.), so you start with four global latents at s = 2, then nine at s = 3, and so on. These global latents are de-noised and then mixed somehow with what they call "local" representations. I wonder if the local representations are just the z[i] that was just de-noised? Anyhow, I'll read their code and take a crack at it.
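One plausible reading of "dilated" is strided sub-sampling: take every s-th latent element at each of the s^2 offsets, yielding s^2 low-resolution "global" views. This is my guess at the mechanism, not their code:

```python
import torch

def dilated_views(z, s):
    # z: [B, C, H, W] latent; returns the s*s strided "global" views,
    # one per (i, j) offset, each of shape [B, C, H // s, W // s].
    return [z[:, :, i::s, j::s] for i in range(s) for j in range(s)]
```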
I also suspect that their technique may not be the best. I'd like to make a node that gives you lots of options and flexibility so that people can try out different things. |
following this with much enthusiasm... keep up the great work guys! |
In the DemoFusion pipeline code, they implement the paper's various stages, one of which, of course, is noising the image step by step to produce a set of z' latents:
https://github.com/deroberon/demofusion-comfyui/blob/74559c79da6e4353747e674525865c314d6e2efd/pipeline_demofusion_sdxl.py#L1004C13-L1004C13
In the ComfyUI world, we have the Unsampler node from https://github.com/BlenderNeko/ComfyUI_Noise, which does this but does not currently keep all the intermediate noised samples - it only gives you the final noised sample. In an effort to make this node more Comfy-ish, perhaps we can encourage @BlenderNeko to update the Unsampler node to optionally pump out a batch of latents representing all of the intermediate noising steps, rather than just the final noised sample. See https://github.com/BlenderNeko/ComfyUI_Noise/blob/f227455f930ad1b5766f1a76e1bbdb911adfb85c/nodes.py#L201
Perhaps we can hook into this callback function (https://github.com/BlenderNeko/ComfyUI_Noise/blob/f227455f930ad1b5766f1a76e1bbdb911adfb85c/nodes.py#L233C16-L233C16) and peel off the latest noised latent, adding it to an array that can then be used by the rest of the existing pipeline code?
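A hedged sketch of that idea (the callback signature here is an assumption about how the sampler invokes it):

```python
import torch

# Collect each intermediate noised latent as the sampler steps.
intermediates = []

def collect_callback(step, x0, x, total_steps):
    # x is assumed to be the current latent at this step; keep a copy.
    intermediates.append(x.detach().clone())

# After sampling with this callback attached, the pipeline could use:
# noised_batch = torch.cat(intermediates, dim=0)
```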