New sampling method: Incremental Fine-tuning during sampling. #61
Replies: 5 comments 5 replies
-
I also wrote a Twitter thread about it: https://twitter.com/cloneofsimo/status/1604242844574552064
-
One thing to note: this wasn't impossible previously, but you would need an EXTREMELY large amount of memory to pull it off. Each "merged model" needs at least an extra 4 GB, which becomes infeasible if we are talking about something like 50 steps: that's 200 GB! But luckily, with LoRA you can dynamically merge models, so this takes near-zero overhead.
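To see why dynamic merging is near-free, here is a toy sketch (shapes and names are illustrative, not the library's API): the effective weight can be recomputed for any scale from the small LoRA factors, instead of storing a separate multi-GB merged checkpoint per step.

```python
import torch

d, rank = 64, 4
W0 = torch.randn(d, d)            # frozen base weight (stored once)
A = torch.randn(rank, d) * 0.01   # LoRA down-projection (tiny)
B = torch.randn(d, rank) * 0.01   # LoRA up-projection (tiny)

def effective_weight(scale):
    # W = W0 + scale * (B @ A): re-merging is just a scalar change,
    # so a per-step schedule costs almost nothing in memory.
    return W0 + scale * (B @ A)

# scale = 0 recovers the base model exactly; scale = 1 is the merged model
assert torch.allclose(effective_weight(0.0), W0)
```

The same idea applies per LoRA-injected layer in a real model; only the rank-`r` factors are stored alongside the base weights.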
-
Awesome work as always. Can this be done with a normal .ckpt file?
-
Super interesting! I'd be interested in comparing this to an image without scheduled adaptation, but with alpha > 1.0. Can we achieve similar results with, say, alpha = 5? I would guess that sweeping over alpha > 1 would get you an image close to that of scheduled adaptation (at least as implemented with the Heaviside schedule). I'm not grokking why turning on the adaptation at a particular point is better than just cranking up the scale of adaptation. Do you have an intuitive explanation for why scheduling can lead to superior generation?
-
For lack of a better name, let's call this scheduled adaptiveness. The basic trick goes like this: during sampling, start from the base model and approach the fine-tuned model. I.e., at sampling step $t$, use the weights

$$\theta_t = \theta_{\text{base}} + m(t)\,\bigl(\theta_{\text{fine-tuned}} - \theta_{\text{base}}\bigr)$$

where $m$ is a monotonically increasing function with $m(t) \in [0, 1]$, e.g.,

$$m(t) = H(t - t_0)$$

where $H$ is the Heaviside step function.
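As a concrete sketch (the function names here are ours, not from the post), the Heaviside schedule, plus a smooth linear alternative, can be written as:

```python
def heaviside_schedule(step, t0):
    """m(t) = H(t - t0): pure base model before step t0, fully
    fine-tuned model from step t0 onward."""
    return 1.0 if step >= t0 else 0.0

def linear_schedule(step, num_steps):
    """A smooth monotone alternative: ramp m from 0 to 1 over sampling."""
    return step / max(num_steps - 1, 1)
```

Any monotone function mapping the step index into [0, 1] would fit the formulation above.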
We can implement this quite beautifully with diffusers, using a callback.
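A minimal sketch of such a callback, assuming a helper like this repo's `tune_lora_scale` that rescales all LoRA layers of a model in place (the factory function and the stub below are ours, for illustration):

```python
T0 = 10  # hypothetical switch-on step for the Heaviside schedule

def make_heaviside_callback(unet, tune_lora_scale, t0=T0):
    def callback(step, timestep, latents):
        # diffusers invokes this each denoising step with
        # (step_index, timestep, latents); we only use the step index.
        tune_lora_scale(unet, 1.0 if step >= t0 else 0.0)
    return callback

# Verify the schedule without a real pipeline, using a recording stub:
calls = []
cb = make_heaviside_callback(unet="unet-stub",
                             tune_lora_scale=lambda model, a: calls.append(a))
for step in range(20):
    cb(step, timestep=None, latents=None)
assert calls[:T0] == [0.0] * T0      # base model for the first T0 steps
assert calls[T0:] == [1.0] * (20 - T0)
```

With a real pipeline this would be passed as `pipe(prompt, callback=cb, callback_steps=1)` in older diffusers versions (newer releases use the `callback_on_step_end` hook instead).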
The following image shows results with / without scheduled adaptiveness, with the prompt
There were no white shirts in the training data; all the clothes were black. So the model can't distinguish whether black clothing is something intrinsic to Wednesday. We trivially know that this isn't the case, but to a gradient, there is no way to tell.
But the base model (SD 1.5) certainly knows this. So in the first few steps it creates the character with a white shirt, as expected. Right after those first steps, LoRA comes in and plays its role to make the image closer to
<wday>
. With a 0-to-1 schedule, this is nearly equivalent to sampling with the base model and SDEditing with the fine-tuned one.