
DDIMScheduler (or DDPM) of StableDiffusion with 1000 num_inference_steps bugs out #10003

Open
HilaManor opened this issue Nov 23, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@HilaManor

Describe the bug

There's something going on with the set_timesteps offset parameters on Stable Diffusion (v1.4). The timesteps are set from 1→1000 instead of from 0→999, so the pipeline tries to index out of bounds.

Reproduction

import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler).to(device)
pipe("an image of a cat", num_inference_steps=1000)

gives
IndexError: index 1000 is out of bounds for dimension 0 with size 1000
because scheduler.timesteps gives: tensor([1000, 999, 998, 997, ..., 3, 2, 1], device='cuda:0')
instead of tensor([999, 998, 997, ..., 3, 2, 1, 0], device='cuda:0')
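For reference, the off-by-one can be reproduced with plain NumPy by mirroring DDIMScheduler's "leading" spacing; this is a sketch of the diffusers logic, assuming num_train_timesteps=1000 and the SD v1.4 config's steps_offset=1:

```python
import numpy as np

# Sketch of DDIMScheduler's "leading" timestep computation at 1000 steps
# (values assume num_train_timesteps=1000 and steps_offset=1 as in the
# SD v1.4 scheduler config).
num_train_timesteps = 1000
steps_offset = 1
num_inference_steps = 1000

step_ratio = num_train_timesteps // num_inference_steps  # == 1
timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
timesteps += steps_offset  # shifts the schedule from 999..0 to 1000..1

print(timesteps[0], timesteps[-1])  # 1000 1 -> alphas_cumprod[1000] is out of bounds
```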

Logs

No response

System Info


  • 🤗 Diffusers version: 0.31.0
  • Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.31
  • Running on Google Colab?: No
  • Python version: 3.12.7
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.26.2
  • Transformers version: 4.46.3
  • Accelerate version: 1.1.1
  • PEFT version: not installed
  • Bitsandbytes version: not installed
  • Safetensors version: 0.4.5
  • xFormers version: 0.0.28.post3
  • Accelerator: NVIDIA RTX A6000, 49140 MiB

Who can help?

@yiyixuxu @asomoza @sayakpaul @DN6

@HilaManor HilaManor added the bug Something isn't working label Nov 23, 2024
@SahilCarterr
Contributor

I think num_inference_steps=1000 doesn't make practical sense

  • Increasing the steps significantly increases the computation time without proportional quality improvement.
  • After a certain number of steps, additional steps won't improve the image significantly.
  • The DDIMScheduler is designed to work optimally with fewer steps (typically 20–150).
  • You might experiment with 100–150 steps if you're looking for marginal improvements, but 1000 is overkill.

Additionally, you can go through the tutorial here to improve the quality of SD outputs.

@hlky
Contributor

hlky commented Nov 23, 2024

The scheduler config has steps_offset=1 (compared to the default 0); timestep_spacing isn't specified in the config, and for DDIMScheduler the default is leading:

elif self.config.timestep_spacing == "leading":
    step_ratio = self.config.num_train_timesteps // self.num_inference_steps
    # creates integer timesteps by multiplying by ratio
    # casting to int to avoid issues when num_inference_step is power of 3
    timesteps = (np.arange(0, num_inference_steps) * step_ratio).round()[::-1].copy().astype(np.int64)
    timesteps += self.config.steps_offset

from diffusers import DDIMScheduler

model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
scheduler.set_timesteps(1000)
scheduler.timesteps
>>> tensor([1000,  999, ...,   2,    1])

The step then fails when indexing alphas_cumprod with timestep=1000:

alpha_prod_t = self.alphas_cumprod[timestep]

timestep_spacing="leading" needs steps_offset=0 to get 1000 steps

from diffusers import DDIMScheduler

model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler", steps_offset=0)
scheduler.set_timesteps(1000)
scheduler.timesteps
>>> tensor([999, 998, ...,   1,   0])

Most schedulers default to timestep_spacing="linspace", which does work with 1000 steps and gives the same 1000 timesteps:

from diffusers import DDIMScheduler

model_id = "CompVis/stable-diffusion-v1-4"
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler", timestep_spacing="linspace")
scheduler.set_timesteps(1000)
scheduler.timesteps
>>> tensor([999, 998, ...,   1,   0])

DDIM explicitly doesn't support > 1000 steps; other schedulers don't have this check:

if num_inference_steps > self.config.num_train_timesteps:
    raise ValueError(
        f"`num_inference_steps`: {num_inference_steps} cannot be larger than `self.config.train_timesteps`:"
        f" {self.config.num_train_timesteps} as the unet model trained with this scheduler can only handle"
        f" maximal {self.config.num_train_timesteps} timesteps."
    )

EulerDiscreteScheduler with timestep_spacing="linspace" for example works with > 1000 steps

scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler", timestep_spacing="linspace")
scheduler.set_timesteps(1111)
scheduler.timesteps
>>> tensor([9.9900e+02, 9.9810e+02, 9.9720e+02,  ..., 1.8000e+00, 9.0000e-01,
        0.0000e+00])

Inference tested for a few steps:

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-4"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler", timestep_spacing="linspace")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
pipe("an image of a cat", num_inference_steps=1111)

There is a difference in casting for linspace and trailing:
Euler

if self.config.timestep_spacing == "linspace":
    timesteps = np.linspace(
        0, self.config.num_train_timesteps - 1, num_inference_steps, dtype=np.float32
    )[::-1].copy()

DDIM
timesteps = (
    np.linspace(0, self.config.num_train_timesteps - 1, num_inference_steps)
    .round()[::-1]
    .copy()
    .astype(np.int64)
)

Euler
elif self.config.timestep_spacing == "trailing":
    step_ratio = self.config.num_train_timesteps / self.num_inference_steps
    # creates integer timesteps by multiplying by ratio
    # casting to int to avoid issues when num_inference_step is power of 3
    timesteps = (
        (np.arange(self.config.num_train_timesteps, 0, -step_ratio)).round().copy().astype(np.float32)
    )
    timesteps -= 1

DDIM
elif self.config.timestep_spacing == "trailing":
    step_ratio = self.config.num_train_timesteps / self.num_inference_steps
    # creates integer timesteps by multiplying by ratio
    # casting to int to avoid issues when num_inference_step is power of 3
    timesteps = np.round(np.arange(self.config.num_train_timesteps, 0, -step_ratio)).astype(np.int64)
    timesteps -= 1
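A quick side-by-side sketch of the two "trailing" variants excerpted above (standalone NumPy, assuming num_train_timesteps=1000 and 10 inference steps) shows they produce the same values and differ only in dtype:

```python
import numpy as np

num_train_timesteps = 1000
num_inference_steps = 10
step_ratio = num_train_timesteps / num_inference_steps

# Euler variant: rounds, then casts to float32
euler = (np.arange(num_train_timesteps, 0, -step_ratio)).round().copy().astype(np.float32)
euler -= 1

# DDIM variant: rounds, then casts to int64
ddim = np.round(np.arange(num_train_timesteps, 0, -step_ratio)).astype(np.int64)
ddim -= 1

print(euler.dtype, ddim.dtype)      # float32 int64
print(np.array_equal(euler, ddim))  # True -- same values, different dtype
```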

leading appears not to work with > 1000 steps on other schedulers either.

@HilaManor
Author

HilaManor commented Nov 23, 2024

@hlky Thanks for the detailed reply! However, I'm still a bit confused about what to actually do.
Trying to load StableDiffusionPipeline from_pretrained with a scheduler loaded with steps_offset=0 leads to a deprecation warning, and the pipeline then overrides the value back to steps_offset=1, so the bug remains:

if hasattr(scheduler.config, "steps_offset") and scheduler.config.steps_offset != 1:
    deprecation_message = (
        f"The configuration file of this scheduler: {scheduler} is outdated. `steps_offset`"
        f" should be set to 1 instead of {scheduler.config.steps_offset}. Please make sure "
        "to update the config accordingly as leaving `steps_offset` might led to incorrect results"
        " in future versions. If you have downloaded this checkpoint from the Hugging Face Hub,"
        " it would be very nice if you could open a Pull request for the `scheduler/scheduler_config.json`"
        " file"
    )
    deprecate("steps_offset!=1", "1.0.0", deprecation_message, standard_warn=False)
    new_config = dict(scheduler.config)
    new_config["steps_offset"] = 1
    scheduler._internal_dict = FrozenDict(new_config)

@SahilCarterr Thank you for your comment; however, num_inference_steps=1000 is practical for applications other than simple text-to-image generation. Additionally, DDPMScheduler (which is designed for 1000 steps) has this bug as well.
In any case, this is still unexpected behavior that should be accounted for.

@hlky
Contributor

hlky commented Nov 24, 2024

@HilaManor Thanks for checking this! You're right, it looks like steps_offset=0 won't work. Could you test timestep_spacing="linspace" or timestep_spacing="trailing" with your use case? Both create the same tensor([999, 998, 997, ..., 3, 2, 1, 0], device='cuda:0') that you're expecting at 1000 steps. Using min(timestep, self.config.num_train_timesteps - 1) to index alphas_cumprod could work, or the check might need to be >= instead of >, at least for timestep_spacing="leading".
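A minimal sketch of the clamping idea, not an actual diffusers patch: `alphas_cumprod` and `num_train_timesteps` here are illustrative stand-ins for the scheduler's attributes, and the helper name `safe_alpha_prod` is hypothetical.

```python
import torch

num_train_timesteps = 1000
# Illustrative values only; the real scheduler computes these from its betas.
alphas_cumprod = torch.linspace(0.9999, 0.0047, num_train_timesteps)

def safe_alpha_prod(timestep: int) -> torch.Tensor:
    # Clamp so t == num_train_timesteps (produced by steps_offset=1 with
    # "leading" spacing) indexes the last entry instead of going out of bounds.
    idx = min(timestep, num_train_timesteps - 1)
    return alphas_cumprod[idx]

safe_alpha_prod(1000)  # indexes position 999 instead of raising IndexError
```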

@sayakpaul
Member

Cc: @yiyixuxu here.

@HilaManor
Author

@hlky Thanks again for your comment. timestep_spacing="trailing" does solve this, but only for t=1000; in my case I also sometimes need fewer steps, which then causes differences from leading:

leading for 10 steps: tensor([901, 801, 701, 601, 501, 401, 301, 201, 101,   1], device='cuda:0')
trailing for 10 steps: tensor([999, 899, 799, 699, 599, 499, 399, 299, 199,  99], device='cuda:0')
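The two schedules above can be recomputed with plain NumPy (assuming num_train_timesteps=1000 and steps_offset=1, as in the SD v1.4 config):

```python
import numpy as np

num_train_timesteps = 1000
steps_offset = 1
n = 10

# "leading": integer ratio, then offset
step_ratio = num_train_timesteps // n
leading = (np.arange(0, n) * step_ratio).round()[::-1].copy().astype(np.int64)
leading += steps_offset
print(leading)   # [901 801 701 601 501 401 301 201 101   1]

# "trailing": float ratio counting down from 1000, then -1
step_ratio = num_train_timesteps / n
trailing = np.round(np.arange(num_train_timesteps, 0, -step_ratio)).astype(np.int64)
trailing -= 1
print(trailing)  # [999 899 799 699 599 499 399 299 199  99]
```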

An interesting related question: is it expected for the last timestep to be 1 when setting fewer steps? I know the scheduler calculates whether noise should be added in the last step (and it doesn't in this case), but theoretically the t=1 denoiser shouldn't be expected to output x_0 directly.

@HilaManor
Author

HilaManor commented Nov 27, 2024

I think the following bug is related:
in DPMSolverMultistepScheduler with the same pretrained config, calling set_timesteps(1000) creates a sigma array in which two adjacent entries in the middle are identical, which causes NaNs from timestep 500 onwards:
self.sigmas = [..., 1.6401, 1.6346, 1.6291, 1.6237, 1.6183, 1.6183, 1.6129, 1.6075, 1.6022, 1.5968, ...]

Using "leading" leads to a constant tensor of 1000 values of 0.0413 with a trailing 0; "trailing" I think may be close to the expected behaviour.

The reason is the timesteps vector: in leading it's all 0s, and in linspace the middle timestep is repeated.
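A guess at where the repeated timestep comes from, assuming DPMSolverMultistepScheduler's "linspace" spacing rounds np.linspace(0, num_train_timesteps - 1, num_inference_steps + 1) (as in diffusers 0.31): 1001 points rounded into 1000 integer slots must collide somewhere, and the collision lands near the middle of the schedule, producing the duplicate sigma.

```python
import numpy as np

num_train_timesteps = 1000
num_inference_steps = 1000

# Sketch of the "linspace" branch: 1001 evenly spaced points rounded to
# integers in [0, 999] -- by pigeonhole at least two round to the same value.
timesteps = (
    np.linspace(0, num_train_timesteps - 1, num_inference_steps + 1)
    .round()[::-1][:-1]
    .astype(np.int64)
)

dupes = timesteps[:-1][np.diff(timesteps) == 0]
print(dupes)  # the repeated timestep, near the middle of the schedule
```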
