
Using StableDiffusionControlNetImg2ImgPipeline with enable_vae_tiling(), the patch size seems fixed at 512 x 512; where should I set the relevant parameters? #9983

Closed
reaper19991110 opened this issue Nov 21, 2024 · 6 comments

@reaper19991110

pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
prompt = "a beautiful landscape photograph"
pipe.enable_vae_tiling()
@sayakpaul
Member

SD's default resolution is 512x512.

You should pass height and width when calling the pipeline.
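
For example, a minimal sketch (assuming pipe, image, and canny_image are already set up as in the full example in the next comment; 768x768 is just an illustrative size):

result = pipe(
    "a beautiful landscape photograph",
    image=image,
    control_image=canny_image,
    height=768,  # requested output height in pixels (should be divisible by 8)
    width=768,   # requested output width in pixels (should be divisible by 8)
).images[0]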

@SahilCarterr
Contributor

You can adjust the code as follows in order to use height and width:

from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image
import numpy as np
import torch

import cv2
from PIL import Image

# download an image
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
np_image = np.array(image)

# get canny image
np_image = cv2.Canny(np_image, 100, 200)
np_image = np_image[:, :, None]
np_image = np.concatenate([np_image, np_image, np_image], axis=2)
canny_image = Image.fromarray(np_image)

# load control net and stable diffusion v1-5
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "Lykon/dreamshaper-8", controlnet=controlnet, torch_dtype=torch.float16
)

# speed up diffusion process with faster scheduler and memory optimization
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# generate image
generator = torch.manual_seed(0)
image = pipe(
    "futuristic-looking woman",
    generator=generator,
    image=image,
    height=640,
    width=640,
    control_image=canny_image,
).images[0]
image

Output: [attached result image: sd_control_net_512issue]

@reaper19991110

@hlky
Collaborator

hlky commented Nov 22, 2024

The patch size for tiled VAE is not directly configurable; the value is calculated from the VAE's config. It could be beneficial for low-end GPU users to allow the tile size to be configurable.

def tiled_decode(self, z: torch.Tensor, return_dict: bool = True) -> Union[DecoderOutput, torch.Tensor]:

self.tile_sample_min_size = self.config.sample_size
sample_size = (
    self.config.sample_size[0]
    if isinstance(self.config.sample_size, (list, tuple))
    else self.config.sample_size
)
self.tile_latent_min_size = int(sample_size / (2 ** (len(self.config.block_out_channels) - 1)))
self.tile_overlap_factor = 0.25
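
Until that lands, one workaround (a sketch relying on the internal attributes shown above, not a public API, so they may change) is to override them on the loaded VAE after calling enable_vae_tiling():

# After pipe.enable_vae_tiling(), override the internal tile-size attributes.
# tile_sample_min_size is the patch size in pixel space; tile_latent_min_size is the
# corresponding latent-space size (pixels / 8 for the SD 1.x VAE,
# i.e. 2 ** (len(block_out_channels) - 1)).
pipe.vae.tile_sample_min_size = 1024
pipe.vae.tile_latent_min_size = 1024 // 8
pipe.vae.tile_overlap_factor = 0.25  # fraction of each tile blended with its neighbours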

@sayakpaul
Member

Don't want to overwhelm you with requests but if you want to take it up, you're more than welcome to :)

Otherwise, pinging @DN6 and @a-r-r-o-w.

@a-r-r-o-w
Member

Thanks for pinging! I have actually been planning to rewrite our VAE tiling implementations to make them all look similar and more easily configurable. Mochi's VAE tiling implementation would be the point of reference for the refactors. #9903 is the first PR in the series of changes, and next would be Allegro. For the image VAEs, we will have to introduce deprecation warnings, so that one will move a bit more slowly.

@reaper19991110
Author

I found that the parameter tile_latent_min_size in the code is the patch size for tiled VAE, and it is derived from sample_size, which is set to 512 in the config.json of the VAE being used. So directly modifying the parameters in config.json is the most straightforward way. @hlky
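
Rather than editing config.json on disk, the same effect can likely be achieved by overriding sample_size when loading the VAE, since the tile sizes are derived from config.sample_size in __init__ (an untested sketch; the config-kwarg override in from_pretrained is assumed here):

import torch
from diffusers import AutoencoderKL, ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# sample_size is a config field, so passing it to from_pretrained should override the
# value stored in config.json; tile_sample_min_size / tile_latent_min_size are then
# derived from the overridden value.
vae = AutoencoderKL.from_pretrained(
    "Lykon/dreamshaper-8", subfolder="vae", sample_size=1024, torch_dtype=torch.float16
)
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "Lykon/dreamshaper-8", controlnet=controlnet, vae=vae, torch_dtype=torch.float16
)
pipe.enable_vae_tiling()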
