Optimize Stable Diffusion Text-to-Image

This repository investigates optimization methods for both the inference speed and the memory consumption of Stable Diffusion with 🤗 diffusers. You can see more details here.

Overview

Speed benchmark: a notebook is provided for running the benchmark.

  • CPU Benchmark

    • Device: Intel(R) Xeon(R) CPU @ 2.00GHz (Google Colab)
    • Inference steps: 20 steps
    | Pipeline | DType | Speed (s/image) |
    | --- | --- | --- |
    | PyTorch | fp32 | 665.6 |
    | PyTorch + Token Merging | fp32 | 542.2 |
    | ONNX | fp32 | 566.4 |
    | ONNX | uint8 | 455.8 |
    | OpenVINO | fp32 | 548.2 |
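To make the table easier to compare, the timings can be expressed as speedup factors relative to the PyTorch fp32 baseline. The sketch below is only arithmetic on the numbers reported above; the variable names are illustrative.

```python
# Seconds per image from the CPU benchmark table (20 inference steps).
seconds_per_image = {
    "PyTorch fp32": 665.6,
    "PyTorch + Token Merging fp32": 542.2,
    "ONNX fp32": 566.4,
    "ONNX uint8": 455.8,
    "OpenVINO fp32": 548.2,
}

baseline = seconds_per_image["PyTorch fp32"]

# Speedup = baseline time / pipeline time (higher is faster).
speedups = {name: baseline / t for name, t in seconds_per_image.items()}

for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<30} {factor:.2f}x")
```

On these numbers the quantized ONNX uint8 pipeline is the fastest, at roughly 1.46x the baseline.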

Note:

  • For now the benchmark covers speed only. Quality benchmarks of the optimized models will be considered in the future.
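As a rough illustration of how a seconds-per-image figure could be measured, the sketch below times repeated calls of a generation function. This is an assumption about the measurement method, not necessarily what the benchmark notebook does; `time_per_image` and its parameters are hypothetical names.

```python
import time

def time_per_image(generate, num_images=3, warmup=1):
    """Average wall-clock seconds per call of `generate` (hypothetical helper)."""
    for _ in range(warmup):
        generate()  # warm-up calls are run but not timed
    start = time.perf_counter()
    for _ in range(num_images):
        generate()
    return (time.perf_counter() - start) / num_images
```

With a loaded pipeline, `generate` could be something like `lambda: pipe(prompt, num_inference_steps=20)`, where `prompt` is any text prompt of your choosing.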

PyTorch Pipeline

Half precision weights (Float16)

Load and run the model directly in float16 (GPU only).

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
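The memory benefit of float16 follows from the element size: each weight takes 2 bytes instead of 4. A minimal sketch of that arithmetic, assuming a purely hypothetical parameter count (it is not the size of this model) and counting weights only, not activations:

```python
import struct

# Bytes per element, as reported by Python's struct module.
BYTES_PER_PARAM = {
    "float32": struct.calcsize("f"),  # 4 bytes
    "float16": struct.calcsize("e"),  # 2 bytes
}

def weight_memory_bytes(num_params, dtype):
    """Memory needed to hold `num_params` weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]

# Hypothetical parameter count, chosen only to make the arithmetic concrete.
num_params = 1_000_000_000
for dtype in BYTES_PER_PARAM:
    gib = weight_memory_bytes(num_params, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB")
```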

Token Merging

Token Merging (ToMe) for Stable Diffusion speeds up the transformer blocks by merging redundant tokens.

import torch
import tomesd
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
).to("cuda")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5) # Can also use pipe.unet in place of pipe here
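To give an intuition for what "merging redundant tokens" means, here is a toy, pure-Python sketch of a bipartite merge: tokens are split into two sets, each token in the first set is matched to its most similar token in the second, and the most similar pairs are averaged together. This is a simplified illustration, not tomesd's implementation; `merge_tokens` and `cosine` are hypothetical helpers.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def merge_tokens(tokens, ratio=0.5):
    """Toy bipartite merge; returns fewer token vectors (order not preserved)."""
    src, dst = tokens[0::2], tokens[1::2]  # alternate tokens between the two sets
    if not dst:
        return [list(t) for t in tokens]
    r = min(int(len(tokens) * ratio), len(src))  # number of tokens to merge away

    # For every src token, find its most similar dst token.
    matches = []
    for i, s in enumerate(src):
        sims = [cosine(s, d) for d in dst]
        j = max(range(len(dst)), key=sims.__getitem__)
        matches.append((sims[j], i, j))

    # Merge only the r most similar (src, dst) pairs.
    matches.sort(reverse=True)
    merged_src = {i: j for _, i, j in matches[:r]}

    sums = [list(d) for d in dst]
    counts = [1] * len(dst)
    for i, j in merged_src.items():
        sums[j] = [a + b for a, b in zip(sums[j], src[i])]
        counts[j] += 1

    kept = [list(s) for i, s in enumerate(src) if i not in merged_src]
    merged = [[x / c for x in s] for s, c in zip(sums, counts)]
    return kept + merged

tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(len(merge_tokens(tokens, ratio=0.5)))  # 4 tokens become 2
```

Fewer tokens means less work in every attention layer, which is where the speedup comes from.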

Memory Efficient Attention

Use memory-efficient attention (e.g. FlashAttention) via xFormers (GPU only).

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    use_safetensors=True,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()

# Disable
# pipe.disable_xformers_memory_efficient_attention()
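The saving comes from not materializing the full attention score matrix, which grows quadratically with the number of tokens. A back-of-the-envelope sketch of that size follows; the token count, head count, and dtype are illustrative assumptions, not measured values for this model.

```python
def score_matrix_bytes(num_tokens, num_heads, bytes_per_element, batch_size=1):
    """Size of the full (tokens x tokens) attention score matrix that a
    naive attention implementation would materialize for one layer."""
    return batch_size * num_heads * num_tokens * num_tokens * bytes_per_element

# Illustrative assumptions: a 64x64 latent grid (4096 tokens), 8 heads, float16.
naive = score_matrix_bytes(num_tokens=64 * 64, num_heads=8, bytes_per_element=2)
print(f"{naive / 2**20:.0f} MiB")  # 256 MiB
```

Doubling the side of the latent grid quadruples the token count and therefore multiplies this figure by 16, which is why the optimization matters most at higher resolutions.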

Serialization

Convert the model to ONNX or OpenVINO format. 🤗 Optimum provides pipelines compatible with ONNX Runtime and OpenVINO. See export.ipynb for ways to export a StableDiffusionPipeline.

Example 1: Export the PyTorch pipeline to ONNX

from optimum.onnxruntime import ORTStableDiffusionPipeline

model_id = "Zero-nnkn/stable-diffusion-2-pokemon"
pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True)
pipeline.save_pretrained("onnx")

Example 2: Export the PyTorch pipeline to OpenVINO. Note that this pipeline can only run on Intel devices (CPU or GPU).

from optimum.intel import OVStableDiffusionPipeline

model_id = "Zero-nnkn/stable-diffusion-2-pokemon"
pipeline = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)

pipeline.save_pretrained("openvino")
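Once exported, the saved directories can be reloaded without converting again. The helper below is a minimal sketch, assuming the directory names used in the two examples above ("onnx" and "openvino"); `load_exported_pipeline` is a hypothetical name, and the imports are deferred so that only the selected runtime needs to be installed.

```python
def load_exported_pipeline(model_dir, backend):
    """Reload a pipeline saved by one of the export examples.

    `backend` is "onnx" or "openvino" (hypothetical selector for this sketch).
    """
    if backend == "onnx":
        from optimum.onnxruntime import ORTStableDiffusionPipeline
        return ORTStableDiffusionPipeline.from_pretrained(model_dir)
    if backend == "openvino":
        from optimum.intel import OVStableDiffusionPipeline
        return OVStableDiffusionPipeline.from_pretrained(model_dir)
    raise ValueError(f"unknown backend: {backend!r}")
```

For example, `pipeline = load_exported_pipeline("onnx", "onnx")` followed by `pipeline(prompt, num_inference_steps=20).images[0]` should produce an image for a prompt of your choosing.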

Other tools