Investigate optimization methods for both inference speed and memory consumption of Stable Diffusion with 🤗 diffusers. You can see more details here.
Speed Benchmark: We provide a notebook for benchmarking.
- CPU Benchmark
  - Device: Intel(R) Xeon(R) CPU @ 2.00GHz (Google Colab)
  - Inference steps: 20
| Pipeline                 | DType | Speed (s/image) |
|--------------------------|-------|-----------------|
| PyTorch                  | fp32  | 665.6           |
| PyTorch + Token Merging  | fp32  | 542.2           |
| ONNX                     | fp32  | 566.4           |
| ONNX                     | uint8 | 455.8           |
| OpenVINO                 | fp32  | 548.2           |
Note:
- Right now we focus only on speed; quality benchmarks of the optimized models will be considered in the future.
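For context, the per-image timings above can be measured along these lines. Below is a minimal sketch of the baseline fp32 run on CPU (the prompt is an assumption, not the exact benchmark script):

```python
import time

from diffusers import DiffusionPipeline

# Baseline PyTorch fp32 pipeline; runs on CPU by default
pipe = DiffusionPipeline.from_pretrained("Zero-nnkn/stable-diffusion-2-pokemon")

start = time.perf_counter()
image = pipe("a pokemon", num_inference_steps=20).images[0]
print(f"{time.perf_counter() - start:.1f} s/image")
```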
Load and run the model directly in float16 (GPU only).
```python
import torch
from diffusers import DiffusionPipeline

# Load the weights in half precision to roughly halve memory use and speed up inference
pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
```
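Continuing from the snippet above, generation then works as usual (the prompt here is only an illustration):

```python
image = pipe("a pokemon", num_inference_steps=20).images[0]
image.save("pokemon.png")
```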
Token Merging (ToMe) for Stable Diffusion is a technique that speeds up the transformer blocks by merging redundant tokens.
```python
import tomesd
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
).to("cuda")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5)  # Can also use pipe.unet in place of pipe here
```
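Continuing from the snippet above, the patched pipeline is called as usual, and the patch can be undone with tomesd.remove_patch (the prompt is an assumption):

```python
image = pipe("a pokemon", num_inference_steps=20).images[0]

tomesd.remove_patch(pipe)  # Restore the original, unpatched model
```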
Use FlashAttention via xFormers (GPU only).
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    use_safetensors=True,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()
# Disable
# pipe.disable_xformers_memory_efficient_attention()
```
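These optimizations compose. Here is a sketch that combines float16 loading with xFormers attention (the prompt is an assumption):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a pokemon", num_inference_steps=20).images[0]
```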
Convert the model to ONNX and OpenVINO formats. 🤗 Optimum provides pipelines compatible with ONNX Runtime and OpenVINO. You can see how to export a StableDiffusionPipeline in export.ipynb.
Example 1: Export the PyTorch pipeline to ONNX.
```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

model_id = "Zero-nnkn/stable-diffusion-2-pokemon"
pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True)
pipeline.save_pretrained("onnx")
```
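The saved pipeline can then be reloaded from the onnx directory and called like a regular diffusers pipeline (the prompt is an assumption):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipeline = ORTStableDiffusionPipeline.from_pretrained("onnx")
image = pipeline("a pokemon", num_inference_steps=20).images[0]
```

For the ONNX uint8 row in the table above, one possible route is dynamic quantization of the exported components with 🤗 Optimum's ORTQuantizer. A sketch for the UNet only (directory and file names assume the save layout from Example 1):

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic uint8 quantization targeting AVX512-VNNI CPUs
quantizer = ORTQuantizer.from_pretrained("onnx/unet", file_name="model.onnx")
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx_uint8/unet", quantization_config=dqconfig)
```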
Example 2: Export the PyTorch pipeline to OpenVINO. Note that this pipeline can only run on Intel devices (CPU or GPU).
```python
from optimum.intel import OVStableDiffusionPipeline

model_id = "Zero-nnkn/stable-diffusion-2-pokemon"
pipeline = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)
pipeline.save_pretrained("openvino")
```
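The saved pipeline can likewise be reloaded and, optionally, reshaped to static input shapes before compiling, which lets OpenVINO optimize the graph further (the shapes and prompt below are assumptions):

```python
from optimum.intel import OVStableDiffusionPipeline

pipeline = OVStableDiffusionPipeline.from_pretrained("openvino")

# Fix the input shapes, then compile for the target device
pipeline.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipeline.compile()

image = pipeline("a pokemon", num_inference_steps=20).images[0]
```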