Investigate optimization methods for both inference speed and memory consumption of Stable Diffusion with 🤗 diffusers. You can see more details here.
Speed Benchmark: We provide a notebook for benchmarking.
- CPU Benchmark
  - Device: Intel(R) Xeon(R) CPU @ 2.00GHz (Google Colab)
  - Inference steps: 20
| Pipeline                 | DType | Speed (s/image) |
|--------------------------|-------|-----------------|
| PyTorch                  | fp32  | 665.6           |
| PyTorch + Token Merging  | fp32  | 542.2           |
| ONNX                     | fp32  | 566.4           |
| ONNX                     | uint8 | 455.8           |
| OpenVINO                 | fp32  | 548.2           |
Note:
- Right now we focus only on speed; quality benchmarks of the optimized models will be considered in the future.
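For context, the per-image timings above can be measured along these lines. Below is a minimal sketch of the baseline fp32 run on CPU (the prompt is an assumption, not the exact benchmark script):

```python
import time

from diffusers import DiffusionPipeline

# Baseline PyTorch fp32 pipeline; runs on CPU by default
pipe = DiffusionPipeline.from_pretrained("Zero-nnkn/stable-diffusion-2-pokemon")

start = time.perf_counter()
image = pipe("a pokemon", num_inference_steps=20).images[0]
print(f"{time.perf_counter() - start:.1f} s/image")
```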
Load and run the model directly in float16 (GPU only).
```python
import torch
from diffusers import DiffusionPipeline

# Load the weights in half precision to roughly halve memory use and speed up inference
pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
```
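Continuing from the snippet above, generation then works as usual (the prompt here is only an illustration):

```python
image = pipe("a pokemon", num_inference_steps=20).images[0]
image.save("pokemon.png")
```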
Token Merging (ToMe) for Stable Diffusion is a technique that speeds up the transformer blocks by merging redundant tokens.
```python
import tomesd
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
).to("cuda")

# Apply ToMe with a 50% merging ratio
tomesd.apply_patch(pipe, ratio=0.5)  # Can also use pipe.unet in place of pipe here
```
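Continuing from the snippet above, the patched pipeline is called as usual, and the patch can be undone with tomesd.remove_patch (the prompt is an assumption):

```python
image = pipe("a pokemon", num_inference_steps=20).images[0]

tomesd.remove_patch(pipe)  # Restore the original, unpatched model
```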
Use FlashAttention via xFormers (GPU only).
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    use_safetensors=True,
).to("cuda")

pipe.enable_xformers_memory_efficient_attention()
# Disable
# pipe.disable_xformers_memory_efficient_attention()
```
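These optimizations compose. Here is a sketch that combines float16 loading with xFormers attention (the prompt is an assumption):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Zero-nnkn/stable-diffusion-2-pokemon",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a pokemon", num_inference_steps=20).images[0]
```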
Convert the model to ONNX and OpenVINO formats. 🤗 Optimum provides pipelines compatible with ONNX Runtime and OpenVINO. You can see how to export a StableDiffusionPipeline in export.ipynb.
Example 1: Export the PyTorch pipeline to ONNX.
```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

model_id = "Zero-nnkn/stable-diffusion-2-pokemon"
pipeline = ORTStableDiffusionPipeline.from_pretrained(model_id, export=True)
pipeline.save_pretrained("onnx")
```
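The saved pipeline can then be reloaded from the onnx directory and called like a regular diffusers pipeline (the prompt is an assumption):

```python
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipeline = ORTStableDiffusionPipeline.from_pretrained("onnx")
image = pipeline("a pokemon", num_inference_steps=20).images[0]
```

For the ONNX uint8 row in the table above, one possible route is dynamic quantization of the exported components with 🤗 Optimum's ORTQuantizer. A sketch for the UNet only (directory and file names assume the save layout from Example 1):

```python
from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic uint8 quantization targeting AVX512-VNNI CPUs
quantizer = ORTQuantizer.from_pretrained("onnx/unet", file_name="model.onnx")
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx_uint8/unet", quantization_config=dqconfig)
```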
Example 2: Export the PyTorch pipeline to OpenVINO. Note that this pipeline can only run on Intel devices (CPU or GPU).
```python
from optimum.intel import OVStableDiffusionPipeline

model_id = "Zero-nnkn/stable-diffusion-2-pokemon"
pipeline = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)
pipeline.save_pretrained("openvino")
```
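The saved pipeline can likewise be reloaded and, optionally, reshaped to static input shapes before compiling, which lets OpenVINO optimize the graph further (the shapes and prompt below are assumptions):

```python
from optimum.intel import OVStableDiffusionPipeline

pipeline = OVStableDiffusionPipeline.from_pretrained("openvino")

# Fix the input shapes, then compile for the target device
pipeline.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipeline.compile()

image = pipeline("a pokemon", num_inference_steps=20).images[0]
```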