Stable Diffusion 2.1 for PyTorch

This directory provides scripts to perform text-to-image inference on a Stable Diffusion 2.1 model and is tested and maintained by Intel® Gaudi®. Before you get started, make sure to review the Supported Configuration section.

For more information on training and inference of deep learning models using Intel Gaudi AI accelerator, refer to developer.habana.ai.

Model Overview

This implementation is based on the following paper - High-Resolution Image Synthesis with Latent Diffusion Models.

How to Use

Users acknowledge and understand that the models referenced by Habana are merely examples of models that can be run on Gaudi. Users bear sole liability and responsibility for following and complying with any third-party licenses pertaining to such models, and Habana Labs disclaims any warranty or liability with respect to users' use of, or compliance with, such third-party licenses.

Setup

Please follow the instructions provided in the Gaudi Installation Guide to set up the environment including the $PYTHON environment variable. To achieve the best performance, please follow the methods outlined in the Optimizing Training Platform Guide. The guides will walk you through the process of setting up your system to run the model on Gaudi.

Clone Intel Gaudi Model-References

In the docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version. You can run the hl-smi utility to determine the Intel Gaudi software version.

git clone -b [Intel Gaudi software version] https://github.com/HabanaAI/Model-References

Install Model Requirements

  1. In the docker container, go to the model directory:
     cd Model-References/PyTorch/generative_models/stable-diffusion-v-2-1
  2. Install the required packages using pip:
     pip install -r requirements.txt --user

Model Checkpoint

Text-to-Image

Download the pre-trained weights for 768x768 images (4.9GB)

wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.ckpt

and/or 512x512 images (4.9GB).

wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt

Inference and Examples

The following command generates a total of 3 images of size 768x768 and saves each sample individually as well as a grid of size n_iter x n_samples at the specified output location (default: outputs/txt2img-samples).

$PYTHON scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 --n_samples 1 --n_iter 3 --use_hpu_graph
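As a rough illustration of the output layout described above (a hypothetical sketch, not the script's actual code): with n_iter iterations of n_samples images each, the saved grid stacks one row of images per iteration. Using tiny numpy arrays as stand-ins for 768x768 RGB images:

```python
import numpy as np

def make_grid(batches):
    """Stack n_iter batches of n_samples images into one image grid:
    one row per iteration, one column per sample in the batch."""
    rows = [np.concatenate(batch, axis=1) for batch in batches]  # row: H x (n_samples*W)
    return np.concatenate(rows, axis=0)                          # grid: (n_iter*H) x (n_samples*W)

# 3 iterations of 1 sample each, 4x4 RGB stand-ins for 768x768 images
batches = [[np.zeros((4, 4, 3), dtype=np.uint8)] for _ in range(3)]
grid = make_grid(batches)
print(grid.shape)  # (12, 4, 3): a 3x1 grid of 4x4 images
```

With `--n_iter 3 --n_samples 1` this yields a 3x1 grid, matching the 3 individually saved samples.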

To generate 3 images of size 512x512 using the k-diffusion dpmpp_2m sampler with 35 steps, use the command:

$PYTHON scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_512-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference.yaml --H 512 --W 512 --n_samples 1 --n_iter 3 --steps 35 --k_sampler dpmpp_2m --use_hpu_graph
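The k-diffusion samplers step through a decreasing noise-level (sigma) schedule; dpmpp_2m is commonly paired with the Karras schedule from the EDM paper. A minimal sketch of such a schedule follows (the sigma_min and sigma_max values here are illustrative assumptions, not the model's actual settings):

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Karras et al. noise schedule: interpolate linearly in
    sigma^(1/rho) space, then raise back to the rho-th power."""
    ramp = [i / (n - 1) for i in range(n)]
    min_r, max_r = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return [(max_r + t * (min_r - max_r)) ** rho for t in ramp]

sigmas = karras_sigmas(35)           # one sigma per sampling step
print(len(sigmas))                   # 35, decreasing from sigma_max to sigma_min
```

The sampler denoises from the largest sigma down to the smallest over the 35 steps requested by `--steps 35`.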

For a more detailed description of parameters, please use the following command to see a help message:

$PYTHON scripts/txt2img.py -h

Performance

The first two batches of images incur a one-time warm-up performance penalty. All subsequent batches are generated much faster.

Supported Configuration

Validated on | Intel Gaudi Software Version | PyTorch Version | Mode
------------|------------------------------|-----------------|----------
Gaudi       | 1.15.1                       | 2.2.0           | Inference
Gaudi 2     | 1.15.1                       | 2.2.0           | Inference

Changelog

1.8.0

Initial release.

1.10.0

Decreased host overhead to minimum by rewriting samplers and the main sampling loop.

Script Modifications

Major changes made to the original model from the Stability-AI/stablediffusion repository:

  • Changed README.
  • Added HPU support.
  • Modified configs/stable-diffusion/v2-inference-v.yaml and configs/stable-diffusion/v2-inference.yaml.
  • Changed the code around the einsum operation in ldm/modules/attention.py.
  • Moved randn to the CPU in scripts/txt2img.py and ldm/models/diffusion/ddim.py.
  • Rewrote sampling in an accelerator-friendly way.

Known Issues

  • Initial random noise generation has been moved to the CPU. Unlike noise generated on Gaudi, CPU-generated random noise produces consistent output regardless of whether the HPU Graphs API is used.
  • The model supports batch sizes of up to 16 on Gaudi and up to 8 on Gaudi 2 for 512x512 output images, and batch size 1 for 768x768 images on both Gaudi and Gaudi 2.
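The determinism note above can be sketched with a seeded CPU generator. For a 768x768 output, Stable Diffusion's initial latent noise has 4 channels at 1/8 the image resolution, so the same seed always yields the same starting noise. This uses numpy as an illustrative stand-in for the actual torch.randn call on the CPU:

```python
import numpy as np

def initial_noise(seed, n_samples=1, height=768, width=768):
    """Generate latent noise on the CPU: 4 latent channels at 1/8
    the output resolution, fully determined by the seed."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_samples, 4, height // 8, width // 8))

a = initial_noise(42)
b = initial_noise(42)
print(a.shape, np.array_equal(a, b))  # (1, 4, 96, 96) True
```

Because the noise depends only on the seed, the generated images match whether or not HPU Graphs are enabled.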