This directory provides scripts to perform text-to-image inference on a Stable Diffusion 2.1 model and is tested and maintained by Intel® Gaudi®. Before you get started, make sure to review the Supported Configuration section.
For more information on training and inference of deep learning models using Intel Gaudi AI accelerator, refer to developer.habana.ai.
- Model-References
- Model Overview
- Setup
- Model Checkpoint
- Inference and Examples
- Supported Configuration
- Changelog
- Known Issues
This implementation is based on the following paper - High-Resolution Image Synthesis with Latent Diffusion Models.
Users acknowledge and understand that the models referenced by Habana are mere examples of models that can be run on Gaudi. Users bear sole liability and responsibility to follow and comply with any third-party licenses pertaining to such models, and Habana Labs disclaims any warranty or liability with respect to users' use of, or compliance with, such third-party licenses.
Please follow the instructions provided in the Gaudi Installation Guide to set up the environment, including the $PYTHON environment variable. To achieve the best performance, please follow the methods outlined in the Optimizing Training Platform Guide. These guides will walk you through the process of setting up your system to run the model on Gaudi.
In the docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version. You can run the hl-smi utility to determine the Intel Gaudi software version.
git clone -b [Intel Gaudi software version] https://github.com/HabanaAI/Model-References
- In the docker container, go to the model directory:
cd Model-References/PyTorch/generative_models/stable-diffusion-v-2-1
- Install the required packages using pip.
pip install -r requirements.txt --user
Download the pre-trained weights for 768x768 images (4.9GB):
wget https://huggingface.co/stabilityai/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned.ckpt
and/or for 512x512 images (4.9GB):
wget https://huggingface.co/stabilityai/stable-diffusion-2-1-base/resolve/main/v2-1_512-ema-pruned.ckpt
The following command generates a total of 3 images of size 768x768 and saves each sample individually, as well as a grid of size n_iter x n_samples, at the specified output location (default: outputs/txt2img-samples).
$PYTHON scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_768-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference-v.yaml --H 768 --W 768 --n_samples 1 --n_iter 3 --use_hpu_graph
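To see how the `--n_samples` and `--n_iter` flags combine, here is a small stdlib-only sketch; the `sample_layout` helper is purely illustrative and not part of the repository:

```python
def sample_layout(n_samples: int, n_iter: int):
    """Total image count and grid shape produced by txt2img.

    Each of the n_iter iterations generates n_samples images; the grid
    image arranges them as n_iter rows by n_samples columns.
    """
    return n_samples * n_iter, n_iter, n_samples

total, rows, cols = sample_layout(n_samples=1, n_iter=3)
print(total, rows, cols)  # 3 individual images plus a 3x1 grid
```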
To generate 3 images of size 512x512 using the k-diffusion dpmpp_2m sampler with 35 steps, use the following command:
$PYTHON scripts/txt2img.py --prompt "a professional photograph of an astronaut riding a horse" --ckpt v2-1_512-ema-pruned.ckpt --config configs/stable-diffusion/v2-inference.yaml --H 512 --W 512 --n_samples 1 --n_iter 3 --steps 35 --k_sampler dpmpp_2m --use_hpu_graph
For a more detailed description of parameters, please use the following command to see a help message:
$PYTHON scripts/txt2img.py -h
The first two batches of images incur a performance penalty. All subsequent batches are generated much faster.
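When benchmarking, the usual pattern is therefore to run and discard a couple of warm-up batches before timing. A minimal sketch of that pattern, with `generate_batch` as a hypothetical stand-in for the actual txt2img call:

```python
import time

def generate_batch():
    # Stand-in for the real txt2img invocation; the sleep only simulates
    # work. On Gaudi the first calls are slow because graph
    # capture/compilation happens on the initial batches.
    time.sleep(0.001)

# Warm-up: run (and discard) the first two batches.
for _ in range(2):
    generate_batch()

# Only time the steady-state batches.
start = time.perf_counter()
generate_batch()
elapsed = time.perf_counter() - start
```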
| Validated on | Intel Gaudi Software Version | PyTorch Version | Mode |
|---|---|---|---|
| Gaudi | 1.15.1 | 2.2.0 | Inference |
| Gaudi 2 | 1.15.1 | 2.2.0 | Inference |
Initial release.
Decreased host overhead to a minimum by rewriting the samplers and the main sampling loop.
Major changes made to the original model from the Stability-AI/stablediffusion repository:
- Changed README.
- Added HPU support.
- Modified configs/stable-diffusion/v2-inference-v.yaml and configs/stable-diffusion/v2-inference.yaml.
- Changed code around the einsum operation in ldm/modules/attention.py.
- Moved randn to the CPU in scripts/txt2img.py and ldm/models/diffusion/ddim.py.
- Rewrote sampling in an accelerator-friendly way.
- Initial random noise generation has been moved to CPU. Contrary to when noise is generated on Gaudi, CPU-generated random noise produces consistent output regardless of whether HPU Graphs API is used or not.
- The model supports batch sizes up to 16 on Gaudi and up to 8 on Gaudi 2 for 512x512 output images, and batch size 1 for 768x768 output images on both Gaudi and Gaudi 2.
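The CPU-side noise generation described above can be sketched with the standard library alone. This is only an illustration of the determinism argument, not the repository's actual torch.randn call: drawing the initial noise from a seeded host-side generator makes the result independent of how the accelerator executes.

```python
import random

def host_noise(seed: int, n: int):
    # Draw the initial latent noise on the host (CPU) so the values do
    # not depend on whether the accelerator runs with HPU Graphs or not.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed -> identical noise, regardless of the execution mode.
assert host_noise(42, 4) == host_noise(42, 4)
```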