Gradio demo of text-to-image using 4-bit quantized Stable Diffusion 3.5 Large
Full documentation is available on Hugging Face: Stable Diffusion Text-to-image
- Open a web browser, log in to Hugging Face, and register your name and email to use stable-diffusion-3.5-large
- Create a new Hugging Face user access token, which will capture that you completed the registration form
- Clone this repo to your machine and change into the directory for this demo:

  ```shell
  cd ./stability-ai-toolkit/sd35-text-to-image-quantized-gradio
  ```
- Set up the app in a Python virtual environment:

  ```shell
  python -m venv <your_environment_name>
  source <your_environment_name>/bin/activate
  ```
- Set your `HF_TOKEN` inside your virtual environment:

  ```shell
  export HF_TOKEN=<Hugging Face user access token>
  ```
- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

  NOTE: Read requirements.txt for macOS PyTorch installation instructions. TL;DR:

  ```shell
  # Inside your virtual environment
  pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
  ```
- Start the app:

  ```shell
  python app.py
  ```
- Open the UI in a web browser: http://127.0.0.1:7861
Model loading with 4-bit (NF4) quantization, as used in this demo:

```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
...
model_id = "stabilityai/stable-diffusion-3.5-large"

# Quantize the transformer to 4-bit NF4 weights with bitsandbytes
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)

# Build the pipeline around the quantized transformer
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)

# Offload idle submodules to the CPU to further reduce GPU memory usage
pipe.enable_model_cpu_offload()
```
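Once loaded, the quantized pipeline is invoked like any other diffusers pipeline. A minimal generation sketch follows; the prompt and sampler settings are illustrative placeholders, not values taken from app.py:

```python
# Illustrative generation call; prompt and settings are placeholders, not from app.py
image = pipe(
    prompt="A capybara wearing a suit, holding a sign that says 'hello world'",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("capybara.png")
```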
The same pipeline without quantization, for comparison:

```python
import torch
from diffusers import StableDiffusion3Pipeline
...
model_id = "stabilityai/stable-diffusion-3.5-large"

# Full bfloat16 weights require substantially more GPU memory
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16
)
```
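For context, a Gradio text-to-image app typically wraps the pipeline call in a function and exposes it through an interface. The sketch below illustrates that pattern only; the function name, components, and defaults are assumptions, and the actual app.py may be structured differently:

```python
import gradio as gr

def generate(prompt):
    # Run the (quantized) SD3.5 pipeline defined above and return the first image
    return pipe(prompt=prompt, num_inference_steps=28, guidance_scale=4.5).images[0]

demo = gr.Interface(
    fn=generate,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Image(label="Generated image"),
    title="Stable Diffusion 3.5 Large (4-bit quantized)",
)

demo.launch(server_port=7861)  # the port used in the setup steps above
```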
NOTE: There is a SIGNIFICANT IMPROVEMENT in NEGATIVE PROMPTING accuracy when using the 4-bit quantized Stable Diffusion 3.5 Large.
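Negative prompts are passed through the standard diffusers negative_prompt argument; for example (the prompt strings here are illustrative):

```python
# Steer generation away from unwanted attributes with negative_prompt
image = pipe(
    prompt="A watercolor painting of a lighthouse at dusk",
    negative_prompt="blurry, low quality, distorted, text, watermark",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
```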
Many use cases for Stable Diffusion 3.5 Large (SD3.5 L) require the capabilities of the model without its large memory footprint:
- 4-bit quantization of SD3.5 L allows it to load onto GPUs with limited VRAM
- 4-bit quantization makes it easier to offload certain parts of model execution to the CPU, further reducing GPU memory usage
- The decrease in generated image quality is often acceptable, given the reduced cost that comes from needing less VRAM
- Users working on their own computer with a consumer-grade GPU (or Apple Silicon with an integrated GPU) benefit most from this approach
- Stable Diffusion 3.5 Medium (SD3.5 M) could be used instead, as it has fewer parameters than Large and an inference speed even faster than quantized SD3.5 L (see the sketch after this list)
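If SD3.5 Medium is the better fit, the same NF4 setup shown above can be reused by swapping only the model ID. This is a sketch under the assumption that the Medium checkpoint is loaded the same way; given its smaller size, quantization may not even be necessary:

```python
# Sketch: reuse the NF4 quantization setup with SD3.5 Medium instead of Large
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline

model_id = "stabilityai/stable-diffusion-3.5-medium"  # fewer parameters than Large

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    transformer=model_nf4,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```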