[Guide] Quantize your Diffusion Models with bnb
#10012
Conversation
Very nice start! 👏
I think you can combine this guide with the existing one here since there is quite a bit of overlap between the two. Here are some general tips for doing that:
- Keep the introduction in the existing guide but add a few sentences that adapt it to quantizing Flux.1-dev with bitsandbytes so you can run it on hardware with less than 16GB of memory. I think most users at this point have a general idea of what quantization is (and it is also covered in the getting started), so we don't need to spend more time on what it is/why it is important. The focus is more on bitsandbytes than quantization in general.
- I don't think it's necessary to have a section showing how to use an unquantized model. Users are probably more eager to see how they can use a quantized model, and getting them there as quickly as possible would be better.
- Combine the 8-bit quantization section with the existing one here. You can add details about how you're quantizing both the `T5EncoderModel` and the `FluxTransformer2DModel`, and what the `low_cpu_mem_usage` and `device_map` (if you have more than one GPU) parameters do (see the sketch after this list).
- You can do the same thing with the 4-bit section. Combine it with the existing one and add a few lines explaining the parameters.
- Combine the NF4 quantization section with the one here.
- Lead with the visualization in the method comparison section. Most users probably aren't too interested in comparing and running all this code themselves, so it's more impactful to lead with the results first.
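For concreteness, here is a minimal sketch of what the combined 8-bit section could show, assuming the `BitsAndBytesConfig` classes exposed by `transformers` and `diffusers`. The model id, prompt, and dtype choices are illustrative, not taken from the PR itself.

```python
# Sketch: 8-bit quantization of Flux.1-dev's two large submodels with bitsandbytes.
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

model_id = "black-forest-labs/FLUX.1-dev"

# Quantize the T5 text encoder (the larger of Flux's two text encoders) to 8-bit.
text_encoder_8bit = T5EncoderModel.from_pretrained(
    model_id,
    subfolder="text_encoder_2",
    quantization_config=TransformersBitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)

# Quantize the transformer to 8-bit. `low_cpu_mem_usage=True` loads weights
# shard by shard instead of materializing the whole model in CPU RAM first;
# a `device_map` could additionally shard it across several GPUs.
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(load_in_8bit=True),
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
)

pipe = FluxPipeline.from_pretrained(
    model_id,
    text_encoder_2=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

# The 4-bit and NF4 sections would follow the same pattern, swapping in a
# 4-bit config such as this one.
nf4_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```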
Suggested some nits. Greatly agree with @stevhliu's comments and recommendations.
```python
import torch

# Report peak GPU memory allocated on device 0, in GB.
memory_allocated = torch.cuda.max_memory_allocated(0) / (1024 ** 3)
print(f"GPU Memory Allocated: {memory_allocated:.2f} GB")
```
As a reader, I'd like to know how much it was at this point.
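For concreteness, a short sketch of how the guide could surface that number, assuming the quantized `pipe` object from the sketch above; resetting the peak counter first makes the reading attributable to inference alone. The prompt is illustrative.

```python
import torch

# Reset the peak-memory counter so the figure below covers only inference.
torch.cuda.reset_peak_memory_stats(0)

# `pipe` is the quantized pipeline from the earlier sketch (assumed).
image = pipe("a photo of a cat holding a sign", num_inference_steps=28).images[0]

memory_allocated = torch.cuda.max_memory_allocated(0) / (1024 ** 3)
print(f"GPU Memory Allocated: {memory_allocated:.2f} GB")
```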
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Steven Liu <[email protected]>
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
This PR adds a guide on quantization of diffusion models using `bnb` and `diffusers`. Here is a Colab notebook for easy code access.

CC: @stevhliu @sayakpaul