
Add new quantization scheme #1100

Open
balditommaso opened this issue Nov 25, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@balditommaso

Do you think it's feasible to add Additive Powers-of-Two (APoT) quantization to Brevitas?

Even though it is a non-uniform quantization technique, it is very HW friendly and it can help when we need more flexibility in the representation.

I can try to implement it, I just would like to know what you think!
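For reference, here is a rough PyTorch sketch of the core idea, i.e. quantization levels built as sums of powers of two. The level construction only loosely follows the APoT paper and none of these helper names exist in Brevitas:

```python
import itertools
import torch

# Rough sketch only: each level is a sum of (zero or) powers of two,
# in the spirit of APoT, but not the exact paper recipe.
def apot_levels(num_terms: int = 2, bits_per_term: int = 2) -> torch.Tensor:
    term_sets = []
    for t in range(num_terms):
        # Each additive term is 0 or a power of two; exponents are interleaved across terms.
        exps = [-(t + 1 + num_terms * i) for i in range(2 ** bits_per_term - 1)]
        term_sets.append([0.0] + [2.0 ** e for e in exps])
    pos = {sum(combo) for combo in itertools.product(*term_sets)}
    levels = torch.tensor(sorted(pos | {-v for v in pos}))
    return levels / levels.abs().max()  # normalize to [-1, 1]

def quantize_to_levels(w: torch.Tensor, levels: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Round-to-nearest against the discrete APoT grid, then rescale.
    idx = torch.argmin((w.unsqueeze(-1) / scale - levels).abs(), dim=-1)
    return levels[idx] * scale

w = torch.randn(64, 128) * 0.05
w_q = quantize_to_levels(w, apot_levels(), w.abs().max())
```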

@balditommaso balditommaso added the enhancement New feature or request label Nov 25, 2024
@Giuseppe5
Collaborator

We already support Po2 quantization in Brevitas, sometimes referred to internally as FixedPoint.

We have some presets here:
https://github.com/Xilinx/brevitas/blob/master/src/brevitas/quant/fixed_point.py
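For example, something like this (a minimal sketch; `Int8WeightPerTensorFixedPoint` is one of the presets in that file, but double-check the class names against your Brevitas version):

```python
import torch
from brevitas.nn import QuantConv2d
from brevitas.quant.fixed_point import Int8WeightPerTensorFixedPoint

# Fixed-point (Po2) weight quantization: the scale is restricted to powers of two.
conv = QuantConv2d(
    in_channels=3,
    out_channels=16,
    kernel_size=3,
    weight_quant=Int8WeightPerTensorFixedPoint)

y = conv(torch.randn(1, 3, 32, 32))
```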

@balditommaso
Author

I see, let me explain my situation. I have a weight distribution where most of the values are close to zero. I tried INT quantization, but the result was really poor, because most of the values collapse to zero. I then tried Fp8e4m3Mixin weight quantization and performance improved by ~20%, which is nice, but looking at the distribution I think we can do something more.

Currently, I have two ideas:

  1. Start clipping outliers by using a percentile statistic instead of AbsMax for the scale (see the sketch after this comment)
  2. Try different fixed-point representations

What do you think? Have you ever run into something like this?

NOTE: I am using channel-wise quantization
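To make idea 1 concrete, here is a plain PyTorch sketch of the statistic I have in mind (per-channel percentile vs. AbsMax). This is only the scale computation, not how it would be wired into a Brevitas quantizer:

```python
import torch

def per_channel_scales(w: torch.Tensor, bit_width: int = 8, percentile: float = 99.9):
    # w: (out_channels, ...) weight tensor; statistics taken per output channel.
    w_flat = w.abs().flatten(start_dim=1)
    q_max = 2 ** (bit_width - 1) - 1
    absmax_scale = w_flat.max(dim=1).values / q_max
    # Clipping at a percentile ignores the most extreme values, so the
    # remaining range gets finer resolution around zero.
    percentile_scale = torch.quantile(w_flat, percentile / 100.0, dim=1) / q_max
    return absmax_scale, percentile_scale

w = torch.randn(64, 128) * 0.01
w[0, 0] = 5.0  # a single outlier inflates the AbsMax scale of channel 0
absmax_scale, percentile_scale = per_channel_scales(w)
```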

@Giuseppe5
Collaborator

I am assuming that you're using PTQ.

You can try to use some of our PTQ techniques like weight equalization, which should help remove outliers.

The idea is the following (a rough sketch follows the list):

  • Start from a floating point network
  • Apply symbolic_trace
  • Apply weight_equalization
  • Apply quantization
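The import paths and return values below are from memory and may differ across Brevitas versions; `MyFloatModel` and `quantize_model` are placeholders for your network and whatever quantization entry point you use:

```python
from brevitas.fx import symbolic_trace             # assumed tracer location
from brevitas.graph.equalize import EqualizeGraph  # assumed equalization transform

model = MyFloatModel().eval()  # placeholder floating-point network

# 1. Trace the model so graph transforms can see its topology.
graph_model = symbolic_trace(model)

# 2. Apply weight equalization to smooth outliers across adjacent layers.
graph_model = EqualizeGraph().apply(graph_model)

# 3. Quantize the equalized model (placeholder for your quantization step,
#    e.g. swapping in QuantConv2d/QuantLinear layers with your quantizers).
quant_model = quantize_model(graph_model)
```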

This might take a bit of time to set up and verify that everything works as intended, because weight equalization is not compatible with all network topologies.

There are also other methods, but honestly it is very application specific, and it could be worth reaching out offline to discuss those if you're still facing issues.

With respect to non-uniform quantization, Brevitas does not currently support it. Moreover, non-uniform quantization is generally not convenient for the use case you mentioned in another PR, since it might require dequantization to perform operations (that is not always true and it is algorithm specific).

@Giuseppe5
Collaborator

Apologies for my misunderstanding about Power-of-Two quantization; for some reason I completely missed the link.

In general, we're looking into some non-uniform quantization options.
I will take a look to get an idea of how easy/difficult it would be to implement in a way that supports newer non-uniform algorithms as well, since Brevitas is built around composability.

If there's a relatively easy way to get that implemented and you're willing to contribute, I am happy to help in the process :)

@balditommaso
Author

Amazing, I think it could be a nice new feature for Brevitas thanks to its flexibility and straightforward HW implementation.

Let me know what you think ;)
