
Add new quantization scheme #1100

Open
balditommaso opened this issue Nov 25, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@balditommaso

Do you think it's feasible to add Additive Powers-of-Two (APoT) quantization to Brevitas?

Even though it is a non-uniform quantization technique, it is very HW friendly and it can help when we need more flexibility in the representation.

I can try to implement it, I just would like to know what you think!
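For reference, here is a rough PyTorch sketch of the core idea, i.e. quantization levels built as sums of powers of two. The level construction only loosely follows the APoT paper and none of these helper names exist in Brevitas:

```python
import itertools
import torch

# Rough sketch only: each level is a sum of (zero or) powers of two,
# in the spirit of APoT, but not the exact paper recipe.
def apot_levels(num_terms: int = 2, bits_per_term: int = 2) -> torch.Tensor:
    term_sets = []
    for t in range(num_terms):
        # Each additive term is 0 or a power of two; exponents are interleaved across terms.
        exps = [-(t + 1 + num_terms * i) for i in range(2 ** bits_per_term - 1)]
        term_sets.append([0.0] + [2.0 ** e for e in exps])
    pos = {sum(combo) for combo in itertools.product(*term_sets)}
    levels = torch.tensor(sorted(pos | {-v for v in pos}))
    return levels / levels.abs().max()  # normalize to [-1, 1]

def quantize_to_levels(w: torch.Tensor, levels: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Round-to-nearest against the discrete APoT grid, then rescale.
    idx = torch.argmin((w.unsqueeze(-1) / scale - levels).abs(), dim=-1)
    return levels[idx] * scale

w = torch.randn(64, 128) * 0.05
w_q = quantize_to_levels(w, apot_levels(), w.abs().max())
```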

@balditommaso balditommaso added the enhancement New feature or request label Nov 25, 2024
@Giuseppe5
Collaborator

We already support Po2 quantization in Brevitas, sometimes referred to internally as FixedPoint.

We have some presets here:
https://github.com/Xilinx/brevitas/blob/master/src/brevitas/quant/fixed_point.py
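For example, something like this (a minimal sketch; `Int8WeightPerTensorFixedPoint` is one of the presets in that file, but double-check the class names against your Brevitas version):

```python
import torch
from brevitas.nn import QuantConv2d
from brevitas.quant.fixed_point import Int8WeightPerTensorFixedPoint

# Fixed-point (Po2) weight quantization: the scale is restricted to powers of two.
conv = QuantConv2d(
    in_channels=3,
    out_channels=16,
    kernel_size=3,
    weight_quant=Int8WeightPerTensorFixedPoint)

y = conv(torch.randn(1, 3, 32, 32))
```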

@balditommaso
Author

I see, let me explain my situation. I have a weight distribution where most of the values are close to zero. I tried INT quantization, but the result was really poor, because most of the values collapse to zero. I then tried Fp8e4m3Mixin weight quantization and performance improved by ~20%, which is nice, but looking at the distribution I think we can do something more.

Currently, I have two ideas:

  1. Start clipping outliers by using a percentile statistic instead of AbsMax for the scale (see the sketch after this comment)
  2. Try different fixed-point representations

What do you think? Have you ever run into something like this?

NOTE: I am using channel-wise quantization
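To make idea 1 concrete, here is a plain PyTorch sketch of the statistic I have in mind (per-channel percentile vs. AbsMax). This is only the scale computation, not how it would be wired into a Brevitas quantizer:

```python
import torch

def per_channel_scales(w: torch.Tensor, bit_width: int = 8, percentile: float = 99.9):
    # w: (out_channels, ...) weight tensor; statistics taken per output channel.
    w_flat = w.abs().flatten(start_dim=1)
    q_max = 2 ** (bit_width - 1) - 1
    absmax_scale = w_flat.max(dim=1).values / q_max
    # Clipping at a percentile ignores the most extreme values, so the
    # remaining range gets finer resolution around zero.
    percentile_scale = torch.quantile(w_flat, percentile / 100.0, dim=1) / q_max
    return absmax_scale, percentile_scale

w = torch.randn(64, 128) * 0.01
w[0, 0] = 5.0  # a single outlier inflates the AbsMax scale of channel 0
absmax_scale, percentile_scale = per_channel_scales(w)
```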

@Giuseppe5
Collaborator

I am assuming that you're using PTQ.

You can try to use some of our PTQ techniques like weight equalization, which should help remove outliers.

The idea is the following (a rough sketch follows the list):

  • Start from a floating point network
  • Apply symbolic_trace
  • Apply weight_equalization
  • Apply quantization
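The import paths and return values below are from memory and may differ across Brevitas versions; `MyFloatModel` and `quantize_model` are placeholders for your network and whatever quantization entry point you use:

```python
from brevitas.fx import symbolic_trace             # assumed tracer location
from brevitas.graph.equalize import EqualizeGraph  # assumed equalization transform

model = MyFloatModel().eval()  # placeholder floating-point network

# 1. Trace the model so graph transforms can see its topology.
graph_model = symbolic_trace(model)

# 2. Apply weight equalization to smooth outliers across adjacent layers.
graph_model = EqualizeGraph().apply(graph_model)

# 3. Quantize the equalized model (placeholder for your quantization step,
#    e.g. swapping in QuantConv2d/QuantLinear layers with your quantizers).
quant_model = quantize_model(graph_model)
```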

This might take a bit of time to set up and verify that everything works as intended, because weight equalization is not compatible with all network topologies.

There are also other methods, but honestly it is very application specific, and it could be worth reaching out offline to discuss those if you're still facing issues.

With respect to non-uniform quantization, Brevitas does not currently support it. Moreover, non-uniform quantization is generally not convenient for the use case you mentioned in another PR, since it might require dequantization to perform operations (that is not always true and it is algorithm specific).

@Giuseppe5
Collaborator

Apologies for my misunderstanding about Power-of-Two quantization; for some reason I completely missed the link.

In general, we're looking into some non-uniform quantization options.
I will take a look to get an idea of how easy/difficult it would be to implement in a way that supports newer non-uniform algorithms as well, since Brevitas is built around composability.

If there's a relatively easy way to get that implemented and you're willing to contribute, I am happy to help in the process :)

@balditommaso
Author

Amazing, I think it could be a nice new feature for Brevitas thanks to its flexibility and straightforward HW implementation.

Let me know what you think ;)
