feat: add power of two scaling adapter for roundPBS #118
Conversation
@@ -88,10 +93,11 @@ def __init__(
            quant_name = f"quant{idx}"
            quantizer = qnn.QuantIdentity(
-               bit_width=n_a_bits,
+               bit_width=8 if idx == 0 else n_a_bits,
Inputs are quantized to 8 bits. This should only be done by default when power-of-two scaling is used (roundPBS helps with this).
Thanks for the explanation; maybe add a comment in the code to explain this!
use_case_examples/llm/utils.py
Outdated
@@ -32,7 +32,7 @@ def max_fhe_relu(q_x, axis=-1, keepdims=True):
    if keepdims:
        shape = list(result.shape)
        shape.insert(axis, 1)
-       result = result.reshape(shape)
+       result = result.reshape(tuple(shape))
Concrete Python changed the semantics: reshape only accepts tuples now.
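For reference, a minimal numpy-only sketch of the pattern in the diff (plain numpy accepts a list here; the tuple() conversion is what the Concrete Python tracer needs, per the comment above):

```python
import numpy

x = numpy.arange(6)
shape = list(x.shape)  # e.g. [6]
shape.insert(0, 1)     # keepdims-style axis re-insertion -> [1, 6]

# numpy itself accepts a list as the new shape, but (per the comment above)
# the Concrete Python tracer only accepts tuples, hence tuple(shape).
x = x.reshape(tuple(shape))
assert x.shape == (1, 6)
```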
Force-pushed from 379a5f8 to ff87d8b
Force-pushed from ff87d8b to 226b37c
@@ -695,6 +697,7 @@ def __init__(self, n_classes, n_bits, n_active, signed, narrow) -> None:
        n_active (int): number of active (non-zero weight) neurons to keep
        signed (bool): whether quantized integer values are signed
        narrow (bool): whether the range of quantized integer values is narrow/symmetric
+       power_of_two_scaling (bool): whether to use power-of-two scaling quantizers
maybe add an additional sentence explaining what "power-of-two scaling" is / can be used for ?
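For intuition, a hedged numpy-only sketch (not the Concrete ML implementation) of why power-of-two scales matter: when the ratio of consecutive quantization scales is an exact power of two, requantization reduces to dropping low-order bits, which roundPBS can do while keeping the PBS precision low.

```python
import numpy

# Illustrative scales of two consecutive quantizers (not real learned values).
scale_in = 2.0 ** -6
scale_out = 2.0 ** -3

ratio = scale_out / scale_in            # 8.0 == 2**3, an exact power of two
lsbs_to_round = int(numpy.log2(ratio))  # 3 low bits are redundant

q_int = numpy.array([100, -57, 33])
# Requantizing by 2**3 is just rounding off 3 LSBs of the integer values.
q_requant = numpy.round(q_int / 2 ** lsbs_to_round).astype(numpy.int64)
print(q_requant)  # [12 -7  4]
```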
-       curr_inputs = {
-           input_name: node_results.get(input_name, None) for input_name in node.input
-       }
+       curr_inputs = [
I think this PR could be a good opportunity to add additional comments for the following section, as it's an important part of the code that still remains somewhat obscure to whoever stumbles on it! There are already some comments, but I believe some steps are missing and others could be a bit more detailed.
If you could pinpoint what is not clear, it would help. I don't really know which parts to explain.
yes sure !
@@ -604,6 +607,9 @@ def quantize_module(self, *calibration_data: numpy.ndarray) -> QuantizedModule:
            onnx_model=self.numpy_model.onnx_model,
        )

+       adapter = PowerOfTwoScalingRoundPBSAdapter(quantized_module)
+       adapter.process()
what is "process" ?
ok just saw the method below, but maybe add a comment here to briefly say what we do here ?
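A possible wording for the suggested comment, based on what this thread says the adapter does (not taken from the PR itself):

```python
# Detect layer patterns (e.g. Gemm/Conv -> ReLU -> quantize) whose quantization
# scales are powers of two and replace their requantization with a rounded PBS,
# so that the redundant low accumulator bits are dropped cheaply.
adapter = PowerOfTwoScalingRoundPBSAdapter(quantized_module)
adapter.process()
```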
@@ -0,0 +1,25 @@
+"""Custom Quantiation Aware Training Brevitas quantizers."""
typo "Quantiation" -> "Quantization"
        the input value was an integer power of two
    """
    log2_value = int(numpy.rint(numpy.log2(value)))
    if numpy.isclose(numpy.power(2.0, log2_value), value, atol=0.01):
Is it not enough to just check that numpy.rint(numpy.log2(value)) == numpy.log2(value)? Or am I missing something?
If not, then maybe make atol a parameter for integer_log2 (or at least say why it was set to 0.01).
The best practice is to never compare floats using equality. But you're right about the 0.01: it's quite arbitrary and not a good value for low powers of two (it's too big when the power is, say, -7). I'll use rtol instead.
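A minimal sketch of what the rtol-based check could look like (the signature and the default tolerance are assumptions, not the merged code):

```python
from typing import Optional

import numpy

def integer_log2(value: float, rtol: float = 1e-5) -> Optional[int]:
    """Return round(log2(value)) if value is an integer power of two, else None.

    A relative tolerance scales with the magnitude of the value, so it stays
    meaningful for small scales such as 2**-7, unlike a fixed atol of 0.01.
    """
    log2_value = int(numpy.rint(numpy.log2(value)))
    if numpy.isclose(numpy.power(2.0, log2_value), value, rtol=rtol):
        return log2_value
    return None

assert integer_log2(2.0 ** -7) == -7
assert integer_log2(0.0081) is None  # near 2**-7 ~= 0.0078 but not a power of two
```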
tests/torch/test_brevitas_qat.py
Outdated
    else:
        pass

    # y_pred_clear_round = model.predict(x_test, fhe="disable")
is the following supposed to be removed ?
I added it back, but correctness is not achieved; I'd rather push it as is and work on it later. I created an issue for it.
Huge work, thanks a lot! I have several comments (mostly about detailing the steps a bit with comments).
Also, is it expected that apidocs were generated in this PR? We'll update them before releasing anyway!
        # Constant inputs
        curr_cst_inputs: Dict[int, ONNXOpInputOutputType] = {}
-       for input_idx, (input_name, value) in enumerate(curr_inputs.items()):
+       for input_idx, (input_name, value) in enumerate(curr_inputs):
What does this section (the for loop) do overall? Also, it's not very clear what each if/else in the loop does or why they are structured this way.
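For reference, the mechanical change the loop has to follow (the reason for switching from a dict to a list of pairs is not stated in this thread; the snippet only shows the shape difference):

```python
# Before: curr_inputs was a dict, so name/value pairs came from .items().
curr_inputs_dict = {"input_0": 1.0, "weight": 2.0}
for input_idx, (input_name, value) in enumerate(curr_inputs_dict.items()):
    pass

# After: curr_inputs is a list of (name, value) pairs, iterated directly.
curr_inputs_list = [("input_0", 1.0), ("weight", 2.0)]
for input_idx, (input_name, value) in enumerate(curr_inputs_list):
    pass
```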
-           curr_inputs[input_name] for input_name in variable_input_names
+           input_data
+           for input_name, input_data in curr_inputs
+           if input_name in variable_input_names
        )

        # For mypy
Below this (I can't comment further here):
- why the cast is needed: curr_calibration_data = cast(Tuple[numpy.ndarray], curr_calibration_data)
- about "# Find the unique integer producers of the current's op output tensor": what is a producer? How are they used?
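On the cast question, a generic illustration (not the PR's actual code path) of why typing.cast is sometimes needed: mypy cannot narrow the element type on its own, even when the runtime values are guaranteed not to be None.

```python
from typing import Optional, Tuple, cast

import numpy

def drop_optional(data: Tuple[Optional[numpy.ndarray], ...]) -> Tuple[numpy.ndarray, ...]:
    # The caller guarantees no element is None, but mypy cannot infer that from
    # the annotations alone, so the type is narrowed explicitly with cast().
    assert all(x is not None for x in data)
    return cast(Tuple[numpy.ndarray, ...], data)

calibration = drop_optional((numpy.ones(3), numpy.zeros(2)))
```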
@@ -455,10 +456,12 @@ def _quantize_layers(self, *input_calibration_data: numpy.ndarray):
        has_variable_inputs = (len(curr_inputs) - len(curr_cst_inputs)) > 0

        variable_input_names = [
-           input_name for input_name in curr_inputs if input_name not in constants
+           input_name for input_name, _ in curr_inputs if input_name not in constants
        ]
        curr_calibration_data = tuple(
Basically we are only interested in data from the variable inputs, right?
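A toy illustration of the filtering being discussed (names are made up; in the real code the constants come from the ONNX graph's initializers):

```python
# ONNX node inputs mix graph variables (activations) and constants (weights,
# biases). Only the variable inputs carry calibration data.
curr_inputs = [("act_in", [1.0, 2.0]), ("weight", [[0.5]]), ("bias", [0.1])]
constants = {"weight", "bias"}

variable_input_names = [name for name, _ in curr_inputs if name not in constants]
curr_calibration_data = tuple(
    data for name, data in curr_inputs if name in variable_input_names
)
assert variable_input_names == ["act_in"]
assert curr_calibration_data == ([1.0, 2.0],)
```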
Conformance issue.
Unblocking my request for changes; fine for me if you take care of the remaining comments in a following PR!
Should we also update a notebook to make sure there is a speed-up? It would be great to add a definition of this quantizer, i.e. what it does and what speed-up is expected. I suppose the gain would come from using rounding instead of a standard PBS to re-quantize?
That's a very long CI, unsure what to do here 🤔
CI is taking way too long.
Let's not merge it to main as is, as it would slow down other PRs.
Coverage passed ✅
-    " \"module__activation_function\": nn.Sigmoid,\n",
+    " \"module__activation_function\": nn.ReLU,\n",
Why was this needed? Changing the architecture might have a positive impact on the accuracy, which is not really what we want in this PR.
I believe this is because the power of two scaling feature only works with ReLU
Is there an assert that, if power-of-two scaling is activated, the activation is a ReLU?
Got it. But I don't see anything that prevents the use of sigmoid or other non-linear functions. What would happen then?
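If such a guard were added, a minimal hedged sketch could look like this (check_pot_activation is a hypothetical helper, not something in the PR):

```python
import torch.nn as nn

def check_pot_activation(power_of_two_scaling: bool, activation_function: type) -> None:
    """Hypothetical guard: per the discussion above, power-of-two scaling targets
    the Gemm/Conv -> ReLU pattern, so reject other activations up front instead
    of silently falling back later."""
    if power_of_two_scaling and activation_function is not nn.ReLU:
        raise ValueError(
            "power_of_two_scaling=True currently requires activation_function=nn.ReLU"
        )

check_pot_activation(True, nn.ReLU)       # passes
# check_pot_activation(True, nn.Sigmoid)  # would raise ValueError
```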
looks great, thanks a lot for this feature !!
        n_accum_bits: int = MAX_BITWIDTH_BACKWARD_COMPATIBLE,
        n_w_bits: int = 4,
        n_a_bits: int = 4,
        # No pruning by default as roundPBS keeps the PBS precision low
What is the chosen bit width to round down to?
it is determined from the learned quantization scales
So these hyper-parameters are useless? I am trying to understand what number of bits is used for the rounding.
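A hedged numeric example of how the rounded bit count could follow from the learned scales (the values and the exact formula are illustrative; the PR's adapter may differ in the details):

```python
import numpy

# Illustrative learned scales around one Gemm -> ReLU -> quantize pattern.
scale_input = 2.0 ** -5    # activation quantizer before the layer
scale_weight = 2.0 ** -4   # weight quantizer
scale_output = 2.0 ** -3   # activation quantizer after the layer

# The accumulator scale is the product of the input and weight scales; the
# ratio to the output scale says how many low accumulator bits are redundant.
scale_acc = scale_input * scale_weight             # 2**-9
ratio = scale_output / scale_acc                   # 2**6
lsbs_to_round = int(numpy.rint(numpy.log2(ratio)))
assert lsbs_to_round == 6  # roundPBS would drop 6 low bits here
```

So under this scheme the rounding width is not a separate hyper-parameter; it falls out of the scales the quantizers learn during training.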
        n_prune_neurons_percentage: float = 0.0,
        activation_function: Type = nn.ReLU,
        quant_narrow: bool = False,
        quant_signed: bool = True,
        power_of_two_scaling: bool = True,  # Default to true: use roundPBS to speed up the NNs
It's a bit odd how rounding is implicitly contained in the power-of-two feature here. Shouldn't there be two distinct features?
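For completeness, a hedged sketch of the knobs involved (the parameter names come from the diff above; how they are passed, e.g. through skorch's module__ prefix as in the notebook change, depends on the estimator used):

```python
from torch import nn

qat_params = {
    "n_w_bits": 4,
    "n_a_bits": 4,
    "activation_function": nn.ReLU,   # the thread above says PoT scaling targets ReLU
    # Defaults to True after this PR; set to False to keep the previous behaviour
    # (standard PBS requantization without rounding).
    "power_of_two_scaling": True,
}
```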
Approved but some experiments/documentation should be added to explain to the user why and how this works.
Closes https://github.com/zama-ai/concrete-ml-internal/issues/3947
Closes https://github.com/zama-ai/concrete-ml-internal/issues/3946