From 346f997477ce7cf1e8cc343d8a043acc13e67a57 Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Thu, 13 Jun 2024 13:32:14 +0200 Subject: [PATCH 01/17] docs: add error message explanation --- docs/deep-learning/fhe_assistant.md | 31 ++++++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 93d198675..7f9a91f15 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -52,14 +52,37 @@ concrete_clf.fit(X, y) concrete_clf.compile(X, debug_config) ``` +## Common compilation errors -## Compilation error debugging - -Compilation errors that signal that the ML model is not FHE compatible are usually of two types: +The most common compilation errors stem from the following causes: 1. TLU input maximum bit-width is exceeded + +This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16-bits. The most common approaches to fix this issue are: + - Reduce quantization `n_bits`. However, this may reduce accuracy. When quantization `n_bits` must be below 6, it is best to use [Quantization Aware Training](../deep-learning/fhe_friendly_models.md). + - Use `rounding_threshold_bits`. This feature is described [here](../explanations/advanced_features.md#rounded-activations-and-quantizers). It is recommended to use the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) setting, and set the rounding bits to 1 or 2 bits higher than the quantization `n_bits` + - Use [pruning](../explanations/pruning.md) + 1. No crypto-parameters can be found for the ML model: `RuntimeError: NoParametersFound` is raised by the compiler +This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function. The solutions in this case are similar to the ones for the previous error. + +1. Quantization failed with `Could not determine a unique scale for the quantization!`. + +This error is a related to the concatenation operator. When using quantization aware training with Brevitas the following approach will fix this error: + + 1. Add a new `QuantIdentity` layer in your model. Suppose it is called `quant_concat`. + 2. In the `forward` function, before concatenation of `x` and `y`, apply it to both tensors that are concatenated: + + +```python +torch.cat([self.quant_concat(x), self.quant_concat(y)]) +``` + +## Debugging compilation errors + +Compilation errors due to FHE incompatible models, such as maximum bit-width exceeded or `NoParametersFound` can be debugged by examining the bit-widths associated with various intermediate values of the FHE computation. + The following produces a neural network that is not FHE-compatible: ```python @@ -116,6 +139,8 @@ Function you are trying to compile cannot be compiled: The error `this 17-bit value is used as an input to a table lookup` indicates that the 16-bit limit on the input of the Table Lookup (TLU) operation has been exceeded. To pinpoint the model layer that causes the error, Concrete ML provides the [bitwidth_and_range_report](../references/api/concrete.ml.quantization.quantized_module.md#method-bitwidth_and_range_report) helper function. First, the model must be compiled so that it can be [simulated](fhe_assistant.md#simulation). +On the other hand, `NoParametersFound` is encountered when using `rounding_threshold_bits`. When using this setting, the 16-bit accumulator limit is relaxed. 
However, reducing bit-width, or reducing the `rounding_threshold_bits`, or using using the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) rounding method can help. + ### Fixing compilation errors To make this network FHE-compatible one can apply several techniques: From becad109a97ae8b5f54531190f5f1f0340214f85 Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Thu, 13 Jun 2024 15:39:49 +0200 Subject: [PATCH 02/17] docs: improve torch support explanations --- docs/deep-learning/fhe_assistant.md | 39 +++++++++--- docs/deep-learning/torch_support.md | 99 ++++++++++++++--------------- src/concrete/ml/onnx/ops_impl.py | 4 +- 3 files changed, 78 insertions(+), 64 deletions(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 7f9a91f15..d3b0b67e8 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -52,33 +52,52 @@ concrete_clf.fit(X, y) concrete_clf.compile(X, debug_config) ``` + ## Common compilation errors The most common compilation errors stem from the following causes: -1. TLU input maximum bit-width is exceeded +#### 1. TLU input maximum bit-width is exceeded This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16-bits. The most common approaches to fix this issue are: - - Reduce quantization `n_bits`. However, this may reduce accuracy. When quantization `n_bits` must be below 6, it is best to use [Quantization Aware Training](../deep-learning/fhe_friendly_models.md). - - Use `rounding_threshold_bits`. This feature is described [here](../explanations/advanced_features.md#rounded-activations-and-quantizers). It is recommended to use the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) setting, and set the rounding bits to 1 or 2 bits higher than the quantization `n_bits` - - Use [pruning](../explanations/pruning.md) -1. No crypto-parameters can be found for the ML model: `RuntimeError: NoParametersFound` is raised by the compiler +- Reduce quantization `n_bits`. However, this may reduce accuracy. When quantization `n_bits` must be below 6, it is best to use [Quantization Aware Training](../deep-learning/fhe_friendly_models.md). +- Use `rounding_threshold_bits`. This feature is described [here](../explanations/advanced_features.md#rounded-activations-and-quantizers). It is recommended to use the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) setting, and set the rounding bits to 1 or 2 bits higher than the quantization `n_bits` +- Use [pruning](../explanations/pruning.md) + +#### 2. No crypto-parameters can be found for the ML model: `RuntimeError: NoParametersFound` is raised by the compiler This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function. The solutions in this case are similar to the ones for the previous error. -1. Quantization failed with `Could not determine a unique scale for the quantization!`. +#### 3. Quantization import failed with + +The error associated is `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`. + +This error is a due to missing quantization operators in the model that is imported as a quantized aware training model. See [this guide](../deep-learning/fhe_friendly_models.md) for a guide on how to use Brevitas layers. 
This error message is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. + +A common example is related to the concatenation operator. Suppose two tensors `x` and `y` are produced by two layers and need to be concatenated: + + + +```python +x = self.dense1(x) +y = self.dense2(y) +z = torch.cat([x, y]) +``` -This error is a related to the concatenation operator. When using quantization aware training with Brevitas the following approach will fix this error: +In the example above, the `x` and `y` layers need quantization before being concatenated. When using quantization aware training with Brevitas the following approach will fix this error: - 1. Add a new `QuantIdentity` layer in your model. Suppose it is called `quant_concat`. - 2. In the `forward` function, before concatenation of `x` and `y`, apply it to both tensors that are concatenated: +1. Add a new `QuantIdentity` layer in your model. Suppose it is called `quant_concat`. +1. In the `forward` function, before concatenation of `x` and `y`, apply it to both tensors that are concatenated: + ```python -torch.cat([self.quant_concat(x), self.quant_concat(y)]) +z = torch.cat([self.quant_concat(x), self.quant_concat(y)]) ``` +The usage of a common `Quantidentity` layer to quantize both tensors that are concatenated ensures that they have the same scale. + ## Debugging compilation errors Compilation errors due to FHE incompatible models, such as maximum bit-width exceeded or `NoParametersFound` can be debugged by examining the bit-widths associated with various intermediate values of the FHE computation. diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index d63754114..c6a16730f 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -2,14 +2,21 @@ In addition to the built-in models, Concrete ML supports generic machine learning models implemented with Torch, or [exported as ONNX graphs](onnx_support.md). -As [Quantization Aware Training (QAT)](../explanations/quantization.md) is the most appropriate method of training neural networks that are compatible with [FHE constraints](../getting-started/concepts.md#model-accuracy-considerations-under-fhe-constraints), Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. +There are two approaches to build [FHE-compatible deep networks](../getting-started/concepts.md#model-accuracy-considerations-under-fhe-constraints): -The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. +- [Quantization Aware Training (QAT)](../explanations/quantization.md) requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model` +- Post-training Quantization: in this mode a vanilla PyTorch model can be compiled. However, when quantizing weights & activations to fewer than 7 bits the accuracy can decrease strongly. To use this mode, compile models with `compile_torch_model`. + +Both approaches should be used with the `rounding_threshold_bits` parameter set accordingly. 
The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details. {% hint style="info" %} -Converting neural networks to use FHE can be done with `compile_brevitas_qat_model` or with `compile_torch_model` for post-training quantization. If the model can not be converted to FHE two types of errors can be raised: (1) crypto-parameters can not be found and, (2) table look-up bit-width limit is exceeded. See the [debugging section](fhe_assistant.md#compilation-error-debugging) if you encounter these errors. +**See the [common compilation errors page](./fhe_assistant.md#common-compilation-errors) for an explanation of some error messages that the compilation function may raise.** {% endhint %} +## Quantization-aware training + +The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. + ```python import brevitas.nn as qnn import torch.nn as nn @@ -51,38 +58,52 @@ torch_model = QATSimpleNet(30) quantized_module = compile_brevitas_qat_model( torch_model, # our model torch_input, # a representative input-set to be used for both quantization and compilation + rounding_threshold_bits={"n_bits": 6, "method": "approximate"} ) ``` -## Configuring quantization parameters +## Post-training quantization -The PyTorch/Brevitas models, created following the example above, require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. +The following example uses a simple PyTorch model that implements a fully connected neural network with two hidden layers. The model is compiled to use FHE using `compile_torch_model`. + +```python +import torch.nn as nn +import torch -The following configurations were determined through experimentation for convolutional and dense layers. +N_FEAT = 12 +n_bits = 6 -| target accumulator bit-width | activation bit-width | weight bit-width | number of active neurons | -| ---------------------------- | -------------------- | ---------------- | ------------------------ | -| 8 | 3 | 3 | 80 | -| 10 | 4 | 3 | 90 | -| 12 | 5 | 5 | 110 | -| 14 | 6 | 6 | 110 | -| 16 | 7 | 6 | 120 | +class PTQSimpleNet(nn.Module): + def __init__(self, n_hidden): + super().__init__() -Using the templates above, the probability of obtaining the target accumulator bit-width, for a single layer, was determined experimentally by training 10 models for each of the following data-sets. + self.fc1 = nn.Linear(N_FEAT, n_hidden) + self.fc2 = nn.Linear(n_hidden, n_hidden) + self.fc3 = nn.Linear(n_hidden, 2) -|

| <p>probability of obtaining<br>the accumulator bit-width</p> | 8 | 10 | 12 | 14 | 16 |
| ------------------------------------------------------------ | --- | ---- | --- | --- | ---- |
| mnist,fashion | 72% | 100% | 72% | 85% | 100% |
| cifar10 | 88% | 88% | 75% | 75% | 88% |
| cifar100 | 73% | 88% | 61% | 66% | 100% |

Note that the accuracy on larger data-sets, when the accumulator size is low, is also reduced strongly.

| <p>accuracy for target<br>accumulator bit-width</p> |
| 8 | 10 | 12 | 14 | 16 | -| --------------------------------------------------- | --- | --- | --- | --- | --- | -| cifar10 | 20% | 37% | 89% | 90% | 90% | -| cifar100 | 6% | 30% | 67% | 69% | 69% | +The PyTorch/Brevitas models, created following the example above, require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. ## Running encrypted inference @@ -100,7 +121,7 @@ In this example, the input values `x_test` and the predicted values `y_pred` are ## Simulated FHE Inference in the clear -The user can also perform the inference on clear data. Two approaches exist: +One can perform the inference on clear data in order to evaluate the impact of quantization and of FHE computation on the accuracy of their model. See [this section](../deep-learning/fhe_assistant.md#simulation) for more details. Two approaches exist: - `quantized_module.forward(quantized_x, fhe="simulate")`: simulates FHE execution taking into account Table Lookup errors.\ De-quantization must be done in a second step as for actual FHE execution. Simulation takes into account the `p_error`/`global_p_error` parameters @@ -110,34 +131,6 @@ The user can also perform the inference on clear data. Two approaches exist: FHE simulation allows to measure the impact of the Table Lookup error on the model accuracy. The Table Lookup error can be adjusted using `p_error`/`global_p_error`, as described in the [approximate computation ](../explanations/advanced_features.md#approximate-computations)section. {% endhint %} -## Generic Quantization Aware Training import - -While the example above shows how to import a Brevitas/PyTorch model, Concrete ML also provides an option to import generic QAT models implemented in PyTorch or through ONNX. Deep learning models made with TensorFlow or Keras should be usable by preliminary converting them to ONNX. - -QAT models contain quantizers in the PyTorch graph. These quantizers ensure that the inputs to the Linear/Dense and Conv layers are quantized. - -Suppose that `n_bits_qat` is the bit-width of activations and weights during the QAT process. To import a PyTorch QAT network, you can use the [`compile_torch_model`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) library function, passing `import_qat=True`: - - - -```python -from concrete.ml.torch.compile import compile_torch_model -n_bits_qat = 3 - -quantized_module = compile_torch_model( - torch_model, - torch_input, - import_qat=True, - n_bits=n_bits_qat, -) -``` - -Alternatively, if you want to import an ONNX model directly, please see [the ONNX guide](onnx_support.md). The [`compile_onnx_model`](../references/api/concrete.ml.torch.compile.md#function-compile_onnx_model) also supports the `import_qat` parameter. - -{% hint style="warning" %} -When importing QAT models using this generic pipeline, a representative calibration set should be given as quantization parameters in the model need to be inferred from the statistics of the values encountered during inference. -{% endhint %} - ## Supported operators and activations Concrete ML supports a variety of PyTorch operators that can be used to build fully connected or convolutional neural networks, with normalization and activation layers. 
Moreover, many element-wise operators are supported. diff --git a/src/concrete/ml/onnx/ops_impl.py b/src/concrete/ml/onnx/ops_impl.py index b27557b8f..0b808b1a9 100644 --- a/src/concrete/ml/onnx/ops_impl.py +++ b/src/concrete/ml/onnx/ops_impl.py @@ -26,7 +26,9 @@ class RawOpOutput(numpy.ndarray): - """Type construct that marks an ndarray as a raw output of a quantized op.""" + """Type construct that marks an ndarray as a raw output of a quantized op. + A raw output is an output that is a clear constant such as a shape, a constant float, an index.. + """ # This function is only used for comparison operators that return boolean values by default. From 663c8c2dec71443ac49b40c607a315a68f55123c Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Fri, 14 Jun 2024 15:56:30 +0200 Subject: [PATCH 03/17] fix: review changes --- docs/deep-learning/fhe_assistant.md | 30 ++++++++++++++++++----------- docs/deep-learning/torch_support.md | 14 +++++++++++--- 2 files changed, 30 insertions(+), 14 deletions(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index d3b0b67e8..07932550b 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -59,21 +59,29 @@ The most common compilation errors stem from the following causes: #### 1. TLU input maximum bit-width is exceeded -This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16-bits. The most common approaches to fix this issue are: +**Error message**: `this [N]-bit value is used as an input to a table lookup` + +**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16-bits. The most common approaches to fix this issue are: + +**Possible solutions**: - Reduce quantization `n_bits`. However, this may reduce accuracy. When quantization `n_bits` must be below 6, it is best to use [Quantization Aware Training](../deep-learning/fhe_friendly_models.md). - Use `rounding_threshold_bits`. This feature is described [here](../explanations/advanced_features.md#rounded-activations-and-quantizers). It is recommended to use the [`fhe.Exactness.APPROXIMATE`](../references/api/concrete.ml.torch.compile.md#function-compile_torch_model) setting, and set the rounding bits to 1 or 2 bits higher than the quantization `n_bits` - Use [pruning](../explanations/pruning.md) -#### 2. No crypto-parameters can be found for the ML model: `RuntimeError: NoParametersFound` is raised by the compiler +#### 2. No crypto-parameters can be found + +**Error message**: `RuntimeError: NoParametersFound` is raised by the compiler -This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function. The solutions in this case are similar to the ones for the previous error. +**Cause**: This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function. -#### 3. Quantization import failed with +**Possible solutions**: The solutions in this case are similar to the ones for the previous error. -The error associated is `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`. +#### 3. Quantization import failed -This error is a due to missing quantization operators in the model that is imported as a quantized aware training model. See [this guide](../deep-learning/fhe_friendly_models.md) for a guide on how to use Brevitas layers. 
This error message is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. +**Error message**: `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`. + +**Cause**: This error is a due to missing quantization operators in the model that is imported as a quantized aware training model. See [this guide](../deep-learning/fhe_friendly_models.md) for a guide on how to use Brevitas layers. This error message is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. A common example is related to the concatenation operator. Suppose two tensors `x` and `y` are produced by two layers and need to be concatenated: @@ -85,10 +93,12 @@ y = self.dense2(y) z = torch.cat([x, y]) ``` -In the example above, the `x` and `y` layers need quantization before being concatenated. When using quantization aware training with Brevitas the following approach will fix this error: +In the example above, the `x` and `y` layers need quantization before being concatenated. + +**Possible solutions**: -1. Add a new `QuantIdentity` layer in your model. Suppose it is called `quant_concat`. -1. In the `forward` function, before concatenation of `x` and `y`, apply it to both tensors that are concatenated: +1. If the error occurs for the first layer of the model: Add a `QuantIdentity` layer in your model and apply it on the input of the `forward` function, before the first layer is computed. +1. If the error occurs for a concatenation or addition layer: Add a new `QuantIdentity` layer in your model. Suppose it is called `quant_concat`. In the `forward` function, before concatenation of `x` and `y`, apply it to both tensors that are concatenated. The usage of a common `Quantidentity` layer to quantize both tensors that are concatenated ensures that they have the same scale: @@ -96,8 +106,6 @@ In the example above, the `x` and `y` layers need quantization before being conc z = torch.cat([self.quant_concat(x), self.quant_concat(y)]) ``` -The usage of a common `Quantidentity` layer to quantize both tensors that are concatenated ensures that they have the same scale. - ## Debugging compilation errors Compilation errors due to FHE incompatible models, such as maximum bit-width exceeded or `NoParametersFound` can be debugged by examining the bit-widths associated with various intermediate values of the FHE computation. diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index c6a16730f..f6f3aabd9 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -5,7 +5,7 @@ In addition to the built-in models, Concrete ML supports generic machine learnin There are two approaches to build [FHE-compatible deep networks](../getting-started/concepts.md#model-accuracy-considerations-under-fhe-constraints): - [Quantization Aware Training (QAT)](../explanations/quantization.md) requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model` -- Post-training Quantization: in this mode a vanilla PyTorch model can be compiled. However, when quantizing weights & activations to fewer than 7 bits the accuracy can decrease strongly. To use this mode, compile models with `compile_torch_model`. 
+- Post-training Quantization: in this mode a vanilla PyTorch model can be compiled. However, when quantizing weights & activations to fewer than 7 bits the accuracy can decrease strongly. On the other hand, depending on the model size, quantizing with 6-8 bits can be incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`. Both approaches should be used with the `rounding_threshold_bits` parameter set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details. @@ -15,7 +15,7 @@ Both approaches should be used with the `rounding_threshold_bits` parameter set ## Quantization-aware training -The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. +The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. To use QAT, Brevitas `QuantIdentity` nodes must be inserted in the PyTorch model, including one that quantizes the input of the `forward` function. ```python import brevitas.nn as qnn @@ -63,6 +63,10 @@ quantized_module = compile_brevitas_qat_model( ``` +{% hint style="warning" %} +If `QuantIdentity` layers are missing for any input or intermediate value, the compile function will raise an error. See the [common compilation errors page](./fhe_assistant.md#common-compilation-errors) for an explanation. +{% endhint %} + ## Post-training quantization The following example uses a simple PyTorch model that implements a fully connected neural network with two hidden layers. The model is compiled to use FHE using `compile_torch_model`. @@ -103,7 +107,11 @@ quantized_module = compile_torch_model( ## Configuring quantization parameters -The PyTorch/Brevitas models, created following the example above, require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. +With QAT, the PyTorch/Brevitas models, created following the example above, require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the `compile_brevitas_qat_model`. + +With PTQ, the user needs to set the `n_bits` value in the `compile_torch_model` function. A trade-off between accuracy and FHE compatibility and latency must be determined manually. + +The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. 
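
For reference, a minimal sketch of where these two settings live in a Brevitas model; the layer shapes and the 4-bit values below are illustrative assumptions, not part of the documented example:

```python
import brevitas.nn as qnn

# Activation bit-width is set on the quantizer that feeds a layer,
# and weight bit-width on the quantized layer itself.
quant_inp = qnn.QuantIdentity(bit_width=4, return_quant_tensor=True)
fc1 = qnn.QuantLinear(12, 30, True, weight_bit_width=4)
```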
## Running encrypted inference From 2dcacd54394223aef908d5eef17077dc5fc00e8b Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Fri, 14 Jun 2024 16:08:54 +0200 Subject: [PATCH 04/17] fix: review changes --- docs/deep-learning/fhe_assistant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 07932550b..6a6fe1783 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -81,7 +81,7 @@ The most common compilation errors stem from the following causes: **Error message**: `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`. -**Cause**: This error is a due to missing quantization operators in the model that is imported as a quantized aware training model. See [this guide](../deep-learning/fhe_friendly_models.md) for a guide on how to use Brevitas layers. This error message is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. +**Cause**: This error is a due to missing quantization operators in the model that is imported as a quantized aware training model. See [this guide](../deep-learning/fhe_friendly_models.md) on how to use Brevitas layers. This error message is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. A common example is related to the concatenation operator. Suppose two tensors `x` and `y` are produced by two layers and need to be concatenated: From b118fde14978980b08fea72c61716708cd7064be Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Fri, 14 Jun 2024 18:16:37 +0200 Subject: [PATCH 05/17] fix: spacing --- src/concrete/ml/onnx/ops_impl.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/concrete/ml/onnx/ops_impl.py b/src/concrete/ml/onnx/ops_impl.py index 0b808b1a9..e88ebff8e 100644 --- a/src/concrete/ml/onnx/ops_impl.py +++ b/src/concrete/ml/onnx/ops_impl.py @@ -27,6 +27,7 @@ class RawOpOutput(numpy.ndarray): """Type construct that marks an ndarray as a raw output of a quantized op. + A raw output is an output that is a clear constant such as a shape, a constant float, an index.. """ From 65ba5644907bfe41ff6daaabb483a9aa1e9b4d79 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:55:10 +0200 Subject: [PATCH 06/17] fix: Update docs/deep-learning/torch_support.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index f6f3aabd9..383497a99 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -129,7 +129,7 @@ In this example, the input values `x_test` and the predicted values `y_pred` are ## Simulated FHE Inference in the clear -One can perform the inference on clear data in order to evaluate the impact of quantization and of FHE computation on the accuracy of their model. See [this section](../deep-learning/fhe_assistant.md#simulation) for more details. Two approaches exist: +You can perform the inference on clear data in order to evaluate the impact of quantization and of FHE computation on the accuracy of their model. See [this section](../deep-learning/fhe_assistant.md#simulation) for more details. 
Two approaches exist: - `quantized_module.forward(quantized_x, fhe="simulate")`: simulates FHE execution taking into account Table Lookup errors.\ De-quantization must be done in a second step as for actual FHE execution. Simulation takes into account the `p_error`/`global_p_error` parameters From 27b84c072b0863c5a800d39422a2ebb9e6c10330 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:55:33 +0200 Subject: [PATCH 07/17] fix: Update docs/deep-learning/fhe_assistant.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/fhe_assistant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 6a6fe1783..b6fa603f5 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -61,7 +61,7 @@ The most common compilation errors stem from the following causes: **Error message**: `this [N]-bit value is used as an input to a table lookup` -**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16-bits. The most common approaches to fix this issue are: +**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16 bits. **Possible solutions**: From cd67f9dabd74c61e234af4394a14534cba6331f2 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:55:43 +0200 Subject: [PATCH 08/17] fix: Update docs/deep-learning/fhe_assistant.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/fhe_assistant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index b6fa603f5..a6ee832bc 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -71,7 +71,7 @@ The most common compilation errors stem from the following causes: #### 2. No crypto-parameters can be found -**Error message**: `RuntimeError: NoParametersFound` is raised by the compiler +**Error message**: `RuntimeError: NoParametersFound` **Cause**: This error occurs when using `rounding_threshold_bits` in the `compile_torch_model` function. From b0c5c34d67eb7be592c5038764be3df12fa7d79a Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:56:05 +0200 Subject: [PATCH 09/17] fix: Update docs/deep-learning/fhe_assistant.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/fhe_assistant.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index a6ee832bc..8ef8a1a7a 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -81,7 +81,7 @@ The most common compilation errors stem from the following causes: **Error message**: `Error occurred during quantization aware training (QAT) import [...] Could not determine a unique scale for the quantization!`. -**Cause**: This error is a due to missing quantization operators in the model that is imported as a quantized aware training model. See [this guide](../deep-learning/fhe_friendly_models.md) on how to use Brevitas layers. 
This error message is generated when not all layers take inputs that are quantized through `QuantIdentity` layers. +**Cause**: This error occurs when the model imported as a quantized-aware training model lacks quantization operators. See [this guide](../deep-learning/fhe_friendly_models.md) on how to use Brevitas layers. This error message indicates that some layers do not take inputs quantized through `QuantIdentity` layers. A common example is related to the concatenation operator. Suppose two tensors `x` and `y` are produced by two layers and need to be concatenated: From 7db02ea1035223e9f2e6f24321189f794681e3ac Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:56:17 +0200 Subject: [PATCH 10/17] fix: Update docs/deep-learning/torch_support.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index 383497a99..d91ef60c5 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -5,7 +5,7 @@ In addition to the built-in models, Concrete ML supports generic machine learnin There are two approaches to build [FHE-compatible deep networks](../getting-started/concepts.md#model-accuracy-considerations-under-fhe-constraints): - [Quantization Aware Training (QAT)](../explanations/quantization.md) requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model` -- Post-training Quantization: in this mode a vanilla PyTorch model can be compiled. However, when quantizing weights & activations to fewer than 7 bits the accuracy can decrease strongly. On the other hand, depending on the model size, quantizing with 6-8 bits can be incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`. +- **Post-training Quantization**: This mode allows a vanilla PyTorch model to be compiled. However, when quantizing weights & activations to fewer than 7 bits, the accuracy can decrease strongly. On the other hand, depending on the model size, quantizing with 6-8 bits can be incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`. Both approaches should be used with the `rounding_threshold_bits` parameter set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details. 
From 4fe35a2e16dc63b63387036ab6143101e62663c1 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:56:27 +0200 Subject: [PATCH 11/17] fix: Update docs/deep-learning/torch_support.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index d91ef60c5..36e6ab3fc 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -7,7 +7,7 @@ There are two approaches to build [FHE-compatible deep networks](../getting-star - [Quantization Aware Training (QAT)](../explanations/quantization.md) requires using custom layers, but can quantize weights and activations to low bit-widths. Concrete ML works with [Brevitas](../explanations/inner-workings/external_libraries.md#brevitas), a library providing QAT support for PyTorch. To use this mode, compile models using `compile_brevitas_qat_model` - **Post-training Quantization**: This mode allows a vanilla PyTorch model to be compiled. However, when quantizing weights & activations to fewer than 7 bits, the accuracy can decrease strongly. On the other hand, depending on the model size, quantizing with 6-8 bits can be incompatible with FHE constraints. To use this mode, compile models with `compile_torch_model`. -Both approaches should be used with the `rounding_threshold_bits` parameter set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details. +Both approaches require the `rounding_threshold_bits` parameter to be set accordingly. The best values for this parameter need to be determined through experimentation. A good initial value to try is `6`. See [here](../explanations/advanced_features.md#rounded-activations-and-quantizers) for more details. {% hint style="info" %} **See the [common compilation errors page](./fhe_assistant.md#common-compilation-errors) for an explanation of some error messages that the compilation function may raise.** From 8ab41e388cfdf577d243044cf9cb374df2e81247 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:56:50 +0200 Subject: [PATCH 12/17] fix: Update docs/deep-learning/torch_support.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index 36e6ab3fc..6e3d16ef1 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -107,7 +107,7 @@ quantized_module = compile_torch_model( ## Configuring quantization parameters -With QAT, the PyTorch/Brevitas models, created following the example above, require the user to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the `compile_brevitas_qat_model`. +With QAT (the PyTorch/Brevitas models created following the example above), you need to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the `compile_brevitas_qat_model`. 
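
A minimal sketch of that call, reusing the `torch_model` and `torch_input` from the quantization-aware training example above; the `rounding_threshold_bits` value is an assumption to be tuned experimentally:

```python
from concrete.ml.torch.compile import compile_brevitas_qat_model

quantized_module = compile_brevitas_qat_model(
    torch_model,  # Brevitas model: bit-widths are taken from its quantized layers
    torch_input,  # representative input-set
    n_bits=None,  # leave unset so the bit-widths configured in the model are used
    rounding_threshold_bits={"n_bits": 6, "method": "approximate"},
)
```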
With PTQ, the user needs to set the `n_bits` value in the `compile_torch_model` function. A trade-off between accuracy and FHE compatibility and latency must be determined manually. From feff4f606573847ffcd419f83c1030f1f91dc006 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:57:02 +0200 Subject: [PATCH 13/17] fix: Update docs/deep-learning/torch_support.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index 6e3d16ef1..8c6497715 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -109,7 +109,7 @@ quantized_module = compile_torch_model( With QAT (the PyTorch/Brevitas models created following the example above), you need to configure quantization parameters such as `bit_width` (activation bit-width) and `weight_bit_width`. When using this mode, set `n_bits=None` in the `compile_brevitas_qat_model`. -With PTQ, the user needs to set the `n_bits` value in the `compile_torch_model` function. A trade-off between accuracy and FHE compatibility and latency must be determined manually. +With PTQ, you need to set the `n_bits` value in the `compile_torch_model` function and must manually determine the trade-off between accuracy, FHE compatibility, and latency. The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. From c70609beea1e561bdfe89a571ace60bd6bb6c443 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:57:13 +0200 Subject: [PATCH 14/17] fix: Update docs/deep-learning/torch_support.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index 8c6497715..15a195169 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -111,7 +111,7 @@ With QAT (the PyTorch/Brevitas models created following the example above), you With PTQ, you need to set the `n_bits` value in the `compile_torch_model` function and must manually determine the trade-off between accuracy, FHE compatibility, and latency. -The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. +The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit width of the network. Larger accumulator bit widths result in higher accuracy but slower FHE inference time. 
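
To check the accumulator bit-widths that a given configuration produces, one option is the `bitwidth_and_range_report` helper described in the debugging section of fhe_assistant.md. A sketch, assuming `quantized_module` is the result of one of the compile calls above:

```python
# Prints the bit-widths and value ranges associated with each layer of the
# compiled model, which helps spot the layer that drives the accumulator size.
print(quantized_module.bitwidth_and_range_report())
```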
## Running encrypted inference From 6746d602dac01830e42c774bd0697dd4e9d9bc80 Mon Sep 17 00:00:00 2001 From: Andrei Stoian <95410270+andrei-stoian-zama@users.noreply.github.com> Date: Mon, 17 Jun 2024 13:57:28 +0200 Subject: [PATCH 15/17] fix: Update docs/deep-learning/fhe_assistant.md Co-authored-by: yuxizama <157474013+yuxizama@users.noreply.github.com> --- docs/deep-learning/fhe_assistant.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 8ef8a1a7a..382c69199 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -55,7 +55,6 @@ concrete_clf.compile(X, debug_config) ## Common compilation errors -The most common compilation errors stem from the following causes: #### 1. TLU input maximum bit-width is exceeded From a33d063866d5f2b1808de28418d9607904a241d2 Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Mon, 17 Jun 2024 14:35:22 +0200 Subject: [PATCH 16/17] fix: formatting --- docs/deep-learning/fhe_assistant.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 382c69199..63bba03ec 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -55,12 +55,11 @@ concrete_clf.compile(X, debug_config) ## Common compilation errors - #### 1. TLU input maximum bit-width is exceeded **Error message**: `this [N]-bit value is used as an input to a table lookup` -**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16 bits. +**Cause**: This error can occur when `rounding_threshold_bits` is not used and accumulated intermediate values in the computation exceed 16 bits. **Possible solutions**: From 1d3fbd6b94efdcdbcdb8893d16900339213d20a0 Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Mon, 17 Jun 2024 15:25:43 +0200 Subject: [PATCH 17/17] fix: bitwidth --- docs/deep-learning/torch_support.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index 15a195169..8c6497715 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -111,7 +111,7 @@ With QAT (the PyTorch/Brevitas models created following the example above), you With PTQ, you need to set the `n_bits` value in the `compile_torch_model` function and must manually determine the trade-off between accuracy, FHE compatibility, and latency. -The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit width of the network. Larger accumulator bit widths result in higher accuracy but slower FHE inference time. +The quantization parameters, along with the number of neurons on each layer, will determine the accumulator bit-width of the network. Larger accumulator bit-widths result in higher accuracy but slower FHE inference time. ## Running encrypted inference
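
As a recap of the two execution paths discussed on this page, a minimal sketch assuming a compiled `quantized_module` and the `N_FEAT` constant from the earlier examples; `quantize_input` and `dequantize_output` are the QuantizedModule helpers assumed here for the manual de-quantization step:

```python
import numpy

x_test = numpy.array([numpy.random.randn(N_FEAT)])

# Simulation: runs in the clear and takes the table lookup error into account;
# de-quantization is done in a second step, as described above.
q_x = quantized_module.quantize_input(x_test)
q_y = quantized_module.forward(q_x, fhe="simulate")
y_sim = quantized_module.dequantize_output(q_y)

# Actual FHE execution on encrypted data, as in the example earlier on this page.
y_fhe = quantized_module.forward(x_test, fhe="execute")
```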