From dcece758ee7d9dc87ce0a14f235153d5c5d28fa1 Mon Sep 17 00:00:00 2001 From: Andrei Stoian Date: Tue, 16 Jan 2024 18:10:44 +0100 Subject: [PATCH] docs: improve for release --- docs/advanced-topics/advanced_features.md | 18 ++++---- docs/built-in-models/training.md | 12 ++--- docs/deep-learning/fhe_assistant.md | 54 ++++++++++------------ docs/deep-learning/optimizing_inference.md | 4 +- docs/deep-learning/torch_support.md | 5 ++ 5 files changed, 45 insertions(+), 48 deletions(-) diff --git a/docs/advanced-topics/advanced_features.md b/docs/advanced-topics/advanced_features.md index cc9fccbe7..ab8fdd837 100644 --- a/docs/advanced-topics/advanced_features.md +++ b/docs/advanced-topics/advanced_features.md @@ -6,25 +6,25 @@ Concrete ML provides features for advanced users to adjust cryptographic paramet Concrete ML makes use of table lookups (TLUs) to represent any non-linear operation (e.g., a sigmoid). TLUs are implemented through the Programmable Bootstrapping (PBS) operation, which applies a non-linear operation in the cryptographic realm. -The result of TLU operations is obtained with a specific error probability. Concrete ML offers the possibility to set this error probability, which influences the cryptographic parameters. The higher the success rate, the more restrictive the parameters become. This can affect both key generation and, more significantly, FHE execution time. +The result of TLU operations is obtained with a specific tolerance to off-by-one errors. Concrete ML offers the possibility to set the probability of such errors occurring, which influences the cryptographic parameters. The lower the tolerance, the more restrictive the parameters become, making both key generation and, more significantly, FHE execution time slower. {% hint style="info" %} Concrete ML has a _simulation_ mode where the impact of approximate computation of TLUs on the model accuracy can be determined. The simulation is much faster, speeding up model development significantly. 
The behavior in simulation mode is representative of the behavior of the model on encrypted data. {% endhint %} -In Concrete ML, there are three different ways to define the error probability: +In Concrete ML, there are three different ways to define the tolerance to off-by-one errors for each TLU operation: - setting `p_error`, the error probability of an individual TLU (see [here](advanced_features.md#an-error-probability-for-an-individual-tlu)) - setting `global_p_error`, the error probability of the full circuit (see [here](advanced_features.md#a-global-error-probability-for-the-entire-model)) - not setting `p_error` nor `global_p_error`, and using default parameters (see [here](advanced_features.md#using-default-error-probability)) {% hint style="warning" %} -`p_error` and `global_p_error` are somehow two concurrent parameters, in the sense they both have an impact on the choice of cryptographic parameters. It is forbidden in Concrete ML to set both `p_error` and `global_p_error` simultaneously. +`p_error` and `global_p_error` cannot be set at the same time, as they are incompatible with each other. {% endhint %} -### An error probability for an individual TLU +### Tolerance to off-by-one error for an individual TLU -The first way to set error probabilities in Concrete ML is at the local level, by directly setting the probability of error of each individual TLU. This probability is referred to as `p_error`. A given PBS operation has a `1 - p_error` chance of being successful. The successful evaluation here means that the value decrypted after FHE evaluation is exactly the same as the one that would be computed in the clear. +The first way to set error probabilities in Concrete ML is at the local level, by directly setting the tolerance to error of each individual TLU operation (such as activation functions for a neuron output). This tolerance is referred to as `p_error`. A given PBS operation has a `1 - p_error` chance of being correct 100% of the time. 
The successful evaluation here means that the value decrypted after FHE evaluation is exactly the same as the one that would be computed in the clear. Otherwise, off-by-one errors might occur, which may not actually reduce model accuracy in practice. For simplicity, it is best to use [default options](advanced_features.md#using-default-error-probability), irrespective of the type of model. Especially for deep neural networks, default values may be too pessimistic, reducing computation speed without any improvement in accuracy. For deep neural networks, some TLU errors might not affect the accuracy of the network, so `p_error` can be safely increased (e.g., see CIFAR classifications in [our showcase](../getting-started/showcase.md)). @@ -63,9 +63,9 @@ clf.compile(X_train, p_error=0.1) If the `p_error` value is specified and [simulation](compilation.md#fhe-simulation) is enabled, the run will take into account the randomness induced by the choice of `p_error`. This results in statistical similarity to the FHE evaluation. -### A global error probability for the entire model +### A global tolerance to off-by-one errors for the entire model -A `global_p_error` is also available and defines the probability of success for the entire model. Here, the `p_error` for every PBS is computed internally in Concrete such that the `global_p_error` is reached. +A `global_p_error` is also available and sets the error probability for the entire model: the model has a `1 - global_p_error` chance of being 100% correct, compared to execution in the clear. In this case, the `p_error` for every TLU is determined internally in Concrete such that the `global_p_error` is reached for the whole model. There might be cases where the user encounters a `No cryptography parameter found` error message. Increasing the `p_error` or the `global_p_error` in this case might help. 
@@ -78,7 +78,7 @@ Usage is similar to the `p_error` parameter: clf.compile(X_train, global_p_error=0.1) ``` -In the above example, XGBoostClassifier in FHE has a 1/10 probability to have a shifted output value compared to the expected value. The shift is relative to the expected value, so even if the result is different, it should be **around** the expected value. +In the above example, XGBoostClassifier in FHE has a 1/10 probability to have an off-by-one output value compared to the expected value. The shift is relative to the expected value, so even if the result is different, it should be **close** to the expected value. ### Using default error probability @@ -162,7 +162,7 @@ $$t = L - P$$ Then, the rounding operation can be computed as: -$$ \mathrm{round\_to\_t\_bits}(x, t) = \left\lfloor \frac{x}{2^t} \right\rceil \cdot 2^t $$ +$$ \mathrm{round\_to\_P\_bits}(x, t) = \left\lfloor \frac{x}{2^t} \right\rceil \cdot 2^t $$ where $$x$$ is the input number, and $$\lfloor \cdot \rceil$$ denotes the operation that rounds to the nearest integer. diff --git a/docs/built-in-models/training.md b/docs/built-in-models/training.md index 92d21ee81..57aee642e 100644 --- a/docs/built-in-models/training.md +++ b/docs/built-in-models/training.md @@ -7,7 +7,7 @@ This example shows how to instantiate a logistic regression model that trains on ```python parameters_range = (-1.0, 1.0) -sgd_clf_binary_simulate = SGDClassifier( +model = SGDClassifier( random_state=RANDOM_STATE, max_iter=N_ITERATIONS, fit_encrypted=True, @@ -23,7 +23,7 @@ Next, to perform the training on encrypted data, call the `fit` function with th ```python -sgd_clf_binary_fhe.fit(X_binary, y_binary, fhe="execute") +model.fit(X_binary, y_binary, fhe="execute") ``` {% hint style="info" %} @@ -34,7 +34,7 @@ can import linear models, including logistic regression, that are trained using ## Training configuration -The `max_iter` parameter controls the number of batches that are processed by the training algorithm. 
Good values for this parameter are 8-64. +The `max_iter` parameter controls the number of batches that are processed by the training algorithm. The `parameters_range` parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range $$[-1, 1]$$. @@ -42,7 +42,5 @@ The `parameters_range` parameter determines the initialization of the coefficien The logistic model that can be trained uses Stochastic Gradient Descent (SGD) and quantizes for data, weights, gradients and the error measure. It currently supports training 6-bit models, training both the coefficients and the bias. -The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model -is proportional to the number of features and the number of training examples. - -The `SGDClassifier` training does not currently support client/server deployment for training. +The `SGDClassifier` does not currently support training models with other values for the bit-widths. The time to train the model +is proportional to the number of features and the number of training examples. The `SGDClassifier` training does not currently support client/server deployment for training. diff --git a/docs/deep-learning/fhe_assistant.md b/docs/deep-learning/fhe_assistant.md index 9f44584a5..4f306605b 100644 --- a/docs/deep-learning/fhe_assistant.md +++ b/docs/deep-learning/fhe_assistant.md @@ -55,7 +55,11 @@ concrete_clf.fit(X, y) concrete_clf.compile(X, debug_config) ``` -## Compilation debugging +## Compilation error debugging + +Compilation errors that signal that the ML model is not FHE compatible are usually of two types: +1. TLU input maximum bit-width is exceeded +2. 
No crypto-parameters can be found for the ML model: `RuntimeError: NoParametersFound` is raised by the compiler The following produces a neural network that is not FHE-compatible: @@ -100,46 +104,36 @@ except RuntimeError as err: Upon execution, the Compiler will raise the following error within the graph representation: ``` -Function you are trying to compile cannot be converted to MLIR: - -%0 = _onnx__Gemm_0 # EncryptedTensor ∈ [-64, 63] -%1 = [[ 33 -27 ... 22 -29]] # ClearTensor ∈ [-63, 62] -%2 = matmul(%0, %1) # EncryptedTensor ∈ [-4973, 4828] -%3 = subgraph(%2) # EncryptedTensor ∈ [0, 126] -%4 = [[ 16 6 ... 10 54]] # ClearTensor ∈ [-63, 63] -%5 = matmul(%3, %4) # EncryptedTensor ∈ [-45632, 43208] -%6 = subgraph(%5) # EncryptedTensor ∈ [0, 126] -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ table lookups are only supported on circuits with up to 16-bit integers -%7 = [[ -7 -52] ... [-12 62]] # ClearTensor ∈ [-63, 62] -%8 = matmul(%6, %7) # EncryptedTensor ∈ [-26971, 29843] -return %8 +Function you are trying to compile cannot be compiled: + +%0 = _x # EncryptedTensor ∈ [-64, 63] +%1 = [[ -9 18 ... 30 34]] # ClearTensor ∈ [-62, 63] @ /fc1/Gemm.matmul +%2 = matmul(%0, %1) # EncryptedTensor ∈ [-5834, 5770] @ /fc1/Gemm.matmul +%3 = subgraph(%2) # EncryptedTensor ∈ [0, 127] +%4 = [[-36 6 ... 27 -11]] # ClearTensor ∈ [-63, 63] @ /fc2/Gemm.matmul +%5 = matmul(%3, %4) # EncryptedTensor ∈ [-34666, 37702] @ /fc2/Gemm.matmul +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this 17-bit value is used as an input to a table lookup ``` -The error `table lookups are only supported on circuits with up to 16-bit integers` indicates that the 16-bit limit on the input of the Table Lookup operation has been exceeded. 
To pinpoint the model layer that causes the error, Concrete ML provides the [bitwidth_and_range_report](../developer-guide/api/concrete.ml.quantization.quantized_module.md#method-bitwidth_and_range_report) helper function. First, the model must be compiled so that it can be [simulated](#simulation). Then, calling the function on the module above returns the following: +The error `this 17-bit value is used as an input to a table lookup` indicates that the 16-bit limit on the input of the Table Lookup (TLU) operation has been exceeded. To pinpoint the model layer that causes the error, Concrete ML provides the [bitwidth_and_range_report](../developer-guide/api/concrete.ml.quantization.quantized_module.md#method-bitwidth_and_range_report) helper function. First, the model must be compiled so that it can be [simulated](#simulation). + +To make this network FHE-compatible, one can apply several techniques: +1. use [rounded accumulators](../advanced-topics/advanced_features.md#rounded-activations-and-quantizers) by specifying the `rounding_threshold_bits` parameter. Please evaluate the accuracy of the model using simulation if you use this feature, as it may impact accuracy. Setting a value 2 bits higher than the quantization `n_bits` should be a good start. + + +```python +torch_model = SimpleNet(20) -``` quantized_numpy_module = compile_torch_model( torch_model, torch_input, n_bits=7, - use_virtual_lib=True + rounding_threshold_bits=8, ) - -res = quantized_numpy_module.bitwidth_and_range_report() -print(res) -``` - -``` -{ - '/fc1/Gemm': {'range': (-6180, 6840), 'bitwidth': 14}, - '/fc2/Gemm': {'range': (-45051, 43090), 'bitwidth': 17}, - '/fc3/Gemm': {'range': (-17351, 13868), 'bitwidth': 16} -} ``` -To make this network FHE-compatible one can reduce the bit-width of the second layer named `fc2`. To do this, a simple solution is to reduce the number of neurons, as it is proportional to the bit-width. 
-Reducing the number of neurons in this layer resolves the error and makes the network FHE-compatible: +2. reduce the accumulator bit-width of the second layer named `fc2`. To do this, a simple solution is to reduce the number of neurons, as it is proportional to the bit-width. diff --git a/docs/deep-learning/optimizing_inference.md b/docs/deep-learning/optimizing_inference.md index 93e02de34..98d030f9a 100644 --- a/docs/deep-learning/optimizing_inference.md +++ b/docs/deep-learning/optimizing_inference.md @@ -21,6 +21,6 @@ Reducing the bit-width of the inputs to the Table Lookup (TLU) operations is a m it is possible to leverage some properties of the fused activation and quantization functions expressed in the TLUs to further reduce the accumulator. This is achieved through the _rounded PBS_ feature as described in the [rounded activations and quantizers reference](../advanced-topics/advanced_features.md#rounded-activations-and-quantizers). Adjusting the rounding amount, relative to the initial accumulator size, can bring large improvements in latency while maintaining accuracy. -## TLU error probability adjustment +## TLU error tolerance adjustment -Finally, the TFHE scheme exposes a TLU error probability parameter that has an impact on crypto-system parameters that influence latency. A higher probability of TLU error results in faster computations but may reduce accuracy. One can think of the error of obtaining $$T[x]$$ as a Gaussian distribution centered on $$x$$: $$TLU[x]$$ is obtained with probability of `1 - p_error`, while $$T[x-1]$$, $$T[x+1]$$ are obtained with much lower probability, etc. In Deep NNs, these type of errors can be tolerated up to some point. See the [`p_error` documentation for details](../advanced-topics/advanced_features.md#approximate-computations) and more specifically the usage example of [the API for finding the best `p_error`](../advanced-topics/advanced_features.md#searching-for-the-best-error-probability). 
+Finally, the TFHE scheme exposes a TLU error tolerance parameter that has an impact on crypto-system parameters that influence latency. A higher tolerance of TLU off-by-one errors results in faster computations but may reduce accuracy. One can think of the error of obtaining $$T[x]$$ as a Gaussian distribution centered on $$x$$: $$T[x]$$ is obtained with probability of `1 - p_error`, while $$T[x-1]$$, $$T[x+1]$$ are obtained with much lower probability, etc. In Deep NNs, these types of errors can be tolerated up to some point. See the [`p_error` documentation for details](../advanced-topics/advanced_features.md#approximate-computations) and more specifically the usage example of [the API for finding the best `p_error`](../advanced-topics/advanced_features.md#searching-for-the-best-error-probability). diff --git a/docs/deep-learning/torch_support.md b/docs/deep-learning/torch_support.md index 732eceaae..dd2cefac2 100644 --- a/docs/deep-learning/torch_support.md +++ b/docs/deep-learning/torch_support.md @@ -6,6 +6,11 @@ As [Quantization Aware Training (QAT)](../advanced-topics/quantization.md) is th The following example uses a simple QAT PyTorch model that implements a fully connected neural network with two hidden layers. Due to its small size, making this model respect FHE constraints is relatively easy. + +{% hint style="info" %} +Converting neural networks to use FHE can be done with `compile_brevitas_qat_model` or with `compile_torch_model` for post-training quantization. If the model cannot be converted to FHE, two types of errors can be raised: (1) crypto-parameters cannot be found and (2) the table lookup bit-width limit is exceeded. See the [debugging section](./fhe_assistant.md#compilation-error-debugging) if you encounter these errors. +{% endhint %} + ```python import brevitas.nn as qnn import torch.nn as nn