Skip to content

Commit

Permalink
docs: add explanation of encrypted training and federated learning
Browse files Browse the repository at this point in the history
Closes #4049
  • Loading branch information
andrei-stoian-zama committed Jan 9, 2024
1 parent 5dfbe3d commit f0871d4
Show file tree
Hide file tree
Showing 6 changed files with 60 additions and 5 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,11 +23,11 @@
</p>
<hr>

Concrete ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools built on top of [Concrete](https://github.com/zama-ai/concrete) by [Zama](https://github.com/zama-ai). It aims to simplify the use of fully homomorphic encryption (FHE) for data scientists to help them automatically turn machine learning models into their homomorphic equivalent. Concrete ML was designed with ease-of-use in mind, so that data scientists can use it without knowledge of cryptography. Notably, the Concrete ML model classes are similar to those in scikit-learn and it is also possible to convert PyTorch models to FHE.
Concrete ML is a Privacy-Preserving Machine Learning (PPML) open-source set of tools built on top of [Concrete](https://github.com/zama-ai/concrete) by [Zama](https://github.com/zama-ai). It simplifies the use of fully homomorphic encryption (FHE) for data scientists to help them automatically turn machine learning models into their homomorphic equivalent. Concrete ML was designed with ease-of-use in mind, so that data scientists can use it without knowledge of cryptography. Notably, the Concrete ML model classes are similar to those in scikit-learn and it is also possible to convert PyTorch models to FHE.

## Main features.

Data scientists can use models with APIs which are close to the frameworks they use, with additional options to run inferences in FHE.
Data scientists can use models with APIs which are close to the frameworks they use, while additional options to those models allow them to run inference or training on encrypted data with FHE.

Concrete ML features:

Expand Down
9 changes: 6 additions & 3 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,10 +4,13 @@

![](.gitbook/assets/3.png)

Concrete ML is an open source, privacy-preserving, machine learning inference framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)).
Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)). Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data.

Fully Homomorphic Encryption is an encryption technique that allows computing directly on encrypted data, without needing to decrypt it. With FHE, you can build private-by-design applications without compromising on features. You can learn more about FHE in [this introduction](https://www.zama.ai/post/tfhe-deep-dive-part-1) or by joining the [FHE.org](https://fhe.org) community.

Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured through _differential privacy_ instead of encryption. Concrete ML
can import linear models, including logistic regression, that are trained using federated learning using the [`from_sklearn` function](./built-in-models/linear.md#pre-trained-models).

## Example usage

Here is a simple example of classification on encrypted data using logistic regression. More examples can be found [here](built-in-models/ml_examples.md).
Expand Down Expand Up @@ -86,11 +89,11 @@ This example shows the typical flow of a Concrete ML model:

To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete ML (currently 16-bit integers). Thus, machine learning models must be quantized, which sometimes leads to a loss of accuracy versus the original model, which operates on plaintext.

Additionally, Concrete ML currently only supports FHE _inference_. Training has to be done on unencrypted data, producing a model which is then converted to an FHE equivalent that can perform encrypted inference (i.e., prediction over encrypted data).
Additionally, Concrete ML currently only supports training on encrypted data for some models, while it supports _inference_ for a large variety of models.

Finally, there is currently no support for pre-processing model inputs and post-processing model outputs. These processing stages may involve text-to-numerical feature transformation, dimensionality reduction, KNN or clustering, featurization, normalization, and the mixing of results of ensemble models.

These issues are currently being addressed, and significant improvements are expected to be released in the coming months.
These issues are currently being addressed, and significant improvements are expected to be released in the near future.

## Concrete stack

Expand Down
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
- [Neural Networks](built-in-models/neural-networks.md)
- [Nearest Neighbors](built-in-models/nearest-neighbors.md)
- [Pandas](built-in-models/pandas.md)
- [Encrypted training](built-in-models/training.md)
- [Built-in Model Examples](built-in-models/ml_examples.md)

## Deep Learning
Expand Down
2 changes: 2 additions & 0 deletions docs/built-in-models/linear.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ Using these models in FHE is extremely similar to what can be done with scikit-l

Models are also compatible with some of scikit-learn's main workflows, such as `Pipeline()` and `GridSearch()`.

## Pre-trained models

It is possible to convert an already trained scikit-learn linear model to a Concrete ML one by using the [`from_sklearn_model`](../developer-guide/api/concrete.ml.sklearn.base.md#classmethod-from_sklearn_model) method. See [below for an example](#loading-a-pre-trained-model). This functionality is only available for linear models.

## Quantization parameters
Expand Down
48 changes: 48 additions & 0 deletions docs/built-in-models/training.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
# Training on Encrypted Data

Concrete ML offers the possibility to train [SGD Logistic Regression](../developer-guide/api/concrete.ml.sklearn.linear_model.md#class-sgdclassifier) on encrypted data. The [logistic regression training](../advanced_examples/LogisticRegressionTraining.ipynb) example shows this feature in action.

This example shows how to instantiate a logistic regression model that trains on encrypted data:

```python
parameters_range = (-1.0, 1.0)

sgd_clf_binary_simulate = SGDClassifier(
random_state=RANDOM_STATE,
max_iter=N_ITERATIONS,
fit_encrypted=True,
parameters_range=parameters_range,
)
```

To activate encrypted training simply set `fit_encrypted=True` in the constructor. If this value is not set, training is performed
on clear data using `scikit-learn` gradient descent.

Next, to perform the training on encrypted data, call the `fit` function with the `fhe="execute"` argument:

<!--pytest-codeblocks:skip-->

```python
sgd_clf_binary_fhe.fit(X_binary, y_binary, fhe="execute")
```

{% hint style="info" %}
Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured through _differential privacy_ instead of encryption. Concrete ML
can import linear models, including logistic regression, that are trained using federated learning using the [`from_sklearn` function](linear.md#pre-trained-models).

{% endhint %}

## Training configuration

The `max_iter` parameter controls the number of batches that are processed by the training algorithm. Good values for this parameter are 8-64.

The `parameters_range` parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range $$[-1, 1]$$.

## Capabilities and Limitations

The logistic model that can be trained uses Stochastic Gradient Descent (SGD) and quantizes for data, weights, gradients and the error measure. It currently supports training 6-bit models, training both the coefficients and the bias.

The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model
is proportional to the number of features and the number of training examples.

The `SGDClassifier` training does not currently support client/server deployment for training.
1 change: 1 addition & 0 deletions docs/index.toc.txt
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
built-in-models/nearest-neighbors.md
built-in-models/pandas.md
built-in-models/ml_examples.md
built-in-models/training.md

.. toctree::
:maxdepth: 0
Expand Down

0 comments on commit f0871d4

Please sign in to comment.