docs: add explanation of encrypted training and federated learning #437
@@ -4,10 +4,13 @@ | |

![](.gitbook/assets/3.png)
- Concrete ML is an open source, privacy-preserving, machine learning inference framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)).
+ Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)). Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data.

Fully Homomorphic Encryption is an encryption technique that allows computing directly on encrypted data, without needing to decrypt it. With FHE, you can build private-by-design applications without compromising on features. You can learn more about FHE in [this introduction](https://www.zama.ai/post/tfhe-deep-dive-part-1) or by joining the [FHE.org](https://fhe.org) community.

Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured through _differential privacy_ instead of encryption. Concrete ML can import linear models, including logistic regression, that are trained with federated learning, using the [`from_sklearn` function](./built-in-models/linear.md#pre-trained-models).
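To make the federated-learning alternative concrete, here is a minimal sketch of FedAvg-style coefficient averaging for a linear model. It is plain Python, purely illustrative, and not part of the Concrete ML API; the `federated_average` helper and the client parameter values are hypothetical.

```python
def federated_average(client_models):
    """Average coefficients and intercepts collected from several clients.

    Hypothetical FedAvg-style aggregation: each client trains locally and
    only shares its model parameters, never its raw data.
    """
    n_clients = len(client_models)
    n_coefs = len(client_models[0]["coef"])
    avg_coef = [
        sum(m["coef"][i] for m in client_models) / n_clients
        for i in range(n_coefs)
    ]
    avg_intercept = sum(m["intercept"] for m in client_models) / n_clients
    return {"coef": avg_coef, "intercept": avg_intercept}

# Parameters of three hypothetical logistic regressions, each trained locally.
clients = [
    {"coef": [0.25, 1.0], "intercept": 0.125},
    {"coef": [0.5, 0.75], "intercept": 0.25},
    {"coef": [0.75, 1.25], "intercept": 0.375},
]
global_model = federated_average(clients)
print(global_model)  # {'coef': [0.5, 1.0], 'intercept': 0.25}
```

A model aggregated this way could then be loaded into a scikit-learn estimator and imported into Concrete ML through the `from_sklearn` path linked above.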

## Example usage

Here is a simple example of classification on encrypted data using logistic regression. More examples can be found [here](built-in-models/ml_examples.md).
@@ -86,11 +89,11 @@ This example shows the typical flow of a Concrete ML model: | |

To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete ML (currently 16-bit integers). Thus, machine learning models must be quantized, which sometimes leads to a loss of accuracy versus the original model, which operates on plaintext.

- Additionally, Concrete ML currently only supports FHE _inference_. Training has to be done on unencrypted data, producing a model which is then converted to an FHE equivalent that can perform encrypted inference (i.e., prediction over encrypted data).
+ Additionally, Concrete ML currently only supports training on encrypted data for some models, while it supports _inference_ for a large variety of models.
> **Review comment:** I think there's a double space before inference.
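The precision constraint described above can be illustrated with a small sketch of uniform quantization: mapping floats onto an n-bit integer grid and back necessarily loses some precision, and the loss grows as the bit-width shrinks. The `quantize`/`dequantize` helpers below are hypothetical plain-Python illustrations, not the actual Concrete ML quantizer.

```python
def quantize(values, n_bits, lo, hi):
    """Map floats in [lo, hi] to integers in [0, 2**n_bits - 1]."""
    scale = (hi - lo) / (2 ** n_bits - 1)
    return [round((v - lo) / scale) for v in values]

def dequantize(q_values, n_bits, lo, hi):
    """Map n-bit integers back to floats in [lo, hi]."""
    scale = (hi - lo) / (2 ** n_bits - 1)
    return [q * scale + lo for q in q_values]

values = [-0.83, -0.2, 0.05, 0.4, 0.97]
q = quantize(values, 16, -1.0, 1.0)
restored = dequantize(q, 16, -1.0, 1.0)

# With 16 bits the round-trip error is tiny; with very few bits it grows.
max_err_16 = max(abs(a - b) for a, b in zip(values, restored))
q4 = dequantize(quantize(values, 4, -1.0, 1.0), 4, -1.0, 1.0)
max_err_4 = max(abs(a - b) for a, b in zip(values, q4))
print(max_err_16 < max_err_4)  # True
```

The round-trip error is bounded by half a quantization step, which is why models quantized to very low bit-widths can lose accuracy relative to their floating-point originals.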
Finally, there is currently no support for pre-processing model inputs and post-processing model outputs. These processing stages may involve text-to-numerical feature transformation, dimensionality reduction, KNN or clustering, featurization, normalization, and the mixing of results of ensemble models.

- These issues are currently being addressed, and significant improvements are expected to be released in the coming months.
+ These issues are currently being addressed, and significant improvements are expected to be released in the near future.

## Concrete stack
@@ -0,0 +1,48 @@
# Training on Encrypted Data

Concrete ML offers the possibility to train [SGD Logistic Regression](../developer-guide/api/concrete.ml.sklearn.linear_model.md#class-sgdclassifier) on encrypted data. The [logistic regression training](../advanced_examples/LogisticRegressionTraining.ipynb) example shows this feature in action.

This example shows how to instantiate a logistic regression model that trains on encrypted data:
```python
from concrete.ml.sklearn import SGDClassifier

# Example values for the training hyper-parameters
RANDOM_STATE = 42
N_ITERATIONS = 15

parameters_range = (-1.0, 1.0)

sgd_clf_binary_fhe = SGDClassifier(
    random_state=RANDOM_STATE,
    max_iter=N_ITERATIONS,
    fit_encrypted=True,
    parameters_range=parameters_range,
)
```

To activate encrypted training, simply set `fit_encrypted=True` in the constructor. If this value is not set, training is performed on clear data using `scikit-learn` gradient descent.

Next, to perform the training on encrypted data, call the `fit` function with the `fhe="execute"` argument:

<!--pytest-codeblocks:skip-->
```python
sgd_clf_binary_fhe.fit(X_binary, y_binary, fhe="execute")
```

{% hint style="info" %}
Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured through _differential privacy_ instead of encryption. Concrete ML can import linear models, including logistic regression, that are trained with federated learning, using the [`from_sklearn` function](linear.md#pre-trained-models).
{% endhint %}

> **Review comment:** A bit confusing to me as we mix different technologies here. FL is a solution as is to privacy, so why are we mentioning DP?
>
> **Reply:** I think we should talk about it. People care about PPML, not specific solutions; if we can show how FHE and FL/DP are complementary then we should. I mention DP because that's what brings the privacy to FL. Do you not agree?
>
> **Reply:** FL is a privacy training tech as is. Not sure how DP is related here.
>
> **Reply:** ok, good point

## Training configuration

The `max_iter` parameter controls the number of batches that are processed by the training algorithm. Good values for this parameter are 8-64.
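To illustrate what "number of batches" means here, the sketch below is a plain mini-batch SGD loop for logistic regression that stops after `max_iter` batches. It is a hypothetical illustration, not Concrete ML internals; the `sgd_train` helper, the toy data, and the learning-rate and batch-size values are all made up for the example.

```python
import math

def sgd_train(X, y, max_iter, batch_size=2, lr=0.5):
    """Plain mini-batch SGD for logistic regression.

    Here ``max_iter`` is interpreted as the number of batches processed,
    matching the description above.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for batch_idx in range(max_iter):  # one iteration == one batch
        start = (batch_idx * batch_size) % n
        for i in range(start, min(start + batch_size, n)):
            z = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y[i]  # sigmoid(z) - label
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b

# Tiny 1-feature toy problem: negative points are class 0, positive class 1.
X = [[-1.0], [-0.5], [0.5], [1.0]]
y = [0, 0, 1, 1]
w, b = sgd_train(X, y, max_iter=16)
preds = [1 if (w[0] * x[0] + b) >= 0 else 0 for x in X]
print(preds)  # [0, 0, 1, 1]
```

Since each iteration touches one batch, a `max_iter` of 8-64 corresponds to 8-64 gradient updates, which is why training time scales with this parameter.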

The `parameters_range` parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range $$[-1, 1]$$.

> **Review comment:** Maybe we should add that this isn't just for initialization but also for quantization?
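Normalizing a feature into $$[-1, 1]$$ is a simple affine min/max rescaling, sketched below so that the data range matches `parameters_range=(-1.0, 1.0)`. The `scale_to_unit_range` helper is hypothetical, not part of the Concrete ML API.

```python
def scale_to_unit_range(column):
    """Affinely map a list of floats onto [-1, 1] using its min and max."""
    lo, hi = min(column), max(column)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in column]

feature = [3.0, 5.0, 9.0, 11.0]
scaled = scale_to_unit_range(feature)
print(scaled)  # [-1.0, -0.5, 0.5, 1.0]
```

In practice the min/max would be computed on the training set and reused to scale any data seen at inference time.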

## Capabilities and Limitations

The logistic model that can be trained uses Stochastic Gradient Descent (SGD) and quantizes the data, weights, gradients, and the error measure. It currently supports training 6-bit models, training both the coefficients and the bias.
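As a rough sketch of what a quantized training step looks like, the code below snaps both the gradient and the updated weight onto a symmetric 6-bit grid before use. This is a hypothetical plain-Python illustration of the idea, not Concrete ML internals; the `quantize_symmetric` helper and all numeric values are made up.

```python
def quantize_symmetric(value, n_bits, max_abs=1.0):
    """Round ``value`` onto a symmetric n-bit grid over [-max_abs, max_abs]."""
    levels = 2 ** (n_bits - 1) - 1  # 31 levels on each side for 6 bits
    step = max_abs / levels
    q_int = max(-levels, min(levels, round(value / step)))
    return q_int * step

# One SGD-style update in which both the gradient and the updated weight
# are snapped back onto the 6-bit grid.
weight = 0.40
gradient = 0.123456
lr = 0.1

q_grad = quantize_symmetric(gradient, n_bits=6)
new_weight = quantize_symmetric(weight - lr * q_grad, n_bits=6)
print(round(new_weight, 4))  # 0.3871
```

Restricting every quantity to a small grid like this is what keeps the encrypted computation within the supported bit-widths, at the cost of some rounding in each update.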

First, the `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model is proportional to the number of features and the number of training examples.
The `SGDClassifier` does not currently support client/server deployment for training.

> **Review comment:** The sentence "Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data." is grammatically correct; however, the use of "but" might be slightly misleading, as the two features (model conversion for inference and model training on encrypted data) are not in opposition but are complementary.
>
> **Reply:** I would suggest: "Concrete ML not only supports converting models for inference with FHE, it also enables training some models ..."