
docs: add explanation of encrypted training and federated learning #437

Merged: 10 commits into main from docs/update_cml140, Jan 18, 2024

Conversation

@andrei-stoian-zama (Collaborator) commented Jan 10, 2024

Doc updates for CML 1.4.0:

  • mention training on encrypted data
  • mention FL
  • add a page on encrypted training
  • improve the Logistic Reg encrypted training notebook

Closes https://github.com/zama-ai/concrete-ml-internal/issues/4049
Closes https://github.com/zama-ai/concrete-ml-internal/issues/4154
Closes https://github.com/zama-ai/concrete-ml-internal/issues/4153
Closes https://github.com/zama-ai/concrete-ml-internal/issues/4036

@andrei-stoian-zama andrei-stoian-zama requested a review from a team as a code owner January 10, 2024 12:20
@cla-bot cla-bot bot added the cla-signed label Jan 10, 2024
@RomanBredehoft (Collaborator) left a comment

thanks! I have very few comments

also, if you want the associated issue to be closed, you need to write `closes https://github.com/zama-ai/concrete-ml-internal/issues/4049` (complete link)

@jfrery (Collaborator) left a comment

Thanks a lot! Just a few comments

```
{% hint style="info" %}
Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured through _differential privacy_ instead of encryption. Concrete ML
```
Collaborator:

> Federated learning is an alternative approach, where data privacy can be ensured through differential privacy instead of encryption.

A bit confusing to me, as we mix different technologies here. FL is a privacy solution as is, so why are we mentioning DP?

@andrei-stoian-zama (Author):

I think we should talk about it. People care about PPML, not specific solutions; if we can show how FHE and FL/DP are complementary, then we should.

I mention DP because that's what brings the privacy to FL. Do you not agree?

Collaborator:

> I mention DP because that's what brings the privacy to FL

FL is a privacy training tech as is. Not sure how DP is related here.

@andrei-stoian-zama (Author):

ok, good point

docs/built-in-models/training.md (outdated conversation, resolved)
Comment on lines 45 to 46
The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model
is proportional to the number of features and the number of training examples.
Collaborator:

Suggested change:

-The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model is proportional to the number of features and the number of training examples.
+The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the execution time of a single iteration is proportional to the number of features and the number of training samples in the batch.
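
For context, a minimal sketch of what encrypted fitting with this class can look like, assuming the `SGDClassifier` options discussed in this PR (`fit_encrypted`, `parameters_range`, and an `fhe` argument to `fit`) and a hypothetical toy dataset; keeping the number of features and the batch small keeps each iteration fast:

```python
import numpy as np
from sklearn.datasets import make_classification

from concrete.ml.sklearn import SGDClassifier

# Hypothetical toy problem: each training iteration's cost grows with the
# number of features and the number of samples in the batch, so both are small.
X, y = make_classification(n_samples=64, n_features=8, random_state=0)
X = X / np.abs(X).max()  # roughly normalize the data into [-1, 1]

model = SGDClassifier(
    random_state=0,
    max_iter=20,                   # number of batches processed during training
    fit_encrypted=True,            # enable training on encrypted data
    parameters_range=(-1.0, 1.0),  # expected range of the coefficients and bias
)

# fhe="simulate" runs the same quantized training without encryption (fast);
# fhe="execute" would perform the actual training on encrypted data.
model.fit(X, y, fhe="simulate")
```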

@bcm-at-zama (Collaborator) left a comment

Thanks for the different changes, in the doc and in the nb

What about also adding a short sentence / entry in https://github.com/zama-ai/concrete-ml#online-demos-and-tutorials for the FL example and for FHE training? To me, #online-demos-and-tutorials is an excellent entry point for our demos / showcases, so I would try to keep it up to date with our best capabilities.

Possibly, if you feel there are too many things in #online-demos-and-tutorials, you could reduce it a bit, but I would keep the best things there

docs/README.md (outdated)
@@ -86,11 +89,11 @@ This example shows the typical flow of a Concrete ML model:

To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete ML (currently 16-bit integers). Thus, machine learning models must be quantized, which sometimes leads to a loss of accuracy versus the original model, which operates on plaintext.

-Additionally, Concrete ML currently only supports FHE _inference_. Training has to be done on unencrypted data, producing a model which is then converted to an FHE equivalent that can perform encrypted inference (i.e., prediction over encrypted data).
+Additionally, Concrete ML currently only supports training on encrypted data for some models, while it supports _inference_ for a large variety of models.
Member:

I think there's a double space before inference

docs/built-in-models/training.md (outdated conversation, resolved)
@@ -154,6 +154,8 @@ Various tutorials are given for [built-in models](docs/built-in-models/ml_exampl

- [Health diagnosis](use_case_examples/disease_prediction/): based on a patient's symptoms, history and other health factors, give a diagnosis using FHE to preserve the privacy of the patient.

+- [Private inference for federated learned models](use_case_examples/federated_learning/): private training of a Logistic Regression model and then import the model into Concrete ML and perform encrypted prediction
@andrei-stoian-zama (Author):

@bcm-at-zama added this as you suggested


## Training configuration

The `max_iter` parameter controls the number of batches that are processed by the training algorithm.
@RomanBredehoft (Collaborator) commented Jan 17, 2024

this is a parameter from sklearn, I think we can remove it, no? Unless there's a good reason to set it for CML, in which case we should explain it here

@andrei-stoian-zama (Author):

I'd leave it, as it is important to set it properly.

Collaborator:

I would maybe emphasize that then! Saying that it's important, and for what reasons.

@RomanBredehoft (Collaborator) left a comment

a few changes remaining, sorry!

RomanBredehoft previously approved these changes Jan 18, 2024
@RomanBredehoft (Collaborator) left a comment

many thanks! I would maybe emphasize the `max_iter` parameter, but other than that it's fine with me


{% hint style="info" %}
Concrete ML has a _simulation_ mode where the impact of approximate computation of TLUs on the model accuracy can be determined. The simulation is much faster, speeding up model development significantly. The behavior in simulation mode is representative of the behavior of the model on encrypted data.
{% endhint %}

-In Concrete ML, there are three different ways to define the error probability:
+In Concrete ML, there are three different ways to define the tolerance to off-by-one errors for each TLU operation:
Collaborator:

Should we comment on the CRT representation around here?

@andrei-stoian-zama (Author):

no... way too complicated to explain
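
For reference, a minimal sketch of how such a tolerance might be set and checked, assuming the built-in models' `compile`/`predict` interface accepts `p_error`/`global_p_error` keyword arguments and an `fhe` mode, with a hypothetical toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Two of the three documented ways to set the off-by-one TLU error tolerance;
# the third is to set neither and keep the library default.
model.compile(X_train, p_error=0.01)
# model.compile(X_train, global_p_error=0.01)

# Simulation runs the quantized circuit without encryption: much faster, and
# useful to check the accuracy impact of the chosen tolerance.
y_sim = model.predict(X_test, fhe="simulate")
y_fhe = model.predict(X_test, fhe="execute")  # actual encrypted inference
```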


The `max_iter` parameter controls the number of batches that are processed by the training algorithm.

The `parameters_range` parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range $$[-1, 1]$$.
Collaborator:

Maybe we should add that this isn't just for initialization but also for quantization?
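
As an illustration of the normalization recommended above (which matters for both initialization and quantization, per this comment), a small sketch; `scale_to_unit_range` is a hypothetical helper, not a Concrete ML API:

```python
import numpy as np

def scale_to_unit_range(X: np.ndarray) -> np.ndarray:
    """Scale each feature into [-1, 1], so that parameters_range=(-1.0, 1.0)
    is consistent with the data used for initialization and quantization."""
    max_abs = np.abs(X).max(axis=0)
    max_abs[max_abs == 0] = 1.0  # leave all-zero features unchanged
    return X / max_abs
```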

fd0r previously approved these changes Jan 18, 2024
@fd0r (Collaborator) left a comment

lgtm


⚠️ Known flaky tests have been re-run ⚠️

One or several tests initially failed but were detected as known flaky tests. They therefore have been re-run and passed. See below for more details.

Failed tests details

Known flaky tests that initially failed:

  • tests/quantization/test_compilation.py::test_quantized_module_compilation[False-False-2-ReLU6-1-FCSeqAddBiasVec]


Coverage failed ❌

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
---------------------- coverage: failed workers ----------------------
The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
gw1
Name                                                  Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------------------
src/concrete/ml/onnx/onnx_model_manipulations.py        101      2    98%   45, 48
src/concrete/ml/pytest/torch_models.py                  600      8    99%   1328-1331, 1342-1345
src/concrete/ml/pytest/utils.py                         154      9    94%   501-510, 551, 553
src/concrete/ml/quantization/quantized_module.py        223      3    99%   214, 235, 250
src/concrete/ml/search_parameters/p_error_search.py     122     96    21%   106-146, 209-228, 233-239, 245-246, 255-266, 270-271, 291-300, 332-363, 374-380, 425-514
-----------------------------------------------------------------------------------
TOTAL                                                  6963    118    98%

47 files skipped due to complete coverage.

@@ -4,10 +4,13 @@

![](.gitbook/assets/3.png)

-Concrete ML is an open source, privacy-preserving, machine learning inference framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)).
+Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)). Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data.
Collaborator:

The sentence "Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data." is grammatically correct; however, the use of "but" might be slightly misleading, as the two features (model conversion for inference and model training on encrypted data) are not in opposition but complementary.

Collaborator:

I would suggest:
"Concrete ML not only supports converting models for inference with FHE, it also enables training some models ..."

@andrei-stoian-zama andrei-stoian-zama merged commit 57dbdff into main Jan 18, 2024
10 checks passed
@andrei-stoian-zama andrei-stoian-zama deleted the docs/update_cml140 branch January 18, 2024 15:44