
docs: add explanation of encrypted training and federated learning #437

Merged: 10 commits into main from docs/update_cml140, Jan 18, 2024

Conversation

@andrei-stoian-zama (Collaborator) commented Jan 10, 2024

Doc updates for CML 1.4.0:

  • mention training on encrypted data
  • mention FL
  • add a page on encrypted training
  • improve the Logistic Reg encrypted training notebook

Closes https://github.com/zama-ai/concrete-ml-internal/issues/4049
Closes https://github.com/zama-ai/concrete-ml-internal/issues/4154
Closes https://github.com/zama-ai/concrete-ml-internal/issues/4153
Closes https://github.com/zama-ai/concrete-ml-internal/issues/4036

@andrei-stoian-zama andrei-stoian-zama requested a review from a team as a code owner January 10, 2024 12:20
@cla-bot cla-bot bot added the cla-signed label Jan 10, 2024
@RomanBredehoft (Collaborator) left a comment

thanks! I have very few comments

also, if you want the associated issue to be closed, you need to write `closes https://github.com/zama-ai/concrete-ml-internal/issues/4049` (complete link)

@jfrery (Collaborator) left a comment

Thanks a lot! Just a few comments

```
{% hint style="info" %}
Training on encrypted data provides the highest level of privacy but is slower than training on clear data. Federated learning is an alternative approach, where data privacy can be ensured through _differential privacy_ instead of encryption. Concrete ML
```
Collaborator:

> Federated learning is an alternative approach, where data privacy can be ensured through differential privacy instead of encryption.

A bit confusing to me, as we mix different technologies here. FL is a privacy solution as is, so why are we mentioning DP?

@andrei-stoian-zama (Author):

I think we should talk about it. People care about PPML, not specific solutions; if we can show how FHE and FL/DP are complementary, then we should.

I mention DP because that's what brings the privacy to FL. Do you not agree?

Collaborator:

> I mention DP because that's what brings the privacy to FL

FL is a privacy training tech as is. Not sure how DP is related here.

@andrei-stoian-zama (Author):

ok, good point

docs/built-in-models/training.md (outdated conversation, resolved)
Comment on lines 45 to 46
The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model
is proportional to the number of features and the number of training examples.
Collaborator:

Suggested change:

-The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the time to train the model is proportional to the number of features and the number of training examples.
+The `SGDClassifier` does not currently support training models with other values for the bit-widths. Second, the execution time of a single iteration is proportional to the number of features and the number of training samples in the batch.
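
For context, a minimal sketch of what encrypted fitting with this class can look like, assuming the `SGDClassifier` options discussed in this PR (`fit_encrypted`, `parameters_range`, and an `fhe` argument to `fit`) and a hypothetical toy dataset; keeping the number of features and the batch small keeps each iteration fast:

```python
import numpy as np
from sklearn.datasets import make_classification

from concrete.ml.sklearn import SGDClassifier

# Hypothetical toy problem: each training iteration's cost grows with the
# number of features and the number of samples in the batch, so both are small.
X, y = make_classification(n_samples=64, n_features=8, random_state=0)
X = X / np.abs(X).max()  # roughly normalize the data into [-1, 1]

model = SGDClassifier(
    random_state=0,
    max_iter=20,                   # number of batches processed during training
    fit_encrypted=True,            # enable training on encrypted data
    parameters_range=(-1.0, 1.0),  # expected range of the coefficients and bias
)

# fhe="simulate" runs the same quantized training without encryption (fast);
# fhe="execute" would perform the actual training on encrypted data.
model.fit(X, y, fhe="simulate")
```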

@bcm-at-zama (Collaborator) left a comment

Thanks for the different changes, in the doc and in the nb

What about also adding a short sentence / entry in https://github.com/zama-ai/concrete-ml#online-demos-and-tutorials for the FL example and for FHE training? To me, #online-demos-and-tutorials is an excellent entry point for our demos / showcases, so I would try to keep it up to date with our best capabilities.

Possibly, if you feel there are too many things in #online-demos-and-tutorials, you could reduce it a bit, but I would keep the best things there

docs/README.md (outdated)
@@ -86,11 +89,11 @@ This example shows the typical flow of a Concrete ML model:

To make a model work with FHE, the only constraint is to make it run within the supported precision limitations of Concrete ML (currently 16-bit integers). Thus, machine learning models must be quantized, which sometimes leads to a loss of accuracy versus the original model, which operates on plaintext.

-Additionally, Concrete ML currently only supports FHE _inference_. Training has to be done on unencrypted data, producing a model which is then converted to an FHE equivalent that can perform encrypted inference (i.e., prediction over encrypted data).
+Additionally, Concrete ML currently only supports training on encrypted data for some models, while it supports _inference_ for a large variety of models.
Member:

I think there's a double space before inference

docs/built-in-models/training.md (outdated conversation, resolved)
@@ -154,6 +154,8 @@ Various tutorials are given for [built-in models](docs/built-in-models/ml_exampl

- [Health diagnosis](use_case_examples/disease_prediction/): based on a patient's symptoms, history and other health factors, give a diagnosis using FHE to preserve the privacy of the patient.

+- [Private inference for federated learned models](use_case_examples/federated_learning/): private training of a Logistic Regression model and then import the model into Concrete ML and perform encrypted prediction
@andrei-stoian-zama (Author):

@bcm-at-zama added this as you suggested


## Training configuration

The `max_iter` parameter controls the number of batches that are processed by the training algorithm.
@RomanBredehoft (Collaborator) commented Jan 17, 2024

this is a parameter from sklearn, I think we can remove it, no? Unless there's a good reason to set it for CML, in which case we should explain it here

@andrei-stoian-zama (Author):

I'd leave it, as it is important to set it properly.

Collaborator:

I would maybe emphasize that then! Saying that it's important, and for what reasons.

@RomanBredehoft (Collaborator) left a comment

a few changes remaining, sorry!

RomanBredehoft previously approved these changes Jan 18, 2024
@RomanBredehoft (Collaborator) left a comment

many thanks! I would maybe emphasize the `max_iter` parameter, but other than that it's fine with me


{% hint style="info" %}
Concrete ML has a _simulation_ mode where the impact of approximate computation of TLUs on the model accuracy can be determined. The simulation is much faster, speeding up model development significantly. The behavior in simulation mode is representative of the behavior of the model on encrypted data.
{% endhint %}

-In Concrete ML, there are three different ways to define the error probability:
+In Concrete ML, there are three different ways to define the tolerance to off-by-one errors for each TLU operation:
Collaborator:

Should we comment on the CRT representation around here?

@andrei-stoian-zama (Author):

no... way too complicated to explain
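
For reference, a minimal sketch of how such a tolerance might be set and checked, assuming the built-in models' `compile`/`predict` interface accepts `p_error`/`global_p_error` keyword arguments and an `fhe` mode, with a hypothetical toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

from concrete.ml.sklearn import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Two of the three documented ways to set the off-by-one TLU error tolerance;
# the third is to set neither and keep the library default.
model.compile(X_train, p_error=0.01)
# model.compile(X_train, global_p_error=0.01)

# Simulation runs the quantized circuit without encryption: much faster, and
# useful to check the accuracy impact of the chosen tolerance.
y_sim = model.predict(X_test, fhe="simulate")
y_fhe = model.predict(X_test, fhe="execute")  # actual encrypted inference
```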


The `max_iter` parameter controls the number of batches that are processed by the training algorithm.

The `parameters_range` parameter determines the initialization of the coefficients and the bias of the logistic regression. It is recommended to give values that are close to the min/max of the training data. It is also possible to normalize the training data so that it lies in the range $$[-1, 1]$$.
Collaborator:

Maybe we should add that this isn't just for initialization but also for quantization?
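
As an illustration of the normalization recommended above (which matters for both initialization and quantization, per this comment), a small sketch; `scale_to_unit_range` is a hypothetical helper, not a Concrete ML API:

```python
import numpy as np

def scale_to_unit_range(X: np.ndarray) -> np.ndarray:
    """Scale each feature into [-1, 1], so that parameters_range=(-1.0, 1.0)
    is consistent with the data used for initialization and quantization."""
    max_abs = np.abs(X).max(axis=0)
    max_abs[max_abs == 0] = 1.0  # leave all-zero features unchanged
    return X / max_abs
```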

fd0r previously approved these changes Jan 18, 2024
@fd0r (Collaborator) left a comment

lgtm


⚠️ Known flaky tests have been re-run ⚠️

One or several tests initially failed but were detected as known flaky tests. They therefore have been re-run and passed. See below for more details.

Failed tests details

Known flaky tests that initially failed:

  • tests/quantization/test_compilation.py::test_quantized_module_compilation[False-False-2-ReLU6-1-FCSeqAddBiasVec]


Coverage failed ❌

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
---------------------- coverage: failed workers ----------------------
The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
gw1
Name                                                  Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------------------
src/concrete/ml/onnx/onnx_model_manipulations.py        101      2    98%   45, 48
src/concrete/ml/pytest/torch_models.py                  600      8    99%   1328-1331, 1342-1345
src/concrete/ml/pytest/utils.py                         154      9    94%   501-510, 551, 553
src/concrete/ml/quantization/quantized_module.py        223      3    99%   214, 235, 250
src/concrete/ml/search_parameters/p_error_search.py     122     96    21%   106-146, 209-228, 233-239, 245-246, 255-266, 270-271, 291-300, 332-363, 374-380, 425-514
-----------------------------------------------------------------------------------
TOTAL                                                  6963    118    98%

47 files skipped due to complete coverage.

@@ -4,10 +4,13 @@

![](.gitbook/assets/3.png)

-Concrete ML is an open source, privacy-preserving, machine learning inference framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)).
+Concrete ML is an open source, privacy-preserving, machine learning framework based on Fully Homomorphic Encryption (FHE). It enables data scientists without any prior knowledge of cryptography to automatically turn machine learning models into their FHE equivalent, using familiar APIs from scikit-learn and PyTorch (see how it looks for [linear models](built-in-models/linear.md), [tree-based models](built-in-models/tree.md), and [neural networks](built-in-models/neural-networks.md)). Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data.
Collaborator:

The sentence "Concrete ML supports converting models for inference with FHE but can also [train some models](built-in-models/training.md) on encrypted data." is grammatically correct; however, the use of "but" might be slightly misleading, as the two features (model conversion for inference and model training on encrypted data) are not in opposition but complementary.

Collaborator:

I would suggest:
"Concrete ML not only supports converting models for inference with FHE, it also enables training some models ..."

@andrei-stoian-zama andrei-stoian-zama merged commit 57dbdff into main Jan 18, 2024
10 checks passed
@andrei-stoian-zama andrei-stoian-zama deleted the docs/update_cml140 branch January 18, 2024 15:44