Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add FHE training deployment #665

Merged
merged 7 commits into from
Jun 3, 2024
Merged

feat: add FHE training deployment #665

merged 7 commits into from
Jun 3, 2024

Conversation

jfrery
Copy link
Collaborator

@jfrery jfrery commented May 1, 2024

closes https://github.com/zama-ai/concrete-ml-internal/issues/4373
closes https://github.com/zama-ai/concrete-ml-internal/issues/4454

Proposition:

from concrete.ml.deployment import FHEModelDev

# Save the training FHE circuit for production 
fhe_dev = FHEModelDev("fhe_training_sgd", sgd_clf_binary_fhe)
fhe_dev.save(mode="training|inference")

# No change on the client and server side

Check the notebook to see how to use it. Basically, it's the same as before for the user. We only add a single parameter in the fhe_dev.save

@cla-bot cla-bot bot added the cla-signed label May 1, 2024
@jfrery jfrery force-pushed the feat/training_deployment branch 3 times, most recently from dd6fc4c to 3fa5d75 Compare May 23, 2024 14:12
@jfrery jfrery marked this pull request as ready for review May 23, 2024 14:12
@jfrery jfrery requested a review from a team as a code owner May 23, 2024 14:12
@andrei-stoian-zama andrei-stoian-zama self-requested a review May 23, 2024 14:29
Copy link
Collaborator

@andrei-stoian-zama andrei-stoian-zama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notebook seems a bit loaded..

I think we could:

  • move the plot_decision boundary functions to some file in utils.
  • why do we need two plot_decision_boundary functions?
  • only show the decision boundary graph at iteration 0 and the last iteration

Next, we should showcase Deployment first:

  1. "Training on encrypted data in production": export the trainer with the FheModelDev/etc, load client/server, etc..
  2. "Develop a Logistic regression trainer for encrypted data" -> here we show the simulation part

Copy link
Collaborator

@RomanBredehoft RomanBredehoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks ! I have several comments and observations

@jfrery
Copy link
Collaborator Author

jfrery commented May 23, 2024

The notebook seems a bit loaded..

I agree.

I think we could:

* move the plot_decision boundary functions to some file in `utils`.

* why do we need two `plot_decision_boundary` functions?

One uses the weigths/bias and the other uses the concrete-ml .predict(). I will make a single one.

* only show the decision boundary graph at iteration 0 and the last iteration

I will do this if I can't find anything better. I agree 10 plots is too much. But I want to show that the model is learning.

Next, we should showcase Deployment first:

1. "Training on encrypted data in production": export the trainer with the FheModelDev/etc, load client/server, etc..

2. "Develop a Logistic regression trainer for encrypted data" -> here we show the simulation part

Feels weird to me to show first the production and then how you develop the model which will eventually go to production.

I agree with most of your remark. Maybe we should create a new notebook for production only and refer to the development one.

@jfrery jfrery force-pushed the feat/training_deployment branch from 3fa7006 to 9bd428e Compare May 24, 2024 07:28
@jfrery jfrery requested a review from RomanBredehoft May 24, 2024 07:33
@jfrery jfrery force-pushed the feat/training_deployment branch 3 times, most recently from 8c290c2 to c6d2d5c Compare May 24, 2024 08:47
RomanBredehoft
RomanBredehoft previously approved these changes May 24, 2024
Copy link
Collaborator

@RomanBredehoft RomanBredehoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for this !

RomanBredehoft
RomanBredehoft previously approved these changes May 24, 2024
@@ -687,13 +680,7 @@ def fit( # type: ignore[override]

# If the model should be trained using FHE training
if self.fit_encrypted:
if fhe is None:
fhe = "disable"
warnings.warn(
Copy link
Collaborator

@RomanBredehoft RomanBredehoft May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this warning was made on purpose, I think we should keep it

the idea if that users should set fhe to something if they activate fit_encrypted to be sure they know what they are doing

the reason why fhe is not defaulted to disable is to avoid the ambiguous situation of having fit_encrypted to false and fhe to disable, which wouldn't make much sense since training would be done in floating points with sklearn

cc @fd0r

Copy link
Collaborator Author

@jfrery jfrery May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I miss something but this didn't really make sense to me. If we have a warning here then why not for all our models in the predict?

avoid the ambiguous situation of having fit_encrypted to false and fhe to disable, which wouldn't make much sense since training would be done in floating points with sklearn

If fit_encrypted = False then we have this:

if fhe is not None:
raise ValueError(
"Parameter 'fhe' should not be set when FHE training is disabled. Either set it to "
"None for floating point training in the clear or set 'fit_encrypted' to True when "
f"initializing the model. Got {fhe}."
)

Or what do I miss?

Copy link
Collaborator

@RomanBredehoft RomanBredehoft May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes you right about the second case, good thing then !

but the reason we don't have this in the predict methods is simply because there is not ambiguity there, we only have one parameter related to fhe execution : fhe (and it can't be None). Whereas with encrypted training, we have this additional fit_encrypted in the init.

so more or less I believe the idea here was to better determine the fhe parameter's role, by having something like :

  • None : float clear
  • disable | simulate | fhe : the usual

so when a user sets fit_encrypted, it's better to make sure he's aware of what mode he's using

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no notion of fhe= if it's a training in the clear and we have a assert on this. To me, fhe="disable" should be the default if users use fit_encrypted = True.

I don't think that the warning "Parameter 'fhe' isn't set while FHE training is enabled.\n Defaulting to '{fhe=}'" is of any help to the user.

If the API is really confusing, having warning to try to explain isn't the right direction. Instead we should rethink the API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well iirc there was some discussion on this in particular when the API was made and this is how we decided to go for, maybe @fd0r can tell us more about it !

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes @RomanBredehoft got it right.

We don't want disable by default since it wouldn't make sense if fit_encrypted=False.
But not providing a value for fhe if fit_encrypted=True is also "dangerous". The idea here was to make sure that the user has to provide some parameter for fhe in this case.

Now that I'm re-reading that I would have put an Exception instead of a warning here.

I don't think we should remove the warning.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want disable by default since it wouldn't make sense if fit_encrypted=False.

But if fit_encrypted=False. and fhe != None, we raise an error so this is not possible.

But not providing a value for fhe if fit_encrypted=True is also "dangerous"

I really don't understand why this is "dangerous". For the inference, disable is the default. Why would it be dangerous to do it for training?

Also, what should the user learn from "Parameter 'fhe' isn't set while FHE training is enabled.\n Defaulting to '{fhe=}'"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand why this is "dangerous". For the inference, disable is the default. Why would it be dangerous to do it for training?

Because we would be changing the default from the signature mostly

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what should the user learn from "Parameter 'fhe' isn't set while FHE training is enabled.\n Defaulting to '{fhe=}'"?

Mainly that the default value is changing is shouldn't really happen. And that the prefered way is to specify this argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright 🏳️ well I will add back the warnings and tests for it.

@@ -763,13 +750,7 @@ def partial_fit(
# A partial fit is similar to a fit with a single iteration. The slight differences between
# both are handled in the encrypted method when setting `is_partial_fit` to True.
if self.fit_encrypted:
if fhe is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Collaborator

@RomanBredehoft RomanBredehoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the feature !! lots of cleaning, which is great. I have several questions and observations

@jfrery jfrery requested a review from RomanBredehoft May 29, 2024 13:59
@jfrery jfrery force-pushed the feat/training_deployment branch from a49f686 to d39c6aa Compare May 29, 2024 14:10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the deployment be done without any call to the fit method?

We should be able to deploy with only the batch size, the number of features and the batch size as inputs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we should probably take composition into account such that n-iter is included in the query.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the deployment be done without any call to the fit method?

We should be able to deploy with only the batch size, the number of features and the batch size as inputs.

I just use the existing way of creating a training FHE circuit. Do we have something else than the _fit_encrypted method to create such circuit? Or are you proposing to wrap this into another method within the FHEModel API?

That being said, I definitely agree that having to instantiate fhe circuit for deployment with a call to fit is super weird...

Also we should probably take composition into account such that n-iter is included in the query.

I am not sure where the n-iter would belong but I believe it's a parameter that needs to be sent to the server so I would say it would need to be sent at the same time as the serialized encrypted values. I don't think it has anything to do with the FHEModel API unless we start implementing the communication protocol.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for these changes, let's address the rest in a follow-up PR.

Copy link
Collaborator

@fd0r fd0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for reviewing this so late.

I have a few comments on the API design.

IMHO we should consider composition for this, or merge as is and improve upon that, but it would be easier if it was already taken into account.

Also we shouldn't have to call the fit function before deploying imo. It's true that we would have to take the inputs ranges into account, the batch-size and number of features and targets.

@jfrery jfrery requested a review from fd0r May 31, 2024 11:57
Copy link
Collaborator

@fd0r fd0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's iterate on that! Thanks a lot for taking care of this!

f"Defaulting to '{fhe=}'",
stacklevel=2,
)
fhe = "disable" if fhe is None else fhe
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't changed back

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good

Copy link
Collaborator

@fd0r fd0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning is still removed.

Copy link

github-actions bot commented Jun 3, 2024

Coverage passed ✅

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL    7622      0   100%

59 files skipped due to complete coverage.

Copy link
Collaborator

@RomanBredehoft RomanBredehoft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, thanks a lot ! one quick question: how is it going to work once we integrate composition with training (#660) ?

@jfrery jfrery merged commit b718629 into main Jun 3, 2024
12 checks passed
@jfrery jfrery deleted the feat/training_deployment branch June 3, 2024 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants