-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Knn classifier in cml #217
Conversation
8045f47
to
a50ad1d
Compare
6afa513
to
2f087c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great first draft well done! Could you explain the choices about what is done in FHE and what is done in clear on the client side?
It looks like we only partly compute the distances in FHE. Why isn't the sqrt done in FHE? Is it just prohibitively expensive to compute the topk and majority vote in FHE? Currently it seems that we return all distance to the client which is probably going to leak quite some information about the training.
src/concrete/ml/sklearn/base.py
Outdated
distance_matrix = ( | ||
numpy.sum(q_X**2, axis=1, keepdims=True) | ||
- 2 * q_X @ self._q_X_fit.T | ||
+ numpy.expand_dims(numpy.sum(self._q_X_fit**2, axis=1), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy.expand_dims(numpy.sum(self._q_X_fit**2, axis=1), 0)
can be done at training time I suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a constant no? it will be precomputed by CP
Seems like this PR adds a lot of time to the CI. |
45bc34f
to
0dc23cf
Compare
quantization not working properly add similarity point encrypted argsort and topk in clear
only pairwise.euclidean_distances is encrypted topk and majority vote are done on the client side
7178816
to
9d0a4dd
Compare
69f11f8
to
dac4e8c
Compare
# a training phase | ||
self._q_fit_X: numpy.ndarray | ||
# _y: Labels of `_q_fit_X` | ||
self._y: numpy.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haven't followed everything, so we keep this _y
attribute then ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can keep it but we need to make sure it doesn't exist in the model exported in the client (in client / server)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes ok so basically it's for the predict
but not for the post_processing right got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good just a few comments.
src/concrete/ml/sklearn/base.py
Outdated
x = scatter1d(x, max_x, range_i + d) | ||
|
||
# Max index selection | ||
sign = diff <= 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the sign but a boolean I guess. Why isn't this done right after computing diff?
Have you tried the CP comparison optimization by replacing
diff = a - b
sign = diff <= 0
to
sign = a <= b
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the beginning of this conversation, I thought we were skeptical about letting the compiler choose the best strategy.
But I missed Ruby's last messages, which are:
yes, it make the bitwidth compatible with the stategies you asked, and once bitwidth inference is done, it picks the best strategy based on an heuristic (I think it tries to minimize the number of TLU without increasing the maximum precision). So if a 8bit TLU already exists in the circuit it accepts to use that precision, otherwise it will try to stick to lower precisions.
It's not optimal in the sense of the cost model as it would requires solving the crypto-parameters
So, I think CP's comparison is worth using.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indeed a boolean that tells us if a is greater than b.
I'll change the naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can just check on a simple example if you have any time improvement. Otherwise you can leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you don't have the time right now worth an issue then I think because this might be a good thing to try anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do it in a separate PR.
Some testings are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great can you create an issue for this?
4b03fb9
to
3cf5e7d
Compare
3cf5e7d
to
fd2c1c7
Compare
Coverage passed ✅Coverage details
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks. Please just create an issue to check the CP comparison (see comment)
As of now:
closes #3818