[differential_privacy] Learning rates used for Adaptive Clipping experiments #59
Comments
Using a server learning rate of 1 should be just as good as anything else. I believe that's what we observed.
Thanks! Should I use that value for all clipping quantiles and noise multipliers then?
My impression is that yes, this should work well for all noise multipliers and clipping quantiles. I will check with the paper authors and get back to you on this one, but that's what I have observed in my own experiments.
Thanks, that would be great! :)
Hello. Peter is correct that a server learning rate of 1 is generally fine, and you shouldn't expect significant gains from optimizing it. However, in the paper we did experiment with different learning rates to account for the impact of clipping. I can provide the optimal values we used here.
For each task there is an optimal server learning rate (SLR) with fixed clipping and one with adaptive clipping to the median. The values below are the log base 10 of the SLR chosen on the development set. Note that for fixed clipping I am giving you the optimal SLR with the best fixed clip C* as shown in Figure 7, while for adaptive clipping I am giving you the optimal SLR with clipping to the median. (Different fixed clips would have different optimal SLRs.)
CIFAR-100 fixed: -0.25
Hope that helps.
Thank you so much @galenmandrew, that's very helpful. I'm not sure I understand how a learning rate of 0.0 for EMNIST-CR adaptive would work. Surely the model won't get updated?
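Because the values in the list are described as the log base 10 of the SLR, a listed value of 0.0 would correspond to an actual server learning rate of 10^0 = 1, not 0. A minimal sketch of the conversion follows; the helper name `slr_from_log10` is hypothetical and is not part of the repository.

```python
def slr_from_log10(log10_slr: float) -> float:
    """Convert a reported log10(SLR) value into the actual server learning rate."""
    return 10.0 ** log10_slr


# A reported value of 0.0 means SLR = 10**0 = 1, so the model is still updated.
print(slr_from_log10(0.0))              # 1.0
# The reported CIFAR-100 fixed-clipping value of -0.25.
print(round(slr_from_log10(-0.25), 3))  # 0.562
```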
Hi,
I am trying to reproduce the experiments in "Differentially Private Learning with Adaptive Clipping" (2021), the source code for which is provided under `federated/differential_privacy`. The paper does not report the final server learning rates used for DP-FedAvgM with clipping enabled. It simply states the following in Section 3.1:
"Therefore, for all approaches with clipping—fixed or adaptive—we search over a small grid of five server learning rates, scaling the values in Table 1 by {1, 10^(1/4), 10^(1/2), 10^(3/4), 10}. For all configurations, we report the best performing model whose server learning rate was chosen from this small grid on the validation set."
It is not computationally feasible for me to search for the optimal server learning rate in every possible configuration, so I was hoping you could specify the learning rates that were used for training the best performing models. Thank you.
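For reference, the five-point grid quoted from Section 3.1 can be sketched as below. The function name `server_lr_grid` and the base value `1.0` are placeholders for illustration only; the actual base rates are the ones in the paper's Table 1, which is not reproduced here.

```python
from typing import List


def server_lr_grid(base_lr: float) -> List[float]:
    """Scale a base server learning rate by {1, 10^(1/4), 10^(1/2), 10^(3/4), 10}."""
    exponents = [0.0, 0.25, 0.5, 0.75, 1.0]
    return [base_lr * 10.0 ** e for e in exponents]


# base_lr=1.0 is a placeholder, not a value taken from Table 1.
grid = server_lr_grid(1.0)
for lr in grid:
    print(f"{lr:.4f}")
```

Each configuration (task, clipping quantile, noise multiplier) would then be trained once per grid entry, with the best entry selected on the validation set.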