[differential_privacy] Learning rates used for Adaptive Clipping experiments #59

Open
VasundharaAgarwal opened this issue Mar 31, 2022 · 6 comments

Comments

@VasundharaAgarwal

VasundharaAgarwal commented Mar 31, 2022

Hi,

I am trying to reproduce the experiments in "Differentially Private Learning with Adaptive Clipping" (2021), the source code for which is provided under federated/differential_privacy. The paper does not report the final server learning rates used for DP-FedAvgM with clipping enabled. Section 3.1 states only the following:

> Therefore, for all approaches with clipping—fixed or adaptive—we search over a small grid of five server learning rates, scaling the values in Table 1 by {1, 10^{1/4}, 10^{1/2}, 10^{3/4}, 10}. For all configurations, we report the best performing model whose server learning rate was chosen from this small grid on the validation set.
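
For reference, the five-point grid described in that passage can be sketched as below. This is only an illustration of the scaling rule as quoted, not the paper's actual search code: the function name `server_lr_grid` and the base value `BASE_LR` are hypothetical, and the real base values are the per-task entries in Table 1 of the paper.

```python
# Sketch of the five-point server learning rate grid quoted above.
# Assumption: each grid point is base_lr * 10**e for e in {0, 1/4, 1/2, 3/4, 1}.

def server_lr_grid(base_lr):
    """Return the five candidate server learning rates for one base value."""
    exponents = [0.0, 0.25, 0.5, 0.75, 1.0]
    return [base_lr * 10.0 ** e for e in exponents]


# Hypothetical placeholder, NOT a value taken from Table 1 of the paper.
BASE_LR = 0.1

for lr in server_lr_grid(BASE_LR):
    print(f"{lr:.4f}")
```

Each step multiplies the previous candidate by 10^{1/4} (about 1.78), so the grid spans exactly one order of magnitude above the base value.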

It is not computationally feasible for me to search for the optimal server learning rate in every possible configuration, so I was hoping you could specify the learning rates that were used to train the best performing models. Thank you.

@kairouzp
Contributor

Using a server learning rate = 1 should be just as good as anything else. I believe that's what we observed.

@VasundharaAgarwal
Author

> Using a server learning rate = 1 should be just as good as anything else. I believe that's what we observed.

Thanks! Should I use that value for all clipping quantiles and noise multipliers then?

@kairouzp
Contributor

My impression is that yes, this should work well for all noise multipliers and clipping quantiles. I will check with the paper authors and get back to you on this one, but that's what I have observed in my own experiments.

@VasundharaAgarwal
Author

> My impression is that yes, this should work well for all noise multipliers and clipping quantiles. I will check with the paper authors and get back to you on this one, but that's what I have observed in my own experiments.

Thanks, that would be great! :)

@galenmandrew
Contributor

Hello. Peter is correct that a server learning rate of 1 is generally fine, and you shouldn't expect significant gains from optimizing it. However, in the paper we did experiment with different learning rates to account for the impact of clipping. I can provide the optimal values we used here.

For each task, I list the optimal server learning rate (SLR) with fixed clipping and with adaptive clipping to the median. The values below are the log base 10 of the SLR chosen on the development set (so the SLR itself is 10 raised to the listed value). Note that for fixed I am giving you the optimal SLR with the best fixed clip C* as shown in Figure 7, while for adaptive I am giving you the optimal SLR with clipping to the median. (Different fixed clips would have different optimal SLRs.)

CIFAR-100 fixed: -0.25
CIFAR-100 adaptive: -0.5
EMNIST-CR fixed: 0.25
EMNIST-CR adaptive: 0.0
EMNIST-AE fixed: 0.5
EMNIST-AE adaptive: 0.5
SHAKESPEARE fixed: -0.25
SHAKESPEARE adaptive: -0.5
SO-NWP fixed: 1.0
SO-NWP adaptive: 0.5
SO-LR fixed: 0.25
SO-LR adaptive: 0.25
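
Since the values above are base-10 logarithms, the learning rate to actually pass to the server optimizer is 10 raised to the listed value. A minimal sketch of that conversion follows; the dictionary `LOG10_SLR` and the helper `server_learning_rate` are names made up for this illustration, and only the numbers come from the list above.

```python
# log10 of the server learning rate (SLR), copied from the list above.
# Keys are (task, clipping mode); the names of the dict and helper are illustrative.
LOG10_SLR = {
    ("CIFAR-100", "fixed"): -0.25,
    ("CIFAR-100", "adaptive"): -0.5,
    ("EMNIST-CR", "fixed"): 0.25,
    ("EMNIST-CR", "adaptive"): 0.0,
    ("EMNIST-AE", "fixed"): 0.5,
    ("EMNIST-AE", "adaptive"): 0.5,
    ("SHAKESPEARE", "fixed"): -0.25,
    ("SHAKESPEARE", "adaptive"): -0.5,
    ("SO-NWP", "fixed"): 1.0,
    ("SO-NWP", "adaptive"): 0.5,
    ("SO-LR", "fixed"): 0.25,
    ("SO-LR", "adaptive"): 0.25,
}


def server_learning_rate(task, clipping):
    """Convert the listed log10 value into the actual server learning rate."""
    return 10.0 ** LOG10_SLR[(task, clipping)]


for (task, clipping), log_slr in LOG10_SLR.items():
    slr = server_learning_rate(task, clipping)
    print(f"{task:12s} {clipping:9s} log10(SLR)={log_slr:+.2f}  SLR={slr:.4f}")
```

In particular, a listed value of 0.0 corresponds to an SLR of 10^0 = 1, not a learning rate of zero.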

Hope that helps.

@VasundharaAgarwal
Author

VasundharaAgarwal commented Apr 13, 2022

Thank you so much @galenmandrew, that's very helpful.

I'm not sure I understand how a learning rate of 0.0 for EMNIST-CR adaptive would work. Surely the model won't get updated?
