2024-04-18-cundy24a.md

title: Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
software:
abstract: As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
layout: inproceedings
series: Proceedings of Machine Learning Research
publisher: PMLR
issn: 2640-3498
id: cundy24a
month: 0
tex_title: Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients
firstpage: 2809
lastpage: 2817
page: 2809-2817
order: 2809
cycles: false
bibtex_author: Cundy, Chris J. and Desai, Rishi and Ermon, Stefano
author:
- given: Chris J.
  family: Cundy
- given: Rishi
  family: Desai
- given: Stefano
  family: Ermon
date: 2024-04-18
address:
container-title: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics
volume: 238
genre: inproceedings
issued:
  date-parts:
  - 2024
  - 4
  - 18
pdf:
extras: