We propose COPA, the first unified framework for certifying robust policies for general offline RL against poisoning attacks, based on two certification criteria: per-state action stability and a lower bound on cumulative reward. Specifically, we propose new partition and aggregation protocols (PARL, TPARL, DPARL) to obtain robust policies, together with certification methods for each. More details can be found in our paper:
Fan Wu*, Linyi Li*, Chejian Xu, Huan Zhang, Bhavya Kailkhura, Krishnaram Kenthapadi, Ding Zhao, and Bo Li, "COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks", ICLR 2022 (*Equal contribution)
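To give a feel for the simplest protocol, here is a minimal sketch of PARL-style per-state aggregation and its stability bound: a majority vote over policies trained on disjoint data partitions, where each poisoned trajectory can corrupt at most one partition. The function names and the tie-breaking convention are ours for illustration (and the bound ignores tie-breaking subtleties); see the paper and the reference implementations for the exact protocol.

```python
from collections import Counter

def parl_action(policies, state):
    # Majority vote over the actions chosen by the per-partition policies.
    votes = Counter(policy(state) for policy in policies)
    top = max(votes.values())
    # Break ties toward the smaller action index (one common convention;
    # the paper's protocol fixes its own deterministic tie-breaking).
    return min(a for a, c in votes.items() if c == top)

def parl_stability_bound(policies, state):
    # Simplified per-state tolerance: flipping the winning action requires
    # changing more than (top1 - top2) / 2 votes, and each poisoned
    # trajectory affects at most one partition's policy.
    counts = sorted(Counter(p(state) for p in policies).values(), reverse=True)
    counts.append(0)  # handle the case where all policies agree
    return (counts[0] - counts[1]) // 2
```

For example, with seven partition policies where five vote for action 0 and two for action 1, the aggregated action is 0 and it remains unchanged under any single poisoned trajectory.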
All experimental results are available at the website https://copa-leaderboard.github.io/.
In our paper, we conduct experiments on the Atari games Freeway and Breakout, as well as the autonomous driving environment Highway. For each environment, we evaluate three offline RL algorithms (DQN, QR-DQN, and C51), three aggregation protocols with their certification methods (PARL, TPARL, and DPARL), up to three partition numbers, and multiple horizon lengths.
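The partition numbers above refer to how the offline training set is split before training one policy per partition. A common way to make such a split deterministic is to hash each serialized trajectory, so that a trajectory's partition does not depend on dataset order and each poisoned trajectory lands in exactly one partition. This is an illustrative sketch, not the official code:

```python
import hashlib

def partition_index(trajectory_bytes, num_partitions):
    # Deterministic assignment: hash the serialized trajectory and reduce
    # modulo the number of partitions. One poisoned trajectory can then
    # influence at most one of the independently trained policies.
    digest = hashlib.sha256(trajectory_bytes).digest()
    return int.from_bytes(digest, "big") % num_partitions
```

The same trajectory always maps to the same partition, and different partition numbers (e.g. 30 vs. 50) simply change the modulus.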
Reference implementation for experiments on Atari games can be found at https://github.com/AI-secure/COPA_Atari.
Reference implementation for experiments on Highway can be found at https://github.com/AI-secure/COPA_Highway.
@inproceedings{wu2022copa,
  title={COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks},
  author={Wu, Fan and Li, Linyi and Xu, Chejian and Zhang, Huan and Kailkhura, Bhavya and Kenthapadi, Krishnaram and Zhao, Ding and Li, Bo},
  booktitle={International Conference on Learning Representations},
  year={2022}
}