Skip to content

Latest commit

 

History

History
136 lines (83 loc) · 10.5 KB

README.md

File metadata and controls

136 lines (83 loc) · 10.5 KB

Formal Robust Explanations for Deep Reinforcement Learning Models

This repository contains an implementation of the "deletion" algorithm [1] designed to compute abductive explanations for deep reinforcement learning models, and some experiments to evaluate its feasability and validity. Intuitively, abductive reasoning is a form of deduction in which explanations for an observation are given while taking into account a theory that formalizes a field of knowledge [2]. A real example of something that can be formalized using abductive reasoning is medical diagnosis. When a doctor diagnoses a patient, he/she is giving an (abductive) explanation for a set of signs and symptoms (observations). To do that, the doctor uses knowledge and expertice from the field of medicine (theory).

There is a mathematical formalization of this type of reasoning known as logic-based abduction [2]. Abductive explanations can also be computed for neural network's predictions, provided you can find a suitable representation in a formal logic language (as a MILP or an SMT, for instance). In this context, abductive explanations are subset of input features that are sufficient to guaratee a model's prediction [1]. Shorter or minimal explanations are prefered because they are easier to understand by humans.

There are two main categories of abductive explanations: cost-minimal or subset-minimal. Subset-minimal explanations are explanations that do not contain as a proper subset other explanations. To our knowledge, so far neither subset-minimal nor cost-minimal explanations have been computed for deep reinforcement learning systems. In the present work, we implemented a simple "deletion" algorithm that had been previously applyed to compute abductive explanations for deep neural networks in classification tasks [1].

We tested our implementations on the reinforcement learning benchmark of the VNN-COMP of 2022 [3], which is publicly available at https://github.com/ChristopherBrix/vnncomp2022_benchmarks. This benchmark contains 3 deep neural network models, each trained on one of the following tasks: CartPole-v1 [4], Lunar Lander-v2 [4] and Dubins Rejoin [5].

With our experiments, we showed evidence that the explanations generated by the deletion algorithm may be biased. The algorithm favors the inclusion, to the explanations, of features deleted later on in the execution, a so called “order effect”. Observe, in the following figures, the left to right pattern in the frequency (of including a feature), as part of an explanation. Epsilon is a parameter that determines the size of the perturbation used around an input instance. When the perturbation gets too large, the explanation becomes trivial, that is, all the features explain the prediction. This was expected beforehand and we intended to choose the shortest explanations within this search space.

CartPole deletion Lunar Lander deletion Dubins Rejoin deletion

Our results suggest that the aforementioned bias may be present in other implementations of the deletion algorithm for machine learning tasks in general [1],[6]. It is likely that research groups are not aware of this, an so, these results are a relevant contribution. To test this hypothesis further, we proposed a solution on how to fix this problem and designed an elementary algorithm ("adversarial-robustness algorithm") to compute robust, formal and non-biased explanations to deep reinforcement learning model predictions.

This new algorithm does not show the same order effects as the deletion algorithm, adding weight to our interpretation of the deletion algorithm shortcommings. With this new procedure, the perturbation strengh (epsilon) can be used as a proxy for the robustness of explanations and, additionally, it aids in the search for the smallest explanations. In the future, we plan on assesing the validity of this new algorithm more systematically and we will to look for potential improvements.

Set Up

We ran our experiments on an AWS ml.t3.medium instance using python 3 and the bash shell languages. AWS instances support Linux-based operating systems. We recommend running our scripts in an AWS EC2 instance or in a machine with a similar linux version.

To verify the neural network properties, we used alpha-beta-CROWN [7]–[12], a fast and scalable set of tools and winner of the VNN-COMP in 2021 [3] and 2022 [13]. Before running our scripts, choose a working directory and clone the alpha-beta-CROWN there:

git clone https://github.com/Verified-Intelligence/alpha-beta-CROWN 

Change directory to the alpha-beta-CROWN verifier folder and install the tool as as a conda environment. The requirements are found in the environment.yml file.

cd ~/alpha-beta-CROWN/complete_verifier
conda env create -f environment.yml --name alpha-beta-crown

Instructions

The scripts uploaded have comments so that their operation can be understood by a user. To run the scripts, first you have to activate the alpha-beta-CROWN conda environment.

conda activate alpha-beta-crown

The VNN-COMP benchmark contains instances of safety properties in the VNN-LIB format (CartPole-v1: 100 instances; Lunar Lander-v2: 100 instances; Dubins Rejoin: 96 instances). Each property coded an instance observation of the environment, a model, its output, an 𝑙-1 type of perturbation on the input features and a set of safety constraints over the output variables. Clone the VNN-COMP 2022 benchmarks with the command:

git clone https://github.com/ChristopherBrix/vnncomp2022_benchmarks

These instances had to be first modified so they were suitable to apply the algorithms. This transformation basically entailed altering the magnitude of the perturbation to the inputs and changing the output constrains when needed. The original inputs, the models and the models predictions remained the same.

There are three .sh scripts to create the instances (create_benchmark_task.sh), one for each task. To run them, modify the files to include the appropriate working directory in your file system (under the root_directory field, which must contain all the other folders). Here is an example for CartPole:

#from the working directory, execute in your terminal. 

#give file permissions to the user
chmod u+x abduction_algorithm/create_instances_cartpole.sh

# create the instances 
./abduction_algorithm/create_instances_cartpole.sh

After you have created the instances, you can run the algorithms for batches of them. There are three .sh scripts to run the deletion algorithm (batch_task_deletion.sh), one for each task. In addition, there are three .sh scripts to run the "adversiarial-robustness" algorithm (batch_task_adversarial_robustness.sh), one for each task. Once aganin, modify the files to include your working directory (under the root_forder field). Here we show you two examples, one for Lunar Lander and the other for Dubins Rejoin:

#from the working directory, execute in your terminal. 

#give file permissions to the user
chmod u+x abduction_algorithm/batch_lunarlander_adversarial_robustness.sh

#Execute the adversarial-robustness algorithm
./abduction_algorithm/batch_lunarlander_adversarial_robustness.sh

#give file permissions to the user
chmod u+x abduction_algorithm/batch_dubinsrejoin_deletion_algorithm.sh

#Execute the deletion algorithm 
./abduction_algorithm/batch_dubinsrejoin_deletion_algorithm.sh

Results will be shown in the results folder.

Developers

Student: Michel Patiño-Sáenz

Advisor: Nicolás Cardozo

References

[1] A. Ignatiev, N. Narodytska, and J. Marques-Silva, “Abduction-Based Explanations for Machine Learning Models,” Nov. 2018, doi: 10.48550/arXiv.1811.10656.

[2] T. Eiter and G. Gottlob, “The complexity of logic-based abduction,” J. ACM, vol. 42, no. 1, pp. 3–42, Jan. 1995, doi: 10.1145/200836.200838.

[3] M. N. Müller, C. Brix, S. Bak, C. Liu, and T. T. Johnson, “The Third International Verification of Neural Networks Competition (VNN-COMP 2022): Summary and Results.” arXiv, Feb. 16, 2023. doi: 10.48550/arXiv.2212.10376.

[4] OpenAI, “Gym: A toolkit for developing and comparing reinforcement learning algorithms.” https://gym.openai.com (accessed Jul. 05, 2021).

[5] U. J. Ravaioli, J. Cunningham, J. McCarroll, V. Gangal, K. Dunlap, and K. L. Hobbs, “Safe Reinforcement Learning Benchmark Environments for Aerospace Control Systems,” in 2022 IEEE Aerospace Conference (AERO), Mar. 2022, pp. 1–20. doi: 10.1109/AERO53065.2022.9843750.

[6] J. Marques-Silva and A. Ignatiev, “Delivering Trustworthy AI through Formal XAI,” Proc. AAAI Conf. Artif. Intell., vol. 36, no. 11, Art. no. 11, Jun. 2022, doi: 10.1609/aaai.v36i11.21499.

[7] “α,β-CROWN (alpha-beta-CROWN): A Fast and Scalable Neural Network Verifier using the Bound Propagation Framework.” Verified Intelligence, Jun. 22, 2023. Accessed: Jun. 26, 2023. [Online]. Available: https://github.com/Verified-Intelligence/alpha-beta-CROWN

[8] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, “Efficient Neural Network Robustness Certification with General Activation Functions.” arXiv, Nov. 02, 2018. doi: 10.48550/arXiv.1811.00866.

[9] H. Salman, G. Yang, H. Zhang, C.-J. Hsieh, and P. Zhang, “A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks.” arXiv, Jan. 09, 2020. doi: 10.48550/arXiv.1902.08722.

[10] K. Xu et al., “Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond.” arXiv, Oct. 25, 2020. doi: 10.48550/arXiv.2002.12920.

[11] K. Xu et al., “Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers.” arXiv, Mar. 16, 2021. doi: 10.48550/arXiv.2011.13824.

[12] H. Zhang et al., “General Cutting Planes for Bound-Propagation-Based Neural Network Verification.” arXiv, Dec. 04, 2022. doi: 10.48550/arXiv.2208.05740.

[13] S. Bak, C. Liu, and T. Johnson, “The Second International Verification of Neural Networks Competition (VNN-COMP 2021): Summary and Results.” arXiv, Aug. 30, 2021. doi: 10.48550/arXiv.2109.00498.