A list of selected papers and, where available, their corresponding code, accompanying our review paper "A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges".
If you find a missing paper or a possible mistake in our survey, please feel free to email me ([email protected]) or open a pull request here. I would be glad to receive your suggestions. Thanks!
If you find this survey useful for your research, please consider citing:
title={A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges},
author={Qing, Yunpeng and Liu, Shunyu and Song, Jie and Wang, Huiqiong and Song, Mingli},
journal={arXiv preprint arXiv:2211.06665},
- 2025.2.10: We have updated our review paper with the latest revisions and incorporated newly published research from 2023 to 2024.
- 2023.11.01: We have updated our review paper with the latest revisions and incorporated newly published research from 2022 to 2023.
- 📖 RL paradigm-based Explainable RL Taxonomy
- 👓 Review of human knowledge-based RL explainability
- 🚀 List of current XRL research literature and code links
In this survey, we provide a comprehensive review of existing works on eXplainable Reinforcement Learning (XRL) and introduce a new taxonomy in which prior works are clearly categorized into agent model-explaining, reward-explaining, state-explaining, and task-explaining methods. We also review and highlight RL methods that, conversely, leverage human knowledge to improve agents' learning efficiency and performance, a category of methods that is often overlooked in the XRL field.
To give an overview of existing XRL frameworks and our taxonomy, XRL papers of different types are listed below and summarized in the following figure, with each paper categorized according to our taxonomy. For each paper, we also include a link to its open-source code if available.
- Explainable Reinforcement Learning: A Survey
- E. Puiutta and E. Veith. CD-MAKE 2020. [paper]
- A Survey on Interpretable Reinforcement Learning
- C. Glanois, P. Weng, M. Zimmer, D. Li, T. Yang, J. Hao and W. Liu. arXiv 2021. [paper]
- Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey
- R. Dazeley, P. Vamplew and F. Cruz. arXiv 2021. [paper]
- Explainable AI and Reinforcement Learning—A Systematic Review of Current Approaches and Trends
- Lindsay Wells and Tomasz Bednarz. FRAI 2021. [paper]
- Explainability in deep reinforcement learning
- A. Heuillet, F. Couthouis and N. Díaz-Rodríguez. KBS 2021. [paper]
- Explainable Deep Reinforcement Learning: State of the Art and Challenges
- G. Vouros. CSUR 2022. [paper]
- Explainable Reinforcement Learning: A Survey and Comparative Review
- S. Milani, N. Topin, M. Veloso and F. Fang. CSUR 2023. [paper]
- Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction
- Fuzzy centered explainable network for reinforcement learning
- L. Ou, Y. C. Chang, Y. K. Wang and C. T. Lin. TFS 2023. [paper]
- MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning
- Learning to synthesize programs as interpretable and generalizable policies
- Symbolic Regression via Deep Reinforcement Learning Enhanced Genetic Programming Seeding
- Discovering symbolic policies with deep reinforcement learning
- Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods
- N. Topin, S. Milani, F. Fang and M. Veloso. AAAI 2021. [paper]
- Incorporating relational background knowledge into reinforcement learning via differentiable inductive logic programming
- Evolutionary learning of interpretable decision trees
- Optimization methods for interpretable differentiable decision trees applied to reinforcement learning
- A. Silva, M. Gombolay, T. Killian, I. Jimenez and S. Son. AISTATS 2020. [paper]
- Neurosymbolic transformers for multi-agent communication
- Generating interpretable reinforcement learning policies using genetic programming
- D. Hein, S. Udluft and T. Runkler. GECCO 2019. [paper]
- Imitation-projected programmatic reinforcement learning
- Towards Reinforcement Learning of Human Readable Policies
- R. Akrour, D. Tateo and J. Peters. ECML-PKDD workshop 2019. [paper]
- Neural Logic Reinforcement Learning
- Inductive logic programming via differentiable deep neural logic networks
- Conservative q-improvement: Reinforcement learning for an interpretable decision-tree policy
- Generation of policy-level explanations for reinforcement learning
- N. Topin and M. Veloso. AAAI 2019. [paper]
- Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
- Programmatically Interpretable Reinforcement Learning
- Interpretable policies for reinforcement learning by genetic programming
- D. Hein, S. Udluft and T. Runkler. EAAI 2018. [paper]
- Verifiable Reinforcement Learning via Policy Extraction
- Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies
- D. Hein, A. Hentschel, T. Runkler and S. Udluft. EAAI 2017. [paper]
- Policy Search in a Space of Simple Closed-form Formulas: Towards Interpretability of Reinforcement Learning
- F. Maes, R. Fonteneau, L. Wehenkel and D. Ernst. DS 2012. [paper]
- A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- Explaining reinforcement learning agents through counterfactual action outcomes
- Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis
- Explainable Multi-Agent Reinforcement Learning for Temporal Queries
- Explainable Reinforcement Learning via a Causal World Model
- “I Don’t Think So”: Summarizing Policy Disagreements for Agent Comparison
- Y. Amitai and O. Amir. AAAI 2022. [paper]
- A CEGAR-Driven Training and Verification Framework for Safe Deep Reinforcement Learning
- Toward Policy Explanations for Multi-Agent Reinforcement Learning
- Counterfactual state explanations for reinforcement learning agents via generative deep learning
- Generating high-quality explanations for navigation in partially-revealed environments
- Explainable Reinforcement Learning through a Causal Lens
- Neurosymbolic reinforcement learning with formally verified exploration
- An inductive synthesis framework for verifiable reinforcement learning
- H. Zhu, Z. Xiong, S. Magill and S. Jagannathan. PLDI 2019. [paper]
- Verifying Deep-RL-Driven Systems
- Y. Kazak, C. Barrett, G. Katz and M. Schapira. SIGCOMM 2019 workshop. [paper]
- Autonomous self-explanation of behavior for interactive reinforcement learning agents
- Y. Fukuchi, M. Osawa, H. Yamakawa and M. Imai. HAI 2017. [paper]
- Application of Instruction-Based Behavior Explanation to a Reinforcement Learning Agent with Changing Policy
- Y. Fukuchi, M. Osawa, H. Yamakawa and M. Imai. ICONIP 2017. [paper]
- Improving Robot Controller Transparency Through Autonomous Policy Explanation
- B. Hayes and J. Shah. HRI 2017. [paper]
- Visual imitation learning with patch rewards
- Shapley counterfactual credits for multi-agent reinforcement learning
- J. Li, K. Kuang, B. Wang, F. Liu, L. Chen, F. Wu and J. Xiao. SIGKDD 2021. [paper]
- Shapley Q-value: A local reward approach to solve global reward games
- Explainable reinforcement learning via reward decomposition
- Z. Juozapaitis, A. Koul, A. Fern, M. Erwig and F. Doshi-Velez. IJCAI/ECAI workshop 2019. [paper]
- Counterfactual multi-agent policy gradients
- Creativity of AI: Automatic Symbolic Option Discovery for Facilitating Deep Reinforcement Learning
- M. Jin, Z. Ma, K. Jin, H. Zhuo, C. Chen and C. Yu. AAAI 2022. [paper]
- Dynamic Inverse Reinforcement Learning for Characterizing Animal Behavior
- Self-supervised attention-aware reinforcement learning
- H. Wu, K. Khetarpal and D. Precup. AAAI 2021. [paper]
- Ella: Exploration through learned language abstraction
- Tree-structured policy based progressive reinforcement learning for temporally language grounding in video
- Improving Human-Robot Interaction Through Explainable Reinforcement Learning
- A. Tabrez and B. Hayes. HRI 2019. [paper]
- SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
- D. Lyu, F. Yang, B. Liu and S. Gustafson. AAAI 2019. [paper]
- Accountability in offline reinforcement learning: Explaining decisions with a corpus of examples
- Explaining RL decisions with trajectories
- Towards interpretable deep reinforcement learning with human-friendly prototypes
- Collective explainable AI: Explaining cooperative strategies and agent contribution in multiagent reinforcement learning with shapley values
- ProtoX: Explaining a Reinforcement Learning Agent via Prototyping
- Explainable ai in deep reinforcement learning models for power system emergency control
- K. Zhang, J. Zhang, P. Xu, T. Gao and D. Gao. TCSS 2021. [paper]
- Edge: Explaining deep reinforcement learning policies
- Interestingness elements for explainable reinforcement learning: Understanding agents' capabilities and limitations
- Visual sparse Bayesian reinforcement learning: a framework for interpreting what an agent has learned
- I. Mishra, G. Dao and M. Lee. SSCI 2018. [paper]
- Robust bayesian inverse reinforcement learning with sparse behavior noise
- J. Zheng, S. Liu and L. Ni. AAAI 2014. [paper]
- Explainable Deep Adversarial Reinforcement Learning Approach for Robust Autonomous Driving
- C. Wang and N. Aouf. TIV 2024. [paper]
- Explaining reinforcement learning with shapley values
- Training characteristic functions with reinforcement learning: Xai-methods play connect four
- S. Waldchen, F. Huber and S. Pokutta. ICML 2022. [paper]
- Look where you look! Saliency-guided Q-networks for generalization in visual Reinforcement Learning
- D. Bertoin, A. Zouitine, M. Zouitine and E. Rachelson. NeurIPS 2022. [paper]
- Inherently explainable reinforcement learning in natural language
- Machine versus human attention in deep reinforcement learning tasks
- S. Guo, R. Zhang, B. Liu, Y. Zhu, D. Ballard, M. Hayhoe and P. Stone. NeurIPS 2021. [paper]
- The sensory neuron as a transformer: Permutation-invariant neural networks for reinforcement learning
- Neuroevolution of self-interpretable agents
- Deep reinforcement learning with stacked hierarchical attention for text-based games
- Xgail: Explainable generative adversarial imitation learning for explainable human decision analysis
- Towards better interpretability in deep q-networks
- Alphastock: A buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks
- J. Wang, Y. Zhang, K. Tang, J. Wu and Z. Xiong. SIGKDD 2019. [paper]
- Social attention for autonomous decision-making in dense traffic
- DQNViz: A Visual Analytics Approach to Understand Deep Q-Networks
- J. Wang, L. Gou, H. Shen and H. Yang. TVCG 2018. [paper]
- Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees
- Learn to interpret atari agents
- Unsupervised video object segmentation for deep reinforcement learning
- Visualizing and Understanding Atari Agents
- Rise: Randomized input sampling for explanation of black-box models
- Transparency and Explanation in Deep Reinforcement Learning Neural Networks
- Refining diffusion planner for reliable behavior synthesis by automatic detection of infeasible plans
- An explainable and robust motion planning and control approach for autonomous vehicle on-ramping merging task using deep reinforcement learning
- B. Hu, L. Jiang, S. Zhang and Q. Wang. TTE 2023. [paper]
- What did you think would happen? explaining agent behaviour through intended outcomes
- Weakly-supervised reinforcement learning for controllable behavior
- L. Lee, B. Eysenbach, R. Salakhutdinov, S. Gu and C. Finn. NeurIPS 2020. [paper]
- Semantic Predictive Control for Explainable and Efficient Policy Learning
- X. Pan, X. Chen, Q. Cai, J. Canny and F. Yu. ICRA 2019. [paper]
- Safe Reinforcement Learning With Model Uncertainty Estimates
- B. Lütjens, M. Everett and J. How. ICRA 2019. [paper]
- Contrastive explanations for reinforcement learning in terms of expected consequences
- J. Waa, J. Diggelen, K. Bosch and M. Neerincx. arXiv 2018. [paper]
- A Boolean task algebra for reinforcement learning
- Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning
- T. Shu, C. Xiong and R. Socher. ICLR 2018. [paper]
- Multi-task reinforcement learning with context-based representations
- Model primitives for hierarchical lifelong reinforcement learning
- Language as an abstraction for hierarchical deep reinforcement learning
- SDRL: interpretable and data-efficient deep reinforcement learning leveraging symbolic planning
- D. Lyu, F. Yang, B. Liu and S. Gustafson. AAAI 2019. [paper]
- Dot-to-dot: Explainable hierarchical reinforcement learning for robotic manipulation
- B. Beyret, A. Shafti and A. Faisal. IROS 2019. [paper]
- Fuzzy Action-Masked Reinforcement Learning Behavior Planning for Highly Automated Driving
- T. Rudolf, M. Gao, T. Schürmann, S. Schwab and S. Hohmann. ICCAR 2022. [paper]
- Efficient hierarchical policy network with fuzzy rules
- W. Shi, Y. Feng, H. Huang, Z. Liu, J. Huang and G. Cheng. IJMLC 2022. [paper]
- KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge
- P. Zhang, J. Hao, W. Wang, H. Tang, Y. Ma, Y. Duan and Y. Zheng. arXiv 2020. [paper]
- Using Natural Language for Reward Shaping in Reinforcement Learning
- Curricular Subgoals for Inverse Reinforcement Learning
- Local explanations for reinforcement learning
- R. Luss, A. Dhurandhar and M. Liu. AAAI 2023. [paper]
- Textual Explanations for Self-Driving Vehicles
- Towards Effective and Interpretable Human-Agent Collaboration in MOBA Games: A Communication Perspective
- LISA: Learning interpretable skill abstractions from language
- D. Garg, S. Vaidyanath, K. Kim, J. Song and S. Ermon. NeurIPS 2022. [paper]
- Perceiving the world: Question-guided reinforcement learning for text-based games
- Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning
For completeness, we also list libraries of explainable AI methods that tackle the black-box problem of AI models. These libraries can enhance AI models with transparency and explainability.
| Explainable AI library |
| --- |
| Aequitas |
| Alibi Explain |
| Captum |
| DeepVis Toolbox |
| ELI5 |
| InterpretML |
| IBM AI Explainability 360 |
| iModels |
| LIME |
| OmniXAI |
Please ⭐️ this repository if this project helped you!