Resources used
A Deep Reinforcement Learning Chatbot
Deep Reinforcement Learning for recommender systems:
- https://arxiv.org/pdf/1801.00209.pdf
- https://arxiv.org/pdf/1810.12027.pdf (good literature review section)
- http://www.personal.psu.edu/~gjz5038/paper/www2018_reinforceRec/www2018_reinforceRec.pdf
Berkeley Deep RL course (CS 285): http://rail.eecs.berkeley.edu/deeprlcourse/
- Associated GitHub repo for the HW assignments: https://github.com/berkeleydeeprlcourse/homework_fall2020
Algorithms for Reinforcement Learning (Szepesvári): https://sites.ualberta.ca/~szepesva/rlbook.html
Sutton and Barto's "Reinforcement Learning: An Introduction". Make sure you get the second edition (current as of 2020). Chapter 13 is on Policy Gradient Methods. There are many PDFs online, such as this one: https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
OpenAI Spinning Up, intro to RL: https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
The entire site is worth reading.
"Autonomous reinforcement learning from raw visual data", Lange & Riedmiller (2010): Q-learning on top of a latent space learned with an autoencoder, using fitted Q-iteration.
"Human-level control through deep reinforcement learning", Mnih et al. (2015)
"Continuous control with deep reinforcement learning", Lillicrap et al. (2015)
- Watkins (1989). Learning from delayed rewards: introduces Q-learning.
- Riedmiller (2005). Neural fitted Q-iteration: batch-mode Q-learning with neural networks.
- Lange, Riedmiller (2010). Deep auto-encoder neural networks in reinforcement learning: early image-based Q-learning method using autoencoders to construct embeddings.
- Mnih et al. (2015). Human-level control through deep reinforcement learning: Q-learning with convolutional networks for playing Atari.
- Van Hasselt, Guez, Silver (2015). Deep reinforcement learning with double Q-learning: a very effective trick to improve the performance of deep Q-learning.
- Lillicrap et al. (2016). Continuous control with deep reinforcement learning: continuous Q-learning with an actor network for approximate maximization.
- Gu, Lillicrap, Sutskever, L. (2016). Continuous deep Q-learning with model-based acceleration: continuous Q-learning with action-quadratic value functions.
- Wang, Schaul, Hessel, van Hasselt, Lanctot, de Freitas (2016). Dueling network architectures for deep reinforcement learning: separates value and advantage estimation in the Q-function.
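The Q-learning papers above differ mostly in how the bootstrap target or the Q-values are formed. A minimal sketch in plain Python, assuming Q-values for the next state are given as lists indexed by action; the function names are hypothetical and no network or replay buffer is included.

```python
def q_learning_target(reward, next_q, gamma, done):
    """Q-learning target: r + gamma * max_a' Q(s', a').

    In DQN, next_q would come from the (periodically copied) target network.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(next_q)


def double_q_target(reward, next_q_online, next_q_target, gamma, done):
    """Double Q-learning target: pick a' with the online network,
    evaluate it with the target network to reduce overestimation."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]
```

For example, with reward 1.0 and gamma 0.9, `q_learning_target(1.0, [2.0, 0.5], 0.9, False)` takes the max (2.0), while `double_q_target(1.0, [1.0, 3.0], [2.0, 0.5], 0.9, False)` lets the online values choose action 1 and then uses the target network's 0.5 for it.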
Robots!
- "Robotic manipulation with Deep Reinforcement Learning ant...", Gu, Holly, et al. (2017)
- "QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation", Kalashnikov, Irpan, Pastor, et al.
Recurrent models of visual attention:
Monte Carlo Tree Search:
- Browne, Powley, Whitehouse, Lucas, Cowling, Rohlfshagen, Tavener, Perez, Samothrakis, Colton. (2012). A Survey of Monte Carlo Tree Search Methods
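The survey centers on UCT, where tree descent picks the child with the highest UCB1 score. A minimal sketch of just that selection step, assuming each child is summarized as a hypothetical (total_value, visits) pair and that parent visits can be approximated by the sum of child visits; the exploration constant c = sqrt(2) is the common textbook default and is a tunable assumption.

```python
import math


def ucb1_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score used by UCT: mean value plus an exploration bonus."""
    if visits == 0:
        return math.inf  # always expand unvisited children first
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_value, visits) pairs (hypothetical layout);
    parent visits are approximated by the sum of child visits.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(v, n, parent_visits) for v, n in children]
    return max(range(len(children)), key=lambda i: scores[i])
```

A full MCTS loop would repeat this selection down the tree, expand a leaf, run a rollout, and back up the result into each (total_value, visits) pair along the path.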
Blackjack! Also covered in the David Silver lectures and in chapter 5 of Sutton and Barto: https://www.davidsilver.uk/wp-content/uploads/2020/03/Easy21-Johannes.pdf
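Chapter 5 of Sutton and Barto uses Blackjack to illustrate first-visit Monte Carlo prediction: the value of a state is the average return observed after its first visit in each episode. A minimal sketch, assuming episodes have already been generated as lists of (state, reward) pairs, where reward is the reward received after leaving that state; the (player_sum, dealer_card) state encoding in the usage example is hypothetical, and no Blackjack or Easy21 simulator is included.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction.

    Each episode is a list of (state, reward) pairs, with reward = R_{t+1}.
    Returns a dict mapping state -> average return after its first visit.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the first time step at which each state appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards, accumulating the discounted return G_t.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = gamma * g + reward
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Two hypothetical episodes: hit on 13 then stick on 19 and win (+1);
# hit on 13 and bust (-1). Rewards arrive only at the end, as in Blackjack.
episodes = [
    [((13, 2), 0.0), ((19, 2), 1.0)],
    [((13, 2), -1.0)],
]
values = first_visit_mc(episodes)
```

Here `values[(13, 2)]` averages the returns +1 and -1 to 0.0, and `values[(19, 2)]` is 1.0. With gamma = 1 and a single terminal reward, every state in an episode receives that episode's outcome as its return.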