- MDPs with pymdptoolbox
- TD(λ) learning rate properties
- Widrow-Hoff rule (also known as the delta rule, ADALINE, or the LMS filter)
- TD(λ) for bounded random walk
- Algorithmic Analysis for Value and Policy Iteration
- Natural Evolution Strategies
- Bandit based Monte-Carlo planning
- Vectorized epsilon-greedy bandits
- Parallel epsilon-greedy bandits
- Knows What It Knows (KWIK)
- Policy Iteration
- Q-Learning
- OpenAI Gym Taxi-v2
- Deep Q-Learning
- Double Q-Learning
- Dueling Q-Learning
- Experience/Prioritized/Hindsight Replay
- ...