Exercise 05

In this exercise, we revisit the included racetrack_environment to study temporal-difference (TD) algorithms.

Tasks:

  1. policy evaluation using TD learning
  2. on-policy epsilon-greedy control using TD learning
  3. off-policy epsilon-greedy control using TD learning → Q-learning
  4. using double Q-learning in stochastic environments
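The four tasks differ mainly in the TD target that each update bootstraps on. Below is a minimal tabular sketch, assuming hashable states and actions and an agent that collects (s, a, r, s') transitions itself. Task 2 is written as SARSA, the standard on-policy TD control method. The function names (`td0_update`, `epsilon_greedy`, `sarsa_update`, `q_learning_update`, `double_q_update`) and their parameters are hypothetical and are not part of racetrack_environment's API.

```python
import random
from collections import defaultdict

# Minimal tabular sketch (hypothetical helpers, not part of racetrack_environment).
# States and actions can be any hashable values; tables are defaultdict(float),
# so unseen entries start at 0. alpha is the step size, gamma the discount factor.
# Pass done=True on the transition into a terminal state: the bootstrap term is
# then dropped because a terminal state has value 0.


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=1.0):
    """Task 1, TD(0) evaluation: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    target = r if done else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])


def epsilon_greedy(Q, s, actions, epsilon, rng=random):
    """Explore with probability epsilon, else pick a greedy action (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(Q[(s, a)] for a in actions)
    return rng.choice([a for a in actions if Q[(s, a)] == best])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """Task 2, on-policy (SARSA): bootstrap on the action a' the policy actually takes."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, done, alpha=0.1, gamma=1.0):
    """Task 3, off-policy (Q-learning): bootstrap on the greedy action in s'."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def double_q_update(QA, QB, s, a, r, s_next, actions, done,
                    alpha=0.1, gamma=1.0, rng=random):
    """Task 4, double Q-learning: one table picks argmax_a', the other evaluates it."""
    if rng.random() < 0.5:
        QA, QB = QB, QA  # swap roles so each table is updated on about half the steps
    if done:
        target = r
    else:
        a_star = max(actions, key=lambda b: QA[(s_next, b)])  # selection with QA
        target = r + gamma * QB[(s_next, a_star)]              # evaluation with QB
    QA[(s, a)] += alpha * (target - QA[(s, a)])


if __name__ == "__main__":
    V = defaultdict(float)
    V[1] = 2.0
    td0_update(V, 0, r=1.0, s_next=1, done=False, alpha=0.5)
    print(V[0])  # 0.5 * (1.0 + 2.0 - 0.0) = 1.5
```

For double Q-learning, the behavior policy would typically act epsilon-greedily with respect to the sum QA + QB. Decoupling action selection from evaluation is what reduces the maximization bias that plain Q-learning can show when rewards or transitions are stochastic.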