- Decide whether to change \pi_{n+1} \in \mathcal{G} Q_n and Q_{n+1} = T^{\pi_{n+1}} Q_n in VI, PI, MPI (RL2), learning optimal value functions, SARSA (RL3), the reminder and neural networks for AVI (RL4), and DPG (RL5); this scheme is sketched right after this list.
- Exercise on prioritized sweeping (RL2)
- Directly implement target networks in DQN (RL4)
- Exercise on double DQN (RL4)
- Other exercises (RL4)
- Restructure RL5 to make it more progressive.
- Add SAC with adjustable temperature to RL5
- SAC with discrete actions
- SAC with delayed actor updates
- Write corrections for the exercises in RL6
- NPG (natural policy gradient) in RL6
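
For reference, the scheme the first item above refers to, as I read it (the generic modified policy iteration formulation; the parameter m is mine, not part of the item):

  \pi_{n+1} \in \mathcal{G} Q_n          (greedy improvement step)
  Q_{n+1} = (T^{\pi_{n+1}})^m Q_n        (partial evaluation step)

With m = 1 this is VI (since T^{\pi_{n+1}} Q_n = T^* Q_n when \pi_{n+1} is greedy w.r.t. Q_n); with m -> \infty it is PI.
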
Proofs to add:
- existence of an optimal memoryless, stationary, deterministic policy
- contraction of T^\pi and of T^* (statements sketched after this list)
- policy improvement theorem
- convergence of MPI
- DPG theorem
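
Statement sketch for the two contraction items, assuming the usual sup-norm setting with discount factor \gamma \in [0,1), Q-functions, and a deterministic policy \pi (my phrasing, not taken from the notebooks):

  (T^\pi Q)(s,a) = r(s,a) + \gamma \sum_{s'} p(s'|s,a) Q(s', \pi(s'))
  (T^* Q)(s,a)   = r(s,a) + \gamma \sum_{s'} p(s'|s,a) \max_{a'} Q(s', a')
  \|T^\pi Q - T^\pi Q'\|_\infty \le \gamma \|Q - Q'\|_\infty
  \|T^* Q - T^* Q'\|_\infty \le \gamma \|Q - Q'\|_\infty
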
- Make a version of SAC with LaBER on the critic and delayed actor updates (rough sketch below).
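
A rough, untested sketch for the last item, not the repo's code. Assumptions of mine: PyTorch, random transitions in place of a real environment, a fixed entropy temperature alpha, LaBER-mean style reweighting of the critic mini-batch, and hypothetical names and hyperparameters (large_batch, mini_batch, policy_delay):

import math
import torch
import torch.nn as nn

obs_dim, act_dim = 3, 1
gamma, alpha, tau = 0.99, 0.2, 0.005
large_batch, mini_batch, policy_delay = 1024, 256, 2   # hypothetical hyperparameters

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

actor = mlp(obs_dim, 2 * act_dim)                       # outputs mean and log-std
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
q1_targ, q2_targ = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
q1_targ.load_state_dict(q1.state_dict())
q2_targ.load_state_dict(q2.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

def sample_action(obs):
    # Squashed-Gaussian policy with reparameterized sampling and its log-prob.
    mu, log_std = actor(obs).chunk(2, dim=-1)
    log_std = log_std.clamp(-5, 2)
    std = log_std.exp()
    u = mu + std * torch.randn_like(std)
    a = torch.tanh(u)
    logp = (-0.5 * ((u - mu) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)).sum(-1, keepdim=True)
    logp = logp - torch.log(1 - a.pow(2) + 1e-6).sum(-1, keepdim=True)
    return a, logp

# Fake replay buffer of random transitions, only to make the sketch self-contained.
N = 10_000
buf = {"o": torch.randn(N, obs_dim), "a": torch.rand(N, act_dim) * 2 - 1,
       "r": torch.randn(N, 1), "o2": torch.randn(N, obs_dim), "d": torch.zeros(N, 1)}

for step in range(10):
    # 1) Draw a LARGE uniform batch and score it with per-sample TD errors (surrogate priorities).
    idx = torch.randint(N, (large_batch,))
    o, a, r, o2, d = (buf[k][idx] for k in ("o", "a", "r", "o2", "d"))
    with torch.no_grad():
        a2, logp2 = sample_action(o2)
        q_next = torch.min(q1_targ(torch.cat([o2, a2], -1)),
                           q2_targ(torch.cat([o2, a2], -1))) - alpha * logp2
        y = r + gamma * (1 - d) * q_next
        td = (q1(torch.cat([o, a], -1)) - y).abs().squeeze(-1) + 1e-6

    # 2) LaBER-style selection: subsample the mini-batch proportionally to |TD error|
    #    and reweight each selected sample by mean(priority) / priority.
    probs = td / td.sum()
    sub = torch.multinomial(probs, mini_batch, replacement=True)
    w = (td.mean() / td[sub]).unsqueeze(-1)

    # 3) Critic update on the selected, reweighted mini-batch only.
    q_in = torch.cat([o[sub], a[sub]], -1)
    critic_loss = (w * (q1(q_in) - y[sub]) ** 2).mean() + (w * (q2(q_in) - y[sub]) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # 4) Delayed actor update: only once every `policy_delay` critic updates.
    if step % policy_delay == 0:
        a_pi, logp_pi = sample_action(o[sub])
        q_pi = torch.min(q1(torch.cat([o[sub], a_pi], -1)), q2(torch.cat([o[sub], a_pi], -1)))
        actor_loss = (alpha * logp_pi - q_pi).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # 5) Polyak-average the target critics.
    with torch.no_grad():
        for p, pt in zip(list(q1.parameters()) + list(q2.parameters()),
                         list(q1_targ.parameters()) + list(q2_targ.parameters())):
            pt.mul_(1 - tau).add_(tau * p)

The intent of the sketch: the large batch is only used to score TD errors, critic gradients are taken on the reweighted mini-batch, and the actor is touched once every policy_delay critic steps.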