This repository solves the Banana Unity environment with a Deep Q-Network (DQN), based on the DQN paper (Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529).
Network Architecture:
An MLP with an input size of 37, an output size of 4, and two hidden layers of size 64, each using an ELU activation function.
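A minimal sketch of such a network, assuming PyTorch; the class name QNetwork and the layer names are illustrative, not necessarily the repository's actual code:

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative MLP matching the described architecture: 37 -> 64 -> 64 -> 4 with ELU."""

    def __init__(self, state_size=37, action_size=4, hidden_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),   # input layer: 37 -> 64
            nn.ELU(),
            nn.Linear(hidden_size, hidden_size),  # hidden layer: 64 -> 64
            nn.ELU(),
            nn.Linear(hidden_size, action_size),  # output layer: 64 -> 4 (one Q-value per action)
        )

    def forward(self, state):
        return self.net(state)
```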
Algorithm:
The training loop proceeds as follows:
1. The agent receives a state from the environment and returns an action chosen by an epsilon-decaying greedy policy: with probability epsilon the action is random, and with probability 1 - epsilon it is the greedy action of the local Q-network.
2. The action is passed to the environment, producing a (state, action, reward, next_state, done) experience tuple.
3. The experience tuple is saved in the MemoryBuffer, and every UPDATE_EVERY steps the agent invokes its self.learn method.
4. In self.learn, the agent samples a batch of experiences from the MemoryBuffer, computes the loss L = mean_squared_error(rewards + GAMMA * max_a qnet_target(next_states, a), qnet_local(states)[actions]), updates the weights of the local Q-network with respect to this loss, and then updates the target Q-network with qnetwork_target_params = TAU*qnetwork_local_params + (1-TAU)*qnetwork_target_params. A minimal sketch of this step is shown after this list.
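The sketch below illustrates the action-selection and learning steps described above, assuming PyTorch and the hypothetical names qnet_local, qnet_target, optimizer, gamma, and tau; it is an illustration of the standard DQN update, not the repository's actual code:

```python
import random

import torch
import torch.nn.functional as F


def act(qnet_local, state, eps, action_size=4):
    """Epsilon-greedy action: random with probability eps, greedy otherwise."""
    if random.random() < eps:
        return random.randrange(action_size)
    with torch.no_grad():
        q_values = qnet_local(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())


def learn(qnet_local, qnet_target, optimizer, experiences, gamma, tau):
    """One learning step on a batch sampled from the MemoryBuffer."""
    states, actions, rewards, next_states, dones = experiences

    # TD target: rewards + gamma * max_a' Q_target(next_states, a'), zeroed at terminal states
    q_targets_next = qnet_target(next_states).detach().max(1)[0].unsqueeze(1)
    q_targets = rewards + gamma * q_targets_next * (1 - dones)

    # Q_local(states, actions) for the actions that were actually taken
    q_expected = qnet_local(states).gather(1, actions)

    # Mean squared error between expected and target Q-values
    loss = F.mse_loss(q_expected, q_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Soft update: target_params <- tau * local_params + (1 - tau) * target_params
    for target_param, local_param in zip(qnet_target.parameters(), qnet_local.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)
```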
The hyperparameters used are specified in hparams.py.
Results:
The agent solves the environment in 517 episodes, with an average score of 13.11 over the last 100 episodes.
See the graph in policy_learning_curve.png.
Ideas for future work:
1. Implement prioritized experience replay
2. Implement Double DQN (a sketch of how the learning target would change is given below)
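For the second idea, a minimal sketch of how the TD target could be computed under Double DQN, reusing the hypothetical qnet_local / qnet_target names from above (this is not implemented in the repository):

```python
import torch

def double_dqn_targets(qnet_local, qnet_target, rewards, next_states, dones, gamma):
    """Double DQN target: the local network selects the best next action,
    the target network evaluates it, reducing the overestimation bias of the max operator."""
    best_actions = qnet_local(next_states).detach().argmax(1, keepdim=True)
    q_next = qnet_target(next_states).detach().gather(1, best_actions)
    return rewards + gamma * q_next * (1 - dones)
```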