The objective of this project is to train an agent to navigate a maze using Deep Q-Learning. The implementation includes agents that can handle either discrete or continuous actions.
Possible models the agent can use:
- Deep Q-Learning
- Deep Q-Learning with a Target Network
- Double Deep Q-Learning
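The three variants above differ mainly in how the bootstrap target is computed. A minimal NumPy sketch of the two target rules (function names and array shapes are my own, not taken from this codebase): vanilla/target-network DQN takes the max over the target network's Q-values, while Double DQN lets the online network pick the action and the target network evaluate it, which reduces overestimation bias.

```python
import numpy as np

def dqn_target(q_target_next, rewards, dones, gamma=0.99):
    # DQN with a target network: bootstrap from the max
    # of the target network's Q-values at the next state.
    return rewards + gamma * (1.0 - dones) * q_target_next.max(axis=1)

def double_dqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    # Double DQN: the online network selects the greedy action,
    # the target network evaluates it.
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated
```

All arrays are batched: `q_*_next` has shape `(batch, n_actions)`; `rewards` and `dones` have shape `(batch,)`.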
Actions:
- For discrete actions, the agent can only consider up, down, left, or right. Each action moves the agent by a fixed stride.
- For continuous actions, the agent only chooses the angle of its next move. The mean angle is sampled using the cross-entropy method.
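For the continuous case, the cross-entropy method over a single angle can be sketched as follows (a minimal illustration, not the project's actual implementation; `score_fn` is a hypothetical stand-in for whatever the agent uses to rate an angle, e.g. its Q-estimate):

```python
import numpy as np

def cem_angle(score_fn, mean=0.0, std=np.pi, n_samples=64, n_elite=8, iters=20, rng=None):
    # Cross-entropy method over a 1-D angle:
    # sample candidate angles from a Gaussian, keep the best-scoring
    # "elite" fraction, and refit the Gaussian's mean/std to those elites.
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(iters):
        angles = rng.normal(mean, std, size=n_samples)
        scores = np.array([score_fn(a) for a in angles])
        elite = angles[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(), elite.std() + 1e-6  # avoid std collapsing to 0
    return mean
```

Each iteration narrows the sampling distribution around high-scoring angles, so the returned mean converges towards a (locally) best heading.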
Tools:
- Greedy Policy Tool, which allows the user to visualise the greedy policy the agent has learnt so far. Images of the tool can be seen above.
- Action Visual Tool, which discretises the environment and shows the agent's preferred ordering of the discrete actions at each grid point. The most preferred action is shown in strong yellow, and the least preferred action the agent would consider is shown in strong blue.
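The Action Visual Tool's core computation amounts to ranking actions by Q-value at each grid point. A small sketch under my own assumptions (`q_fn` is a hypothetical callable returning one Q-value per discrete action at a position):

```python
import numpy as np

def action_rankings(q_fn, xs, ys):
    # Discretise the environment into a grid of (x, y) points and,
    # at each point, rank the discrete actions from most to least
    # preferred by Q-value (the colouring yellow -> blue follows this order).
    rankings = {}
    for x in xs:
        for y in ys:
            q = np.asarray(q_fn(x, y))
            rankings[(x, y)] = np.argsort(q)[::-1]  # best action first
    return rankings
```

The returned dictionary maps each grid point to action indices in preference order, which a plotting routine can then map onto a yellow-to-blue colour scale.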
In the future, I would like to introduce:
- Policy-based methods.