Asynchronous Advantage Actor-Critic [1]
In A3C, multiple agents run asynchronously in parallel to generate data. This makes parallel actors a practical alternative to experience replay, because running many agents at once also diversifies and decorrelates the training data [2].
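A minimal sketch of the data-generation side, assuming a hypothetical toy `RandomWalkEnv` and a uniformly random action standing in for sampling from the policy. Each worker thread owns its own environment instance and random seed and collects short on-policy segments of at most `t_max` steps, so the data produced across workers comes from independent trajectories rather than from a replay buffer. `worker`, `run_workers` and `t_max` are illustrative names, not part of this repository.

```python
import random
import threading


class RandomWalkEnv:
    """Hypothetical toy environment: walk left/right on a line of `size` cells.

    Reaching the right edge pays 1.0; reaching either edge ends the episode.
    """

    def __init__(self, seed: int, size: int = 5):
        self.rng = random.Random(seed)
        self.size = size
        self.pos = 0

    def reset(self) -> int:
        # Start in a random non-terminal cell.
        self.pos = self.rng.randrange(1, self.size - 1)
        return self.pos

    def step(self, action: int):
        self.pos += 1 if action == 1 else -1
        done = self.pos in (0, self.size - 1)
        reward = 1.0 if self.pos == self.size - 1 else 0.0
        return self.pos, reward, done


def worker(worker_id: int, n_rollouts: int, t_max: int, results: dict) -> None:
    env = RandomWalkEnv(seed=worker_id)      # each worker owns its environment copy
    rng = random.Random(1000 + worker_id)    # and its own exploration noise
    rollouts = []
    state = env.reset()
    for _ in range(n_rollouts):
        rollout = []
        for _ in range(t_max):               # short on-policy segment
            action = rng.randrange(2)        # placeholder for sampling pi(a|s)
            next_state, reward, done = env.step(action)
            rollout.append((state, action, reward, done))
            state = env.reset() if done else next_state
            if done:
                break
        rollouts.append(rollout)
    results[worker_id] = rollouts            # distinct key per thread


def run_workers(n_workers: int = 4, n_rollouts: int = 10, t_max: int = 5) -> dict:
    results: dict = {}
    threads = [
        threading.Thread(target=worker, args=(i, n_rollouts, t_max, results))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In A3C each worker would turn its own segments into gradients right away instead of storing them; here they are returned only so they can be inspected.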
There is one global network and many worker agents, each with its own copy of the network parameters. Each worker interacts with its own copy of the environment while the other workers interact with theirs, and it updates the shared global network whenever it is ready, independently of the execution of the other workers.
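A minimal sketch of the global-network/worker pattern, under strong simplifying assumptions: the environment is a hypothetical two-armed bandit with fixed payoffs (`ARM_REWARDS`), the actor is a softmax over two logits, and the critic is a single scalar baseline, so the advantage is just `r - V` with no bootstrapping. Each worker copies the global parameters, samples a small batch with its own random stream, and pushes its averaged gradient to the global network without waiting for the other workers. `GlobalNet`, `worker` and `train` are illustrative names; a real A3C worker would use neural networks, n-step bootstrapped returns and typically an entropy bonus, and the lock here is only for simplicity.

```python
import math
import random
import threading
from typing import List, Tuple

ARM_REWARDS = [0.2, 1.0]   # hypothetical fixed payoffs of a two-armed bandit
N_ARMS = len(ARM_REWARDS)


class GlobalNet:
    """Shared parameters: policy logits (actor) and a scalar baseline V (critic)."""

    def __init__(self) -> None:
        self.logits = [0.0] * N_ARMS
        self.value = 0.0
        self.lock = threading.Lock()   # simplification; updates need not be locked

    def snapshot(self) -> Tuple[List[float], float]:
        with self.lock:
            return list(self.logits), self.value

    def apply(self, d_logits: List[float], d_value: float, lr: float) -> None:
        with self.lock:
            for k in range(N_ARMS):
                self.logits[k] += lr * d_logits[k]   # ascent on expected reward
            self.value += lr * d_value               # descent on 0.5 * A^2


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def worker(net: GlobalNet, worker_id: int, n_updates: int, batch: int, lr: float) -> None:
    rng = random.Random(worker_id)          # each worker draws its own random samples
    for _ in range(n_updates):
        logits, value = net.snapshot()      # sync local parameters from the global net
        pi = softmax(logits)
        d_logits = [0.0] * N_ARMS
        d_value = 0.0
        for _ in range(batch):
            a = rng.choices(range(N_ARMS), weights=pi)[0]
            advantage = ARM_REWARDS[a] - value          # A = r - V(s)
            for k in range(N_ARMS):
                # d log pi(a) / d logit_k = 1[k == a] - pi_k
                d_logits[k] += advantage * ((1.0 if k == a else 0.0) - pi[k])
            d_value += advantage
        # Push the averaged gradient without waiting for the other workers.
        net.apply([g / batch for g in d_logits], d_value / batch, lr)


def train(n_workers: int = 4, n_updates: int = 200, batch: int = 8, lr: float = 0.5):
    net = GlobalNet()
    threads = [
        threading.Thread(target=worker, args=(net, i, n_updates, batch, lr))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(net.logits), net.value
```

Because every worker computes its gradient from a snapshot, some pushed gradients are slightly stale; the shared parameters still move toward the better arm because each update is small.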
We use a parameter server to hold the global network, following [3].
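A minimal in-process sketch of the parameter-server role, assuming plain SGD on flat lists of floats. `ParameterServer`, `pull`, `push` and `quadratic_worker` are hypothetical names, not this repository's API; a real parameter server would usually run in its own process or on its own machine and be reached over the network. The version counter makes staleness visible: a worker's gradient is computed from the parameters it pulled, which other workers may already have changed by the time it pushes.

```python
import threading
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


class ParameterServer:
    """Hypothetical in-process parameter server holding the global network."""

    def __init__(self, params: Params, lr: float):
        self._params = {k: list(v) for k, v in params.items()}
        self._lr = lr
        self._version = 0          # number of updates applied so far
        self._lock = threading.Lock()

    def pull(self) -> Tuple[Params, int]:
        """Return a copy of the current parameters and their version."""
        with self._lock:
            return {k: list(v) for k, v in self._params.items()}, self._version

    def push(self, grads: Params) -> int:
        """Apply one worker's gradients with an SGD step; return the new version."""
        with self._lock:
            for name, g in grads.items():
                p = self._params[name]
                for i, gi in enumerate(g):
                    p[i] -= self._lr * gi
            self._version += 1
            return self._version


def quadratic_worker(ps: ParameterServer, target: List[float], steps: int) -> None:
    """Hypothetical worker minimising 0.5 * ||w - target||^2 on the shared "w"."""
    for _ in range(steps):
        params, _ = ps.pull()                                  # may already be stale
        grad = [w - t for w, t in zip(params["w"], target)]    # gradient of the loss
        ps.push({"w": grad})
```

Several `quadratic_worker` threads can share one `ParameterServer`; each push is applied atomically under the lock, but the gradients themselves may be based on older versions of the parameters.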
- Implement n-step evaluation
- Implement continuous mode
- Implement a config for hyperparameters
- Add an explanation of A3C