AY2020/21 Sem 1/2 CP3209 UROP in Computing Project with Dr Jing Wei, IHPC.
- Get models working for speaker_listener, followed by the rest of the scenarios
- Add discrete action space output option via Gumbel-Softmax reparameterization trick
- Move noise parameter to inside the agent class
- Add support for individual good/bad agent policies
- Implement M3DDPG algorithm
- Implement GIF saving for MPE
- Implement policy estimation and ensembling for MADDPG
- Add support for MultiBoxDiscrete action space
- Add individual agent reward tracking
- Experiment with additional normalization layers
- Experiment with separate actor/critic networks
- Refactor code into packages
- Modify MPE code to provide benchmark statistics
- Document code
- Add ability to set random seed
- Add printout for model and experimental parameters before code execution
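
The Gumbel-Softmax item in the list above can be illustrated with a minimal sketch. It is not this project's implementation: it uses only the Python standard library and shows the forward sampling step, whereas a trainable version would run inside an autodiff framework so that gradients flow through the relaxed sample. The names `gumbel_softmax_sample`, `to_one_hot`, `temperature`, and `rng` are hypothetical and chosen for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature=1.0, rng=random):
    """Draw one relaxed one-hot sample from unnormalized logits.

    Hypothetical helper for illustration only, not the project's code.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise: g = -log(-log(u)) with u ~ Uniform(0, 1).
        g = -math.log(-math.log(u))
        noisy.append((logit + g) / temperature)
    # Numerically stable softmax over the perturbed, scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def to_one_hot(probs):
    """Discretize a relaxed sample by taking its argmax."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]
```

Calling `gumbel_softmax_sample([1.0, 2.0, 0.5], temperature=0.5)` returns a probability vector that sums to 1. Lower temperatures push the sample closer to one-hot, and `to_one_hot` gives the discrete action that would be passed to an environment with a discrete action space.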