This repository has been archived by the owner on Apr 7, 2018. It is now read-only.

Parameters for Breakout #125

Closed
MatheusMRFM opened this issue Sep 21, 2017 · 4 comments

@MatheusMRFM

Hello there!

I tried to implement my own version of A3C using TensorFlow (here), but did not get good results. So I switched to the same network architecture as this implementation (universe starter agent) to see whether that would change the results. Initially, I thought the default convolutional layers from TensorFlow (tensorflow.contrib.layers) were responsible, so I used the same convolution function as this repository, but to no avail. I have already checked the flow of my code against universe starter agent and found them to be the same.

The environment giving me problems is Breakout. On Pong, for example, my code (with the current parameters) works very well, but on Breakout I can't get past a score of 40. I have already tried several settings (different network architectures, learning rates, frame skipping), still without success. Has anyone tried this code on Breakout? What parameters did you use? Since I have limited computational power, it is hard for me to run many tests, which is why I am posting this question.

Thank you all!

@AdamStelmaszczyk

AdamStelmaszczyk commented Jan 19, 2018

> Has anyone tried this code for Breakout? What parameters did you use?

I used the default parameters from the code, gamma 0.99, lambda 1.0, learning rate 1e-4, gradient clip 40.
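To make the roles of gamma and lambda concrete, here is a minimal sketch, assuming that "lambda" is the generalized advantage estimation (GAE) parameter. The function names are hypothetical and are not taken from universe-starter-agent. With lambda set to 1.0, the advantage reduces to the discounted return (bootstrapped from the last value estimate) minus the value estimate.

```python
def discounted_cumsum(values, discount):
    """Return y where y[t] = sum over k of discount**k * values[t + k]."""
    out = [0.0] * len(values)
    running = 0.0
    for t in reversed(range(len(values))):
        running = values[t] + discount * running
        out[t] = running
    return out


def gae_advantages(rewards, values, bootstrap_value, gamma=0.99, lam=1.0):
    """Generalized advantage estimates for one rollout (illustrative only).

    rewards:         r_0 .. r_{T-1}
    values:          V(s_0) .. V(s_{T-1})
    bootstrap_value: V(s_T), or 0.0 if the episode ended at step T
    """
    next_values = list(values[1:]) + [bootstrap_value]
    # One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = [r + gamma * nv - v
              for r, nv, v in zip(rewards, next_values, values)]
    # GAE is the (gamma * lam)-discounted sum of the TD errors.
    return discounted_cumsum(deltas, gamma * lam)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0]
    values = [0.5, 0.4, 0.3]
    # lam=1.0 gives "discounted return minus value"; lam=0.0 gives the TD errors.
    print(gae_advantages(rewards, values, bootstrap_value=0.2, lam=1.0))
    print(gae_advantages(rewards, values, bootstrap_value=0.2, lam=0.0))
```

Lower lambda values trade variance for bias in the advantage estimate, so lambda is one more knob that could be tuned alongside the learning rate.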

I also can't reproduce the results for either BreakoutDeterministic-v4 or SeaquestDeterministic-v4.

Breakout after 24h with 16 workers (3 independent trainings):
[Images: breakout-1, breakout-2, breakout-3]

However: #87 (comment)

Each worker requires 2-3 cores.

The machine had 24 cores, so I ran it again for 14h with 8 workers (3 independent trainings):

[Images: breakout-1, breakout-2, breakout-3]
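The choice of 8 workers follows from the numbers quoted above (24 cores, 2-3 cores per worker). A small sketch of that arithmetic, with a hypothetical helper name:

```python
def max_workers(total_cores, cores_per_worker):
    """Largest worker count that does not oversubscribe the available cores."""
    return total_cores // cores_per_worker


if __name__ == "__main__":
    total_cores = 24
    for cores_per_worker in (2, 3):
        print(cores_per_worker, "cores/worker ->",
              max_workers(total_cores, cores_per_worker), "workers")
    # Cores that 16 workers would want at 2 and at 3 cores each.
    print(16 * 2, 16 * 3)
```

At 3 cores per worker the 24-core machine fits 8 workers, whereas 16 workers would want more cores than the machine has under either estimate.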

And A3C (16 workers) from https://arxiv.org/pdf/1602.01783.pdf (page 5) looks better:

[Image: goal]

@AdamStelmaszczyk

AdamStelmaszczyk commented Jan 19, 2018

I realized something. The A3C paper above says:

> Specifically, we tuned hyperparameters (learning rate and amount of gradient norm clipping) using a search on six Atari games (Beamrider, Breakout, Pong, Q*bert, Seaquest and Space Invaders) and then fixed all hyperparameters for all 57 games.

Unfortunately, the paper does not seem to state which learning rate and gradient norm clipping values were chosen, so they could differ from the defaults used in the code here.
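For reference, this is roughly what a gradient norm clipping threshold controls. The sketch below assumes that the clip value (40 in the defaults mentioned earlier) is applied to the global norm across all gradients, in the style of TensorFlow's `tf.clip_by_global_norm`. It is a plain-Python illustration with a hypothetical function name, not the repository's code.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient vectors so their joint L2 norm is <= clip_norm.

    Returns (clipped_grads, global_norm). Gradients are left unchanged when
    the global norm is already within the threshold.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= clip_norm:
        return [list(grad) for grad in grads], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in grads], global_norm


if __name__ == "__main__":
    grads = [[30.0], [40.0]]  # two "layers"; joint norm is 50
    clipped, norm = clip_by_global_norm(grads, clip_norm=40.0)
    print(norm, clipped)
```

Because all gradients are scaled by the same factor, the update direction is preserved and only its length is capped. A threshold that is large relative to typical gradient norms rarely triggers, which is one reason the clip value and the learning rate tend to be tuned together.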

@choinker

Has anyone found optimal hyperparameters?

@AdamStelmaszczyk

AdamStelmaszczyk commented Feb 20, 2018

I haven't (I only tried a bit), but this is helpful:

> I guess universe-starter-agent has correct implementation of A3C but definitely with quite a few design changes, e.g., unshared optimizer across workers, different hyper-parameters like input size, learning rate etc., and different network architectures etc. I first "tuned" it to make sure I can reproduce ATARI results to some extent (note: it's quite hard to replicate original paper results because they use Torch and initialization was different -- training is sensitive). I could reach close to the results for "breakout" and few other games in "non-shared optimizer scenario" (see original A3C paper supplementary) but did not get exactly same numbers because of difference in initialization, Tensorflow vs. Torch etc. By the word "tuning" above I meant: changing architecture, changing loss equation to mean loss and not the total loss, changing hyper-parameters etc.
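The "mean loss and not the total loss" point in the quote interacts with the learning rate: a summed loss produces gradients that are N times larger than a mean loss over the same N samples. The sketch below shows this on a toy least-squares loss, which is an assumption chosen for illustration and not the A3C loss; the function names are hypothetical.

```python
def per_sample_grads(w, xs, ys):
    """Gradient of (w * x - y)**2 with respect to w, for each sample."""
    return [2.0 * (w * x - y) * x for x, y in zip(xs, ys)]


def grad_total_loss(w, xs, ys):
    """Gradient when the loss is the SUM of per-sample losses."""
    return sum(per_sample_grads(w, xs, ys))


def grad_mean_loss(w, xs, ys):
    """Gradient when the loss is the MEAN of per-sample losses."""
    return grad_total_loss(w, xs, ys) / len(xs)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]
    w = 0.5
    g_sum = grad_total_loss(w, xs, ys)
    g_mean = grad_mean_loss(w, xs, ys)
    # The ratio equals the number of samples in the batch.
    print(g_sum, g_mean, g_sum / g_mean)
```

With plain gradient descent, switching from a summed to a mean loss at a fixed learning rate shrinks each step by a factor equal to the batch (or rollout) length. With clipping or adaptive optimizers the effect is less direct, but it is one reason hyperparameters may not transfer between implementations.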

Here are the hyperparams for the original A3C work, but for universe-starter-agent they would be different, because of the significant implementation differences.

It seems possible to find "working" ones for universe-starter-agent, but it takes considerable effort.

By the way, be aware that universe and universe-starter-agent seem deprecated: openai/universe#218.

@gdb gdb closed this as completed Apr 7, 2018