Parameters for Breakout #125

MatheusMRFM · 2017-09-21T14:22:35Z

Hello there!

I tried to implement my own version of A3C using TensorFlow (here), but did not get good results. So I switched to the same network architecture as this implementation (universe-starter-agent) to see whether it would change the results. Initially, I suspected that TensorFlow's default convolutional layers (tensorflow.contrib.layers) were responsible, so I then used the same convolution function used here, but to no avail. I have already checked the flow of my code against universe-starter-agent and found them to be the same.

The environment that is giving me problems is Breakout. For Pong, for example, my code (with the current parameters) works very well. But when I try it on Breakout, I can't get past a score of 40. I have already tried several settings (different network architectures, learning rates, frame skipping), but still without success. Has anyone tried this code for Breakout? What parameters did you use? Since I have limited computational power, it is hard for me to run many tests, which is why I am posting this question.

Thank you all!

AdamStelmaszczyk · 2018-01-19T20:34:34Z

Has anyone tried this code for Breakout? What parameters did you use?

I used the default parameters from the code, gamma 0.99, lambda 1.0, learning rate 1e-4, gradient clip 40.
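To make the roles of gamma and lambda concrete, here is a minimal sketch of advantage estimation, assuming lambda is the generalized advantage estimation (GAE) parameter. The function names `discount` and `gae_advantages` are hypothetical and are not taken from the repository. It shows that with lambda 1.0 the advantage reduces to the bootstrapped discounted return minus the value estimate.

```python
def discount(values, factor):
    """Discounted cumulative sum, accumulated from the last step backwards."""
    out = []
    running = 0.0
    for v in reversed(values):
        running = v + factor * running
        out.append(running)
    return out[::-1]


def gae_advantages(rewards, values, bootstrap, gamma=0.99, lam=1.0):
    """GAE over one rollout (hypothetical helper, standard formulation).

    rewards[t] and values[t] belong to step t; bootstrap is the value
    estimate for the state after the last step (0.0 if terminal).
    """
    vpred = list(values) + [bootstrap]
    # One-step TD residuals: r_t + gamma * V(s_{t+1}) - V(s_t)
    deltas = [r + gamma * vpred[t + 1] - vpred[t] for t, r in enumerate(rewards)]
    return discount(deltas, gamma * lam)


rewards = [0.0, 0.0, 1.0]
values = [0.5, 0.6, 0.7]
bootstrap = 0.0

advantages = gae_advantages(rewards, values, bootstrap, gamma=0.99, lam=1.0)
# Bootstrapped discounted returns for comparison.
returns = discount(rewards + [bootstrap], 0.99)[:-1]
```

Lowering lambda below 1.0 would weight the short-horizon TD residuals more heavily, trading variance for bias, so it is one more knob besides the learning rate and the clip value.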

I also can't reproduce the results for either BreakoutDeterministic-v4 or SeaquestDeterministic-v4.

Breakout after 24h with 16 workers (3 independent trainings):

However: #87 (comment)

Each worker requires 2-3 cores.

The machine had 24 cores, so I ran it again for 14h with 8 workers (3 independent trainings):

And A3C (16 workers) from https://arxiv.org/pdf/1602.01783.pdf (page 5) looks better:

AdamStelmaszczyk · 2018-01-19T22:39:56Z

I realized something, in the above A3C paper:

Specifically, we tuned hyperparameters (learning rate and
amount of gradient norm clipping) using a search on six
Atari games (Beamrider, Breakout, Pong, Q*bert, Seaquest
and Space Invaders) and then fixed all hyperparameters for
all 57 games.

Unfortunately, the paper does not seem to state what the learning rate and gradient norm clipping values were. They could well differ from the defaults used in the code here.
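For context on what the "gradient clip 40" setting does, here is a minimal pure-Python sketch of clipping by global norm. It assumes the clip is applied to the norm over all gradients jointly, which is an assumption about both the paper and this code. The helper `clip_by_global_norm` below is a hypothetical stand-in written for illustration, not a library call.

```python
import math


def clip_by_global_norm(grads, max_norm=40.0):
    """Rescale all gradients so their joint L2 norm is at most max_norm.

    grads is a list of per-parameter gradient lists. Returns the
    (possibly rescaled) gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if global_norm <= max_norm:
        return [list(vec) for vec in grads], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in vec] for vec in grads], global_norm


grads = [[30.0, 40.0], [0.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=40.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
```

Because clipping only rescales, the update direction is preserved. A threshold that is too large never triggers, while one that is too small caps the step size much like a lower learning rate would, which is why the two values are usually tuned together.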

choinker · 2018-02-11T00:58:45Z

Has anyone found optimal hyperparameters?

AdamStelmaszczyk · 2018-02-20T17:37:46Z

I haven't (tried a bit), but this is helpful:

I guess universe-starter-agent has correct implementation of A3C but definitely with quite a few design changes, e.g., unshared optimizer across workers, different hyper-parameters like input size, learning rate etc., and different network architectures etc. I first "tuned" it to make sure I can reproduce ATARI results to some extent (note: it's quite hard to replicate original paper results because they use Torch and initialization was different -- training is sensitive). I could reach close to the results for "breakout" and few other games in "non-shared optimizer scenario" (see original A3C paper supplementary) but did not get exactly same numbers because of difference in initialization, Tensorflow vs. Torch etc. By the word "tuning" above I meant: changing architecture, changing loss equation to mean loss and not the total loss, changing hyper-parameters etc.
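The remark about mean loss versus total loss can be illustrated with a toy example. This sketch uses a one-parameter squared-error loss as a stand-in for the A3C loss, which is an assumption made purely for illustration, and the function names are hypothetical. It shows that the gradient of a summed loss is the gradient of the mean loss multiplied by the number of samples.

```python
def grad_sum_loss(w, xs, ys):
    """Gradient w.r.t. w of sum_i (w * x_i - y_i)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def grad_mean_loss(w, xs, ys):
    """Gradient w.r.t. w of the same loss averaged over the samples."""
    return grad_sum_loss(w, xs, ys) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.0, 0.0, 0.0]
w = 1.0

g_sum = grad_sum_loss(w, xs, ys)
g_mean = grad_mean_loss(w, xs, ys)
ratio = g_sum / g_mean  # equals the number of samples
```

So switching between the two forms changes the effective step size by a factor of the rollout length, and it also changes how often a fixed gradient clip threshold is hit. A learning rate tuned for one form would not be expected to carry over to the other.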

Here are the hyperparams for the original A3C work, but for universe-starter-agent they would be different, because of the significant implementation differences.

It seems possible to find "working" ones for universe-starter-agent, but it would take considerable effort.

By the way, be aware that universe and universe-starter-agent seem deprecated: openai/universe#218.

AdamStelmaszczyk mentioned this issue Feb 17, 2018

Require HEAD version of universe pathak22/noreward-rl#22

gdb closed this as completed Apr 7, 2018
