
It seems dqn can't learn much #2

Open
Seraphli opened this issue Sep 6, 2017 · 15 comments

@Seraphli

Seraphli commented Sep 6, 2017

I ran the script last night. It started at a mean reward of ~11 and ended at ~15.5.
I tried playing this mini-game myself and could score ~100 or more.
DeepMind reached ~100 in their video.
[screenshots: mean reward at the beginning and at the end of the run]

@vors

vors commented Sep 10, 2017

I had kind of a similar experience, except mine actually dropped from 10 to 5 :D
Here is what the net learned on my laptop; the marines are mostly hanging out at the bottom of the screen:
[screenshot: marines]

@chris-chris
Owner

Yeah, guys. I'm trying to improve the score using the A3C algorithm.
I'm rewriting the example code.

If you have any improvement, please let me know! :)

@chris-chris
Owner

I'm applying the A3C algorithm to it. It's the baseline agent from the paper:
https://deepmind.com/documents/110/sc2le.pdf

@vors

vors commented Sep 11, 2017

Awesome! Will try it soon

@ShadowDancer

@chris-chris How's it going with A3C? I see you changed the approach; is it getting any better?

@chris-chris
Owner

@Seraphli @ShadowDancer @vors @yilei

I applied the A2C algorithm, and I think it works better.
You can train it with the command below:

```
python train_mineral_shards.py --algorithm=a2c --num_agents=2 --num_scripts=2 --timesteps=2000000
```

@Seraphli
Author

Seraphli commented Nov 4, 2017

I tried to run the code, and at some point the program threw this error:

```
Traceback (most recent call last):
  File "train_mineral_shards.py", line 304, in <module>
    main()
  File "train_mineral_shards.py", line 183, in main
    callback=a2c_callback)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 748, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 621, in run
    self.update_obs(obs)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 297, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
```

@davidkuhta

I'm having the same `KeyError: '1'` issue as @Seraphli (output nearly identical to the above). Any idea where to look?

@chris-chris
Owner

@davidkuhta @Seraphli I'll fix it! Thanks!!

@chris-chris
Owner

@davidkuhta @Seraphli I fixed it. Can you guys check it out?

@davidkuhta

Thanks @chris-chris! Running it now; I'll follow up.

@davidkuhta

davidkuhta commented Nov 18, 2017

OK, I still ran into the same issue. I did see the initialization in the last commit:
`self.xy_per_marine = [{"0":[0,0], "1":[0,0]} for _ in range(nenv)]`
I re-ran after adding a print statement at line 296 to output the `self.xy_per_marine[env_num]` dict.

Here's how it ended:

```
...
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
rewards :  [0 0 0 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
Game has started.
init group list
env 2 done! reward : 130.0 mean_100ep_reward : 84.7
rewards :  [0 0 1 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'0': [11, 11]}
Traceback (most recent call last):
  File "train_mineral_shards.py", line 302, in <module>
    main()
  File "train_mineral_shards.py", line 181, in main
    callback=a2c_callback)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 749, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 622, in run
    self.update_obs(obs)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 298, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
```
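The last dict printed before the crash, `{'0': [11, 11]}`, appears right after the `Game has started.` / `init group list` lines, which suggests that when an env resets the per-env dict is rebuilt with only marine `"0"` in it, so the subsequent `"1"` lookup raises. A minimal defensive sketch of a workaround, not the repository's actual fix; the helper name and the `(0, 0)` fallback are placeholders for illustration:

```python
def marine_xy(xy_per_marine, env_num, marine_id, default=(0, 0)):
    """Hypothetical helper: return the tracked [x, y] of a marine,
    falling back to a placeholder position when the marine is missing
    from the per-env dict (e.g. right after an env reset)."""
    return xy_per_marine[env_num].get(marine_id, list(default))

# In Runner.update_obs(), the failing lookup
#   marine1 = self.xy_per_marine[env_num]["1"]
# would then become
#   marine1 = marine_xy(self.xy_per_marine, env_num, "1")
```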

@simonmeister

Just in case anyone would like to look at an alternative work-in-progress implementation, without the openai-baselines dependency and with the complete action space: https://github.com/simonmeister/pysc2-rl-agents.

@mushroom1116

Has anyone else encountered this error?

```
TypeError: Can't instantiate abstract class SubprocVecEnv with abstract methods step_async, step_wait
```

@soneo1127

soneo1127 commented Mar 4, 2018

@mushroom1116 I changed pysc2-examples/common/vec_env/subproc_vec_env.py from

```python
from baselines.common.vec_env import VecEnv
```

to

```python
from . import VecEnv
```

and it runs.
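For context on why the relative import helps: the error message indicates that the `VecEnv` imported from baselines has since become an abstract base class with `step_async` and `step_wait` as abstract methods, while this repo's `SubprocVecEnv` was written against the older interface; a subclass that doesn't override every abstract method is itself abstract and can't be instantiated. A minimal sketch of that mechanism, with hypothetical class names rather than the repo's code:

```python
from abc import ABC, abstractmethod

class VecEnvBase(ABC):
    """Stand-in for the newer abstract VecEnv from baselines."""
    @abstractmethod
    def step_async(self, actions): ...

    @abstractmethod
    def step_wait(self): ...

class OldStyleSubprocVecEnv(VecEnvBase):
    """Implements only an old-style step(), overriding neither
    abstract method, so Python still treats the class as abstract."""
    def step(self, actions):
        return actions

try:
    OldStyleSubprocVecEnv()
except TypeError as err:
    print(err)  # Can't instantiate abstract class ... step_async, step_wait
```

Importing the repo's local `VecEnv` instead sidesteps the mismatch, because the local base class still matches the interface this `SubprocVecEnv` actually implements.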
