
It seems dqn can't learn much #2

Open
Seraphli opened this issue Sep 6, 2017 · 15 comments

@Seraphli

Seraphli commented Sep 6, 2017

I ran the script last night. It started at a mean reward of ~11 and ended at ~15.5.
I tried playing this mini-game myself and could score ~100 or more.
DeepMind reached ~100 in their video.
[screenshots: mean reward at the beginning and at the end of the run]

@vors

vors commented Sep 10, 2017

I had kind of a similar experience, except mine actually dropped from 10 to 5 :D
Here is what the net learned on my laptop; the marines are mostly hanging out at the bottom of the screen:
[screenshot: marines]

@chris-chris
Owner

Yeah, guys. I'm trying to improve the score using the A3C algorithm.
I'm rewriting the example code.

If you have any improvement, please let me know! :)

@chris-chris
Owner

I'm applying the A3C algorithm to it. It's the baseline agent from the paper:
https://deepmind.com/documents/110/sc2le.pdf

@vors

vors commented Sep 11, 2017

Awesome! Will try it soon

@ShadowDancer

@chris-chris How's it going with A3C? I see you changed the approach; is it getting any better?

@chris-chris
Owner

@Seraphli @ShadowDancer @vors @yilei

I applied the A2C algorithm, and I think it works better.
You can train it with the command below:

```
python train_mineral_shards.py --algorithm=a2c --num_agents=2 --num_scripts=2 --timesteps=2000000
```

@Seraphli
Author

Seraphli commented Nov 4, 2017

I tried to run the code, and at some point the program threw this error:

```
Traceback (most recent call last):
  File "train_mineral_shards.py", line 304, in <module>
    main()
  File "train_mineral_shards.py", line 183, in main
    callback=a2c_callback)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 748, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 621, in run
    self.update_obs(obs)
  File "/home/seraphli/Github/pysc2-examples/a2c/a2c.py", line 297, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
```

@davidkuhta

I'm having the same `KeyError: '1'` issue as @Seraphli (output nearly identical to the above). Any idea where to look?

@chris-chris
Owner

@davidkuhta @Seraphli I'll fix it! Thanks!!

@chris-chris
Owner

@davidkuhta @Seraphli I fixed it. Can you guys check it out?

@davidkuhta

Thanks @chris-chris! Running it now; I'll follow up.

@davidkuhta

davidkuhta commented Nov 18, 2017

OK, I still ran into the same issue. I did see the initialization in the last commit:
`self.xy_per_marine = [{"0":[0,0], "1":[0,0]} for _ in range(nenv)]`
I re-ran after adding a print statement at line 296 to output the `self.xy_per_marine[env_num]` dict.

Here's how it ended:

```
...
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
rewards :  [0 0 0 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 129.0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'1': [18, 9], '0': [1, 3]}
{'1': [6, 15], '0': [25, 5]}
{'1': [5, 15], '0': [2, 9]}
{'1': [13, 10], '0': [6, 1]}
{'1': [27, 19], '0': [6, 5]}
{'1': [20, 17], '0': [6, 16]}
Game has started.
init group list
env 2 done! reward : 130.0 mean_100ep_reward : 84.7
rewards :  [0 0 1 0 0 0 0 0]
self.total_reward : [90.0, 87.0, 0, 92.0, 0.0, 0.0, 0.0, 0.0]
{'1': [15, 15], '0': [9, 12]}
{'1': [18, 15], '0': [28, 3]}
{'0': [11, 11]}
Traceback (most recent call last):
  File "train_mineral_shards.py", line 302, in <module>
    main()
  File "train_mineral_shards.py", line 181, in main
    callback=a2c_callback)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 749, in learn
    obs, states, rewards, masks, actions, xy0, xy1, values = runner.run()
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 622, in run
    self.update_obs(obs)
  File "/home/AI/pysc2-examples/a2c/a2c.py", line 298, in update_obs
    marine1 = self.xy_per_marine[env_num]["1"]
KeyError: '1'
```
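The last dict printed before the crash, `{'0': [11, 11]}`, appears right after the `Game has started.` / `init group list` lines, which suggests that when an env resets the per-env dict is rebuilt with only marine `"0"` in it, so the subsequent `"1"` lookup raises. A minimal defensive sketch of a workaround, not the repository's actual fix; the helper name and the `(0, 0)` fallback are placeholders for illustration:

```python
def marine_xy(xy_per_marine, env_num, marine_id, default=(0, 0)):
    """Hypothetical helper: return the tracked [x, y] of a marine,
    falling back to a placeholder position when the marine is missing
    from the per-env dict (e.g. right after an env reset)."""
    return xy_per_marine[env_num].get(marine_id, list(default))

# In Runner.update_obs(), the failing lookup
#   marine1 = self.xy_per_marine[env_num]["1"]
# would then become
#   marine1 = marine_xy(self.xy_per_marine, env_num, "1")
```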

@simonmeister

Just in case anyone would like to look at an alternative work-in-progress implementation, without the openai-baselines dependency and with the complete action space: https://github.com/simonmeister/pysc2-rl-agents.

@mushroom1116

Has anyone else encountered this error?

```
TypeError: Can't instantiate abstract class SubprocVecEnv with abstract methods step_async, step_wait
```

@soneo1127

soneo1127 commented Mar 4, 2018

@mushroom1116 I changed pysc2-examples/common/vec_env/subproc_vec_env.py from

```python
from baselines.common.vec_env import VecEnv
```

to

```python
from . import VecEnv
```

and it runs.
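For context on why the relative import helps: the error message indicates that the `VecEnv` imported from baselines has since become an abstract base class with `step_async` and `step_wait` as abstract methods, while this repo's `SubprocVecEnv` was written against the older interface; a subclass that doesn't override every abstract method is itself abstract and can't be instantiated. A minimal sketch of that mechanism, with hypothetical class names rather than the repo's code:

```python
from abc import ABC, abstractmethod

class VecEnvBase(ABC):
    """Stand-in for the newer abstract VecEnv from baselines."""
    @abstractmethod
    def step_async(self, actions): ...

    @abstractmethod
    def step_wait(self): ...

class OldStyleSubprocVecEnv(VecEnvBase):
    """Implements only an old-style step(), overriding neither
    abstract method, so Python still treats the class as abstract."""
    def step(self, actions):
        return actions

try:
    OldStyleSubprocVecEnv()
except TypeError as err:
    print(err)  # Can't instantiate abstract class ... step_async, step_wait
```

Importing the repo's local `VecEnv` instead sidesteps the mismatch, because the local base class still matches the interface this `SubprocVecEnv` actually implements.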
