Calculating Reward on Game End #38

Closed
Nostrademous opened this issue Feb 5, 2019 · 5 comments

@Nostrademous
Collaborator

If you look at the stream of rewards below (the entire Game 2), you will see that it ends in a victory for Dire, yet each agent shows only one death. Also, based on tower_hp, the tower was not even close to dying, which means the game ended because the Radiant agent died a second time. However, the -3.0 'death' reward for Player 0 does not appear a second time in the rewards.

This makes me believe that we don't capture the rewards between the last reward sync and the game end.

2019-02-05 10:59:50,789 INFO     === Starting Game 2.
2019-02-05 10:59:50,789 INFO     Starting game.
2019-02-05 10:59:50,797 INFO     Player 0 using weights version 0
2019-02-05 10:59:50,802 INFO     Player 5 using weights version 0
2019-02-05 11:00:16,411 INFO     Player 0 rollout.
2019-02-05 11:00:16,412 INFO     Player 0 reward sum: -0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.114,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.0}
2019-02-05 11:00:16,429 INFO     Player 5 rollout.
2019-02-05 11:00:16,430 INFO     Player 5 reward sum: 0.11 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.0,
 'hp': 0.0,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': 0.0,
 'xp': 0.114}
2019-02-05 11:00:33,551 INFO     Received new model: version=0, size=1472372b
2019-02-05 11:00:40,146 INFO     Player 0 rollout.
2019-02-05 11:00:40,147 INFO     Player 0 reward sum: -0.15 subrewards:
{'death': -3.0,
 'denies': 0.0,
 'enemy': 3.0988716954415696,
 'hp': -1.2002411301619431,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.015,
 'win': 0.0,
 'xp': 0.9700000000000001}
2019-02-05 11:00:40,158 INFO     Player 5 rollout.
2019-02-05 11:00:40,159 INFO     Player 5 reward sum: 0.15 subrewards:
{'death': -3.0,
 'denies': 0.2,
 'enemy': 3.245241130161943,
 'hp': -1.2005383621082364,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.058333333333333334,
 'win': 0.0,
 'xp': 0.96}
2019-02-05 11:00:56,220 INFO     Player 0 rollout.
2019-02-05 11:00:56,221 INFO     Player 0 reward sum: -6.98 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': -0.61683011154303,
 'hp': -1.3583716176202625,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': 0.0,
 'win': -5.0,
 'xp': 0.0}
2019-02-05 11:00:56,226 INFO     Player 5 rollout.
2019-02-05 11:00:56,227 INFO     Player 5 reward sum: 6.72 subrewards:
{'death': -0.0,
 'denies': 0.0,
 'enemy': 1.3602503028054476,
 'hp': -0.3734285740740741,
 'kills': 0.0,
 'lh': 0.0,
 'tower_hp': -0.018333333333333333,
 'win': 5.0,
 'xp': 0.756}
2019-02-05 11:00:56,232 INFO     Game finished.
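
As a quick check on the log above, the 'death' subrewards for Player 0 can be tallied across the three rollouts. This is only a sketch: the values are copied from the log, while the assumption that every death contributes exactly -3.0 and the helper name `count_deaths` are illustrative and not taken from the project's code.

```python
# 'death' subreward per rollout for Player 0, copied from the Game 2 log above.
player0_death = [-0.0, -3.0, -0.0]

# Assumption: every death contributes exactly -3.0 to the 'death' subreward.
DEATH_PENALTY = -3.0


def count_deaths(death_subrewards, penalty=DEATH_PENALTY):
    """Infer how many deaths were recorded from the summed 'death' subrewards."""
    return round(sum(death_subrewards) / penalty)


recorded = count_deaths(player0_death)
print(f"Deaths recorded for Player 0: {recorded}")
```

If the game only ends after a second death, a tally below two would suggest that the final death never reached the reward stream.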
@TimZaman
Owner

TimZaman commented Feb 7, 2019

Both players died here, during the same rollout. Then one player won in the last rollout. The rewards are aggregated only per rollout.
It doesn't matter how he won (death or tower). A win might come from a death or from a tower, but that isn't scored independently, and there is no need to.

@TimZaman TimZaman closed this as completed Feb 7, 2019
@Nostrademous
Collaborator Author

But a single player needs to die twice for the game to be over. According to the record, each player died only once, so the game should not be over. What I believe happened is that a player died a second time and that this was not captured in our reward aggregation.
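
To make the suspicion concrete, here is a minimal sketch of one way a final death could go missing while the loss is still scored. It assumes, hypothetically, that the 'death' reward is computed from the difference in death counts between consecutive observations and that the 'win' reward is added separately from the game-end status. The function names and the `final_death_count` parameter are invented for illustration and are not the project's actual reward code; only the -3.0 and -5.0 values come from the log.

```python
DEATH_PENALTY = -3.0  # matches the -3.0 'death' subreward in the log
LOSS_PENALTY = -5.0   # matches the -5.0 'win' subreward in the log


def death_reward(prev_deaths, curr_deaths):
    """Reward for the deaths that occurred between two observations."""
    return DEATH_PENALTY * (curr_deaths - prev_deaths)


def game_rewards(death_counts, lost, final_death_count=None):
    """Sum 'death' and 'win' subrewards over a whole game.

    death_counts: death count seen in each received observation.
    final_death_count: death count at game end, if it was captured
        (hypothetical; None models the suspected missing final state).
    """
    counts = list(death_counts)
    if final_death_count is not None:
        counts.append(final_death_count)
    death_total = sum(death_reward(a, b) for a, b in zip(counts, counts[1:]))
    win_total = LOSS_PENALTY if lost else 0.0
    return {"death": death_total, "win": win_total}


# The last received observation still shows one death; the game then ends.
observed = [0, 0, 1, 1]
print(game_rewards(observed, lost=True))
print(game_rewards(observed, lost=True, final_death_count=2))
```

Under these assumptions, the first call reproduces the pattern in the log (one death penalty plus the loss), and the second shows the extra -3.0 that would appear if the game-end state were included.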

@TimZaman
Owner

TimZaman commented Feb 7, 2019 via email

@Nostrademous
Collaborator Author

But doesn't that mean the bot that died a second time never gets the negative reward for the second death? Sure, it "loses" and gets the -5, but it won't necessarily make the connection that the second death is the cause, since it doesn't see the second death in the rewards.

Or am I misunderstanding something about our algorithm?

@TimZaman
Owner

TimZaman commented Feb 7, 2019 via email
