Possible bug: state visitation frequency #1

magnusja · 2017-11-24T00:31:38Z

Hey there,

I am not 100% sure, but I think there is something wrong with how the state visitation frequency is calculated (https://github.com/stormmax/irl-imitation/blob/master/deep_maxent_irl.py#L93).

You iterate over all the states and, for each state, calculate the frequency for every timestep:

for s in range(N_STATES):
    for t in range(T-1):
        if deterministic:
            mu[s, t+1] = sum([mu[pre_s, t]*P_a[pre_s, s, int(policy[pre_s])] for pre_s in range(N_STATES)])
        else:
            mu[s, t+1] = sum([sum([mu[pre_s, t]*P_a[pre_s, s, a1]*policy[pre_s, a1] for a1 in range(N_ACTIONS)]) for pre_s in range(N_STATES)])

In my opinion the loops should be switched:

for t in range(T-1):
    for s in range(N_STATES):
        if deterministic:
            mu[s, t+1] = sum([mu[pre_s, t]*P_a[pre_s, s, int(policy[pre_s])] for pre_s in range(N_STATES)])
        else:
            mu[s, t+1] = sum([sum([mu[pre_s, t]*P_a[pre_s, s, a1]*policy[pre_s, a1] for a1 in range(N_ACTIONS)]) for pre_s in range(N_STATES)])

This is because the visitation frequency at timestep t+1 depends on the frequencies of all states at timestep t, so every state must be updated for timestep t before moving on to t+1. This also matches the formula from the original MaxEnt paper (Ziebart et al., 2008).
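For reference, the recurrence that the stochastic branch of both snippets computes can be written as below. This is a reconstruction in the notation of the code, with $\mu_t(s)$ standing for `mu[s, t]`, $\pi(a \mid s')$ for `policy[pre_s, a1]` and $P(s \mid s', a)$ for `P_a[pre_s, s, a1]`; it is not copied verbatim from the paper, whose symbols differ.

$$
\mu_{t+1}(s) = \sum_{s'} \sum_{a} \mu_t(s') \, \pi(a \mid s') \, P(s \mid s', a)
$$

The right-hand side reads $\mu_t(s')$ for every state $s'$, which is why the whole column for timestep $t$ has to be complete before any entry for $t+1$ is computed.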

Unfortunately, if I swap the two loop headers, the reward is no longer recovered correctly. Do you have any hints on this?
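To make the effect of the loop order concrete, here is a minimal sketch on a toy problem. The function name `svf_loops`, the 3-state cycle and the single-action policy are all made up for illustration and are not taken from the repository; only the update rule mirrors the stochastic branch quoted above.

```python
import numpy as np


def svf_loops(P_a, policy, mu0, T, time_major):
    """Propagate state visitation frequencies with explicit loops.

    P_a[s_prev, s_next, a] are transition probabilities, policy[s_prev, a]
    is a stochastic policy and mu0 is the start-state distribution.
    (Hypothetical helper, written only for this comparison.)
    """
    n_states, _, n_actions = P_a.shape
    mu = np.zeros((n_states, T))
    mu[:, 0] = mu0

    def update(s, t):
        # Same update rule as the stochastic branch in the issue.
        mu[s, t + 1] = sum(
            mu[pre_s, t] * P_a[pre_s, s, a] * policy[pre_s, a]
            for pre_s in range(n_states)
            for a in range(n_actions)
        )

    if time_major:
        # Finish every state for timestep t before moving on to t + 1.
        for t in range(T - 1):
            for s in range(n_states):
                update(s, t)
    else:
        # Original order: finish all timesteps for one state first.
        for s in range(n_states):
            for t in range(T - 1):
                update(s, t)
    return mu


# Toy MDP (assumption): deterministic cycle 0 -> 1 -> 2 -> 0, one action.
P_a = np.zeros((3, 3, 1))
P_a[0, 1, 0] = P_a[1, 2, 0] = P_a[2, 0, 0] = 1.0
policy = np.ones((3, 1))
mu0 = np.array([1.0, 0.0, 0.0])

mu_time = svf_loops(P_a, policy, mu0, T=4, time_major=True)
mu_state = svf_loops(P_a, policy, mu0, T=4, time_major=False)

print(mu_time.sum(axis=0))   # probability mass per timestep, time-major
print(mu_state.sum(axis=0))  # probability mass per timestep, state-major
```

In the state-major order, state 0 is processed for all timesteps while the entries of the other states at t >= 1 are still zero, so on this toy problem the probability mass disappears by the last timestep. The time-major order keeps each column summing to one.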


Zhousiyuhit · 2019-09-18T02:45:01Z

Hello, I have run into the same problem. Have you solved it?

magnusja · 2019-09-18T07:11:16Z

Hello there,

Please refer to my fork of this repository, which not only fixes that but also implements highly efficient methods for calculating the state visitation frequency, both in TensorFlow and vectorized with NumPy. The code in this repository is completely unusable when you need more states than the 5 by 5 example grid ;D
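As a rough idea of what a vectorized NumPy version can look like, here is a minimal sketch. It is not the code from the fork; the function name `svf_vectorized` and the random test problem are assumptions for illustration. The idea is to fold the policy into a single state-to-state transition matrix once, then propagate a whole timestep with one matrix-vector product.

```python
import numpy as np


def svf_vectorized(P_a, policy, mu0, T):
    """Propagate visitation frequencies one full timestep at a time.

    P_a[s_prev, s_next, a] are transition probabilities, policy[s_prev, a]
    is a stochastic policy and mu0 is the start-state distribution.
    (Hypothetical helper, not the fork's implementation.)
    """
    # P_pi[s_prev, s_next] = sum_a policy[s_prev, a] * P_a[s_prev, s_next, a]
    P_pi = np.einsum("ija,ia->ij", P_a, policy)
    mu = np.zeros((P_a.shape[0], T))
    mu[:, 0] = mu0
    for t in range(T - 1):
        # One product updates every state for timestep t + 1.
        mu[:, t + 1] = P_pi.T @ mu[:, t]
    return mu


# Random toy problem (assumption): 4 states, 2 actions.
rng = np.random.default_rng(0)
P_a = rng.random((4, 4, 2))
P_a /= P_a.sum(axis=1, keepdims=True)        # normalize over next states
policy = rng.random((4, 2))
policy /= policy.sum(axis=1, keepdims=True)  # normalize over actions
mu0 = np.full(4, 0.25)

mu = svf_vectorized(P_a, policy, mu0, T=6)
print(mu.sum(axis=0))  # each timestep should carry a total mass of 1
```

Because each step consumes the complete column for timestep t, this formulation cannot run into the loop-order problem described at the top of the thread.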

The trick to fix the bug is essentially to take the average over timesteps. This is not mentioned anywhere except this video: https://youtu.be/d9DlQSJQAoI?t=973 (watch for a minute or so, then Chelsea mentions that the calculation is missing an average).
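A minimal sketch of what averaging over timesteps means, assuming the per-timestep frequencies are stored as `mu[s, t]` as in the snippets earlier in this thread. The helper `average_svf` is hypothetical and is not the code from the linked note.

```python
import numpy as np


def average_svf(mu):
    """Collapse mu[s, t] into one frequency per state.

    Divides the sum over timesteps by the number of timesteps T, so the
    result stays a probability distribution regardless of T.
    (Hypothetical helper for illustration.)
    """
    return mu.sum(axis=1) / mu.shape[1]


# Per-timestep frequencies for 3 states over T = 4 steps (made-up values,
# each column sums to 1).
mu = np.array([
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

p_sum = mu.sum(axis=1)  # plain sum: total mass equals T and grows with T
p_avg = average_svf(mu)  # average: total mass stays at 1

print(p_sum, p_sum.sum())
print(p_avg, p_avg.sum())
```

The plain sum scales with the horizon T, while the average does not, so the averaged version stays on the same scale as any quantity that is itself normalized per timestep.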

See this note of mine as well:
https://github.com/magnusja/irl-imitation/blob/master/deep_maxent_irl.py#L340-L348

Let me know if you have further questions.

Zhousiyuhit · 2019-09-19T01:37:22Z

Thanks very much~


Zhousiyuhit · 2019-09-19T01:39:59Z

I modified the code to work with TensorFlow 2.0, and now there are no further problems.
