-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacing done
with truncated
and terminated
#10
Comments
Hi Ashok, This is a somewhat complicated issue. In the older versions of gym, this Line 371 in 79ccf8b
With that in mind, there are a few ways we can proceed with this: Probably option 4 is the best way. Let me know what you think. I of course will welcome pull requests towards this issue |
Thank you for the detailed response. Yeah, the fourth option seems like the straightforward way since splitting branches may lead to confusion for future users. To maintain compatibility, we can pass in an On a sidenote, I am wondering how Line 371 in 79ccf8b
is able to calculate the maximum time that should be decided for the truncated part. Would it make more sense to define the time limit ourselves depending upon the the velocity and the distance between heaven and hell ?
|
That sounds like a good idea, we can use the As for the Line 47 in 79ccf8b
|
Okay sure, I'll send in a PR with the
and what is the basis for the choice of |
In general, we want to give the agents the ability to successfully end an episode while exploring, so it the |
Apologies for the delay. I got sidetracked due to coursework. I am working on this and the |
Hi Ashok, no problem. The gym-gridverse environments have a time limit of Line 66 in bc46144
info dict should be {"TimeLimit.truncated": True} . Let me know if you are still seeing an empty dictionary for info even after 500 timesteps as that would be a bug we'd need to address
|
Hey Kevin, I tried to modify code at two places. First in the
It seems a bit inconsistent because the wrapper should have reflected the correct info dict in 1 and both values shouldn't have been |
Hi Ashok, very sorry I have left you hanging for so long. It may seem surprising, but I believe this is the intended behaviour for the 2 places you put print statements. For the first example, it returns an empty dict and Line 66 in bc46144
We can take a look at the step function of the def step(self, action):
assert self._elapsed_steps is not None, "Cannot call env.step() before calling reset()"
observation, reward, done, info = self.env.step(action)
self._elapsed_steps += 1
if self._elapsed_steps >= self._max_episode_steps:
info['TimeLimit.truncated'] = not done
done = True
return observation, reward, done, info We can see it calls the inner (in this case, This also explains why both are |
Hey Kevin,
I hope you are doing well. I noticed a small bug where the step function returns only
obs, reward, done, info
instead of theobs, reward, terminated, truncated, info
. I came across this article from gymansium that emphasised the need for both terminated and truncated. Can I help in updating the codebase?The text was updated successfully, but these errors were encountered: