[Bug Report] "CarRacing-v3" Appears to Reset before Completing a Lap causing it to Ignore lap_complete_percent #1269
Comments
@AUnicyclingProgrammer Could you provide a video of that agent doing this? |
Yes I can, I'll make one when I have some extra free-time. |
Hello, I’ve been trying to find a solution to this issue, but before proceeding, I’d like to fully understand the exact conditions that determine the episode’s termination for this specific environment. Specifically, I’d like to clarify:
In particular, I’d also like to understand what happens when the specified percentage of tiles is covered. Is the intended behavior that the episode will always end with terminated = True, even if the episode ends without finishing the lap? Thanks in advance for your help! |
Thank you for looking into this, and for reminding me to come back to this issue. I've been atypically busy for the past few months and had all but forgotten about it. I wasn't able to find any definitive answers to those questions when I ran into this problem, but it appears that the source code is intended to accomplish the following:
Truncation:
Termination:
The code only checks for a termination or truncation event if the agent has re-crossed the start/stop line (the only exception being when the agent drives into the abyss). |
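The gating described in this comment can be sketched as a small standalone function. This is an illustrative toy, not the code in car_racing.py: the function name, its parameters, and the square playfield bound used to model "the abyss" are all assumptions made for the example.

```python
from typing import Optional


def end_check_for_step(
    tile_idx: Optional[int],
    has_left_start: bool,
    x: float,
    y: float,
    playfield: float,
) -> Optional[str]:
    """Return which end-of-episode check (if any) would run on this step.

    Hypothetical model: `playfield` is an assumed half-width of the
    drivable world; leaving it stands in for "driving into the abyss".
    """
    # Exception case: the car has left the world entirely.
    if abs(x) > playfield or abs(y) > playfield:
        return "out_of_bounds"
    # Otherwise a check only happens when the start/stop tile is re-crossed.
    if tile_idx == 0 and has_left_start:
        return "lap_check"
    # Anywhere else on (or off) the track, nothing is evaluated.
    return None
```

Under this model, a car that is mid-track never triggers a termination or truncation check, which matches the observation that the outcome is only decided at the start/stop line.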
Thank you for your response. I will try to resolve this issue in the coming days. I hope to get some feedback from you all, as this is my first time working on this project. |
I've unexpectedly (and rather quickly) found a possible solution to the issue (which does not address the early reset situation). Please note that I've tested this fix only using the interactive version of the environment. What I did was simply move the
So, what will now happen is that if the car reaches all the tiles or at least the specified percentage of discovered tiles and then reaches the first tile, the environment will end with terminated = True (previously, it ended with truncated = True). I would love to hear your feedback on this. While it seems to work, the fact that it doesn't address the early reset issue leaves me in doubt. |
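A minimal sketch of the rule described above, assuming the lap counts as done when the car is on the first tile and has covered at least the requested fraction of tiles. The function and parameter names are hypothetical, and the `terminate_on_lap` switch exists only to contrast the proposed result (terminated = True) with the previous one (truncated = True); it is not a real environment option.

```python
from typing import Tuple


def lap_end_flags(
    visited_tiles: int,
    total_tiles: int,
    lap_complete_percent: float,
    on_first_tile: bool,
    terminate_on_lap: bool = True,
) -> Tuple[bool, bool]:
    """Return (terminated, truncated) for one step of a toy lap model."""
    # Assumption: "at least the specified percentage" is modelled with >=.
    covered_enough = visited_tiles / total_tiles >= lap_complete_percent
    lap_done = on_first_tile and covered_enough
    if not lap_done:
        return False, False
    if terminate_on_lap:
        # Proposed behaviour: finishing the lap ends the episode as terminated.
        return True, False
    # Previous behaviour as reported in this thread: truncated instead.
    return False, True
```

For example, `lap_end_flags(8, 10, 0.5, True)` gives `(True, False)`, while the same call with `terminate_on_lap=False` gives `(False, True)`.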
@VincenzoPalma Could you provide a video of the agent behaviour to showcase what the change will cause? |
Here's the video: https://youtu.be/0mUQh2MiKbs. The video features three clips, all recorded with the interactive version of the game and a 10% threshold for the minimum percentage of tiles to visit (for testing purposes):
|
Thanks for the video, @VincenzoPalma. The question when "fixing" the environment is: what should the expected behaviour of the agent be? This is a combination of the reward and termination conditions. To me, the agent (car) should move around the race track as quickly as possible (staying on the tiles) and reach the end. @AUnicyclingProgrammer or @VincenzoPalma, what is your opinion about this, as you have used the environment more than me and understand the agent behaviour better? |
I believe it makes sense to terminate the environment once a lap is completed, regardless of the number of tiles visited, and this is exactly what my change implements. As you saw in the video, the current behavior in interactive mode prevents the environment from ending after completing a lap unless all tiles have been visited. This applies to interactive mode, but @AUnicyclingProgrammer can provide more insight, as they also tested the environment while training an agent. |
Ok, I'm glad that we agree about that. |
@pseudo-rnd-thoughts What version number should be specified in the documentation (the one in parentheses)? |
Ok, the documentation history in https://github.com/Farama-Foundation/Gymnasium/blame/9ff8bf45dd765fae4cadec57873c69345e61e28e/gymnasium/envs/box2d/car_racing.py#L192 is wrong. @VincenzoPalma Could you update the whole history so that it correctly states why each change was made, from v0 to v4 (the new version with the new termination case)? |
@pseudo-rnd-thoughts I’ll take care of this as soon as I can. Just to clarify, as I misunderstood your earlier message: when the lap ends, do we want the environment to terminate even if the percentage of visited tiles hasn't been reached? If that’s the case, I’ll need to adjust my code, as it currently doesn't behave that way. I just want to confirm that this is our intention. |
I would want to do some testing to check that it produces the expected behaviour: the agent finishing the race on the tiles as quickly as possible. |
Describe the bug
The termination and truncation values being returned from the environment don't seem to line up with what I'd expect from the documentation. The environment appears to ignore the value passed to lap_complete_percent, as the episode truncation flag is always True unless the agent touches every tile before crossing the 0th tile. This behavior appears to occur because the environment resets a few update cycles before the agent re-crosses the 0th tile if the agent hasn't visited all the tiles in the track.
Code example
System info
Installation Method: Installed in a conda environment using pip
Gymnasium Version: 1.0.0
Python Version: 3.11.10
Additional context
Similar Issues and PRs:
Detailed Description
I'm training an agent I created using the Stable Baselines 3 implementation of PPO (more details below). It is currently able to complete most of the track, but the termination and truncation values being returned don't seem to line up with what I'd expect from the documentation. The environment appears to ignore the value passed to lap_complete_percent, as the episode truncation flag is always True unless the agent touches every tile before crossing the 0th tile. This behavior appears to occur because the environment resets a few update cycles before the agent re-crosses the 0th tile if the agent hasn't visited all the tiles in the track.
According to my current understanding of the code, car_racing.py will only check the percentage of lap completion when the vehicle crosses the starting tile. After some testing, however, I determined that the environment will reset when the car is a few (approximately 2) tiles before that point AND the car has covered enough tiles to satisfy lap_complete_percent, UNLESS the car has covered all tiles during the first lap, in which case it will work as intended.
I have been able to consistently replicate this behavior using both my agent and the interactive version of the environment. (The interactive version behaves a little differently. The environment will not reset until all tiles have been covered because it doesn't enforce a time limit. If I start the race as normal, then go off-road to skip at least one tile, then complete the rest of the lap on the track as normal, the race will not be considered complete when I re-cross the 0th tile. It allows me to continue around the track and will only terminate after I cross the tiles I skipped previously. From what I can tell, when I'm driving under manual control the task never terminates "successfully", as self.env.new_lap is never set to True.)
I have included a few print statements in my "fixed" version of the code to assist with troubleshooting this issue. I only needed to modify one specific section of code, which can quickly be found by searching for the unique string ~~~.
Agent Details
This is my first time using Gymnasium and Stable Baselines 3, so I've tried to keep things as simple as possible. I am applying the GrayscaleObservation wrapper to my environment to reduce the amount of time it takes the model to converge, but that's the only major modification I've made.
For my training environment I set lap_complete_percent = 0.9.
I used these two commands to load and train my agent:
Saved Model: ppo_car_gray_2500000.tar.gz
"Fixed" Code (In Collapsed Section)
I implemented a "quick fix" to verify that the problem is most likely in car_racing.py and not in the code I've written. It slightly alters the behavior of the environment, as the car no longer has to re-cross the 0th tile for the race to be considered complete. Instead, it considers the race complete if the car is almost done with a lap (i.e. the index of the current tile is within 1% of the total number of tiles in the track).
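The "almost done with a lap" idea can be illustrated with a short standalone sketch. This is not the collapsed code from the issue; the function names are hypothetical, and combining the near-end test with the lap_complete_percent coverage test is an assumption made for the example.

```python
def almost_finished_lap(
    current_tile_idx: int, total_tiles: int, tolerance: float = 0.01
) -> bool:
    """True when the current tile is within `tolerance` of the track's end.

    With the default 1% tolerance and a 300-tile track, the last few tile
    indices count as "almost finished".
    """
    tiles_remaining = (total_tiles - 1) - current_tile_idx
    return tiles_remaining <= tolerance * total_tiles


def race_complete(
    current_tile_idx: int,
    visited_tiles: int,
    total_tiles: int,
    lap_complete_percent: float,
) -> bool:
    """Toy completion rule: near the end of the lap AND enough tiles covered.

    Assumption: the coverage requirement still applies alongside the
    near-end test; the issue text only states the near-end condition.
    """
    covered_enough = visited_tiles / total_tiles >= lap_complete_percent
    return covered_enough and almost_finished_lap(current_tile_idx, total_tiles)
```

The trade-off of a rule like this is that the episode can end slightly before the start/stop tile is actually re-crossed, which is the behavioral change noted above.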
Checklist