
Time-out bootstrapping: possible bug #20

Open
HyunyoungJung opened this issue Dec 24, 2023 · 1 comment


@HyunyoungJung

Hi, thank you for sharing this amazing code.

Recently, I've been looking closely at the implementation in relation to the paper "Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning."

From my understanding of the paper, the reward is bootstrapped in the event of a time-out. I believe this bootstrapping should use the subsequent state, following the formula:
$r_{new} = r + v(s')$,
where $s'$ is the state resulting from the current step.
However, the current implementation appears to compute:
$r_{new} = r + v(s)$,
where $s$ is the state in which the current step was taken.
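
To make the difference concrete, here is a minimal sketch of the two variants. All names (`critic`, `timed_out`, etc.) are hypothetical placeholders for illustration, not the actual API of this repository:

```python
import torch
import torch.nn as nn

# Stand-in value function v(.) over a 4-dim state (placeholder, not the repo's critic).
critic = nn.Linear(4, 1)

state = torch.randn(8, 4)       # s : states in which the actions were taken
next_state = torch.randn(8, 4)  # s': states resulting from the current step
reward = torch.randn(8, 1)      # per-env step rewards
timed_out = torch.randint(0, 2, (8, 1)).float()  # 1 where the episode hit the time limit

# Bootstrapping as I believe the paper intends: use the value of the next state s'.
r_paper = reward + critic(next_state) * timed_out

# Bootstrapping as the code currently appears to do: use the value of s.
r_code = reward + critic(state) * timed_out
```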

Could you please clarify whether my understanding aligns with the intended design?
I am curious whether this implementation choice was deliberate for specific reasons, or whether it might be an oversight.

Thank you for your time and assistance.

@mohakbhardwaj commented Nov 1, 2024

Hi, I second this issue and have posted a related one: #43
