Hi, thank you for sharing this amazing code.
Recently, I've been looking into the detailed implementation of the code in relation to the paper "Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning."
From my understanding of the paper, the reward is bootstrapped with the value function in the event of a time-out. I believe this bootstrapping should use the value of the subsequent state, following the formula:
$r_{new} = r + v(s')$,
where $s'$ represents the state resulting from the current step.
However, in the current implementation of the code, it appears to be executed as:
$r_{new} = r + v(s)$,
where $s$ is the state used for the current step.
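For concreteness, here is a minimal sketch of the two variants I have in mind (PyTorch-style; `critic`, `reward`, `obs`, `next_obs`, and `time_out` are placeholder names for illustration, not the repository's actual identifiers, and I have omitted the discount factor to match the formulas above):

```python
import torch

def bootstrap_paper(reward, next_obs, time_out, critic):
    # Variant described in the paper: on time-out, add the value of the
    # *next* state s', i.e. the state produced by the current step.
    with torch.no_grad():
        v_next = critic(next_obs).squeeze(-1)
    return reward + v_next * time_out.float()

def bootstrap_as_implemented(reward, obs, time_out, critic):
    # Variant as the code appears to behave: add the value of the *current*
    # state s, i.e. the state that was fed into the policy for this step.
    with torch.no_grad():
        v_curr = critic(obs).squeeze(-1)
    return reward + v_curr * time_out.float()

# In practice the bootstrap term is usually also scaled by the discount
# factor gamma; it is left out here only to mirror the formulas above.
```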
Could you please clarify if my understanding aligns with the intended design?
I am curious to know whether this implementation choice was deliberate for specific reasons or if it might be an oversight.
Thank you for your time and assistance.