Hi, thank you for sharing this amazing code.
Recently, I've been looking into the detailed implementation of the code in relation to the paper "Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning."
From my understanding of the paper, the reward is bootstrapped with the value function in the event of a time-out. I believe this bootstrapping should use the value of the subsequent state, following the formula:
$r_{new} = r + v(s')$,
where $s'$ represents the state resulting from the current step.
However, in the current implementation of the code, it appears to be executed as:
$r_{new} = r + v(s)$,
where $s$ is the state used for the current step.
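For concreteness, here is a minimal sketch of the two variants I have in mind (PyTorch-style; `critic`, `reward`, `obs`, `next_obs`, and `time_out` are placeholder names for illustration, not the repository's actual identifiers, and I have omitted the discount factor to match the formulas above):

```python
import torch

def bootstrap_paper(reward, next_obs, time_out, critic):
    # Variant described in the paper: on time-out, add the value of the
    # *next* state s', i.e. the state produced by the current step.
    with torch.no_grad():
        v_next = critic(next_obs).squeeze(-1)
    return reward + v_next * time_out.float()

def bootstrap_as_implemented(reward, obs, time_out, critic):
    # Variant as the code appears to behave: add the value of the *current*
    # state s, i.e. the state that was fed into the policy for this step.
    with torch.no_grad():
        v_curr = critic(obs).squeeze(-1)
    return reward + v_curr * time_out.float()

# In practice the bootstrap term is usually also scaled by the discount
# factor gamma; it is left out here only to mirror the formulas above.
```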
Could you please clarify if my understanding aligns with the intended design?
I am curious to know whether this implementation choice was deliberate for specific reasons or if it might be an oversight.
Thank you for your time and assistance.