
Issue with critic target in PPO #42

Open
davidireland3 opened this issue Feb 12, 2022 · 1 comment

Comments

@davidireland3

In the line used to define the returns, we use the GAE + values as the target for the critic to learn. Is this correct?

My intuition says no -- the target we are training towards does not represent the true value function; should the target for value of the current state not be the observed reward + value at the next state?

Thanks!

@Kin9L

Kin9L commented Apr 17, 2022

> In the line used to define the returns, we use the GAE + values as the target for the critic to learn. Is this correct?
>
> My intuition says no -- the target we are training towards does not represent the true value function; should the target for value of the current state not be the observed reward + value at the next state?
>
> Thanks!

Hi, I just saw your comment. I think it is correct to use GAE + values as the target for the critic. Roughly speaking, GAE is defined as

A_t = sum_{l=0}^{inf} (gamma * lambda)^l * delta_{t+l}, where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

Adding the value estimate back, A_t + V(s_t) is a lambda-weighted estimate of the return from state s_t (the TD(lambda) return), so it is a valid regression target for the critic. In the special case lambda = 0 it reduces exactly to r_t + gamma * V(s_{t+1}), the one-step target you describe; larger lambda mixes in longer horizons, reducing bias at the cost of higher variance.
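To make this concrete, here is a minimal NumPy sketch of the idea (not the repository's actual code): it computes GAE advantages over a finite rollout with a bootstrap value for the final state, then forms the critic target as advantages + values. The function name and signature are illustrative assumptions.

```python
import numpy as np

def gae_returns(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages and critic targets (advantages + values).

    rewards: shape (T,) rewards r_t for one rollout.
    values:  shape (T,) value estimates V(s_t).
    last_value: bootstrap estimate V(s_T) for the state after the final step.
    """
    T = len(rewards)
    values_ext = np.append(values, last_value)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        # Recursive form of the (gamma * lambda)-discounted sum of residuals
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    # Critic regression target: GAE_t + V(s_t), a lambda-weighted return estimate
    returns = advantages + values
    return advantages, returns
```

With `lam=0.0` the returned targets collapse to the one-step target `r_t + gamma * V(s_{t+1})` from the question, which is an easy way to sanity-check the recursion.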
