
PGQ: Combining policy gradient and Q-learning #5

Open
sotetsuk opened this issue Apr 7, 2017 · 2 comments
sotetsuk commented Apr 7, 2017

https://arxiv.org/abs/1611.01626

@sotetsuk sotetsuk self-assigned this Apr 7, 2017
sotetsuk commented Apr 7, 2017

Discussion / questions / comments

  • I didn't quite understand why it is valid, even at points far from a stationary point of the policy gradient, to construct Q̃ from the relation that holds near a stationary point and then update it with Bellman-optimality regularization.
  • Naively formulated, policy gradient methods cannot explore and the policy tends to become deterministic. It is interesting that this work can also be read as suggesting that entropy-regularized policy gradient, which encourages exploration, may in some sense be the more natural formulation.
  • I think the key point of PGQ is that Eq. 4 lets you compute a (reasonable) Q using only π and V.

Other papers to read
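A minimal sketch of that last point, assuming the entropy-regularized relation from the paper, Q̃(s,a) = α(log π(a|s) + H^π(s)) + V(s) (the exact form of Eq. 4 should be checked against the paper; the function name and shapes here are illustrative):

```python
import numpy as np

def q_from_pi_v(pi, v, alpha=0.1):
    """Recover action values for one state from the policy and state value.

    Assumed entropy-regularized relation (cf. PGQ, Eq. 4):
        Q~(s, a) = alpha * (log pi(a|s) + H^pi(s)) + V(s)

    pi    : action probabilities, shape (num_actions,)
    v     : scalar state value V(s)
    alpha : entropy-regularization temperature
    """
    log_pi = np.log(pi)
    entropy = -np.dot(pi, log_pi)          # H^pi(s)
    return alpha * (log_pi + entropy) + v

# Two sanity checks of the relation:
#  - the expectation of Q~ under pi equals V (the alpha terms cancel)
#  - softmax(Q~/alpha) recovers pi exactly
pi = np.array([0.2, 0.5, 0.3])
q = q_from_pi_v(pi, v=1.5, alpha=0.1)
expected_v = np.dot(pi, q)
recovered = np.exp(q / 0.1)
recovered /= recovered.sum()
```

These two invariants are what make the construction "reasonable": the Q̃ it produces is consistent with both the value estimate and the policy it was derived from.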

@sotetsuk sotetsuk mentioned this issue Apr 8, 2017