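Implement Tao Bian's value iteration (VI) based continuous-time (CT) ADP [1, 2].
Linear IRL began by taking Kleinman's algorithm online and making it partially model-free [3]. In the same spirit,
extending VI CT ADP [1, 2] to an online setting (when reading [1] I wondered why it was not written as an online algorithm, but [2] extends it naturally), or replacing the infimum part of the Hamiltonian, also look like good research directions. For reference, [3] is already implemented.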
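
As a rough illustration of what the VI update looks like in the simplest possible case, here is a minimal model-based sketch for a scalar linear system `dx/dt = a*x + b*u` with running cost `q*x**2 + r*u**2`. It is not the data-driven algorithm of [1, 2]: it assumes `a` and `b` are known, uses a constant step size, and every name in it (`riccati_residual`, `value_iteration`, `exact_solution`, `step`) is hypothetical. With `V(x) = p*x**2`, the infimum of the Hamiltonian over `u` has a closed form, so each VI step reduces to a forward-Euler step on the scalar Riccati equation.

```python
import math


def riccati_residual(p, a, b, q, r):
    """Infimum over u of the Hamiltonian for V(x) = p*x**2, divided by x**2.

    H(x, u) = 2*p*x*(a*x + b*u) + q*x**2 + r*u**2 is minimized at
    u = -(b*p/r)*x, which leaves (2*a*p + q - (b*p)**2 / r) * x**2.
    """
    return 2.0 * a * p + q - (b * p) ** 2 / r


def value_iteration(a, b, q, r, p0=0.0, step=0.01, tol=1e-10, max_iter=100_000):
    """Iterate p <- p + step * residual(p) until the residual is below tol.

    Assumption: a small constant step is used here for simplicity.
    Returns the final p and the number of updates performed.
    """
    p = p0
    for k in range(max_iter):
        res = riccati_residual(p, a, b, q, r)
        if abs(res) < tol:
            return p, k
        p += step * res
    return p, max_iter


def exact_solution(a, b, q, r):
    """Stabilizing root of the scalar algebraic Riccati equation."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Open-loop unstable plant (a > 0); VI starts from p0 = 0,
    # so no initial stabilizing gain is needed.
    a, b, q, r = 1.0, 1.0, 1.0, 1.0
    p, iters = value_iteration(a, b, q, r)
    print("VI estimate    :", p)
    print("exact solution :", exact_solution(a, b, q, r))
    print("feedback gain  :", b * p / r)
    print("iterations     :", iters)
```

Unlike Kleinman's policy iteration, this update starts from `p0 = 0` and does not require an initial stabilizing gain, which is the practical appeal of the VI route. The research directions above correspond to replacing `riccati_residual` with a quantity estimated online from trajectory data, or with something other than an explicit infimum.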
Refs
[1] T. Bian and Z.-P. Jiang, “Value Iteration, Adaptive Dynamic Programming, and Optimal Control of Nonlinear Systems,” in 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, Dec. 2016, pp. 3375–3380. doi: 10.1109/CDC.2016.7798777.
[2] T. Bian and Z.-P. Jiang, “Reinforcement Learning and Adaptive Optimal Control for Continuous-Time Nonlinear Systems: A Value Iteration Approach,” IEEE Trans. Neural Netw. Learning Syst., pp. 1–10, 2021, doi: 10.1109/TNNLS.2020.3045087.
[3] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, “Adaptive Optimal Control for Continuous-Time Linear Systems Based on Policy Iteration,” Automatica, vol. 45, no. 2, pp. 477–484, Feb. 2009, doi: 10.1016/j.automatica.2008.08.017.