Hello,
I am attempting to run the function `main_linalg()` in policy_iteration.py, but the program fails to terminate.
Standard policy iteration with iterative policy evaluation returns the correct policy.
After some investigation, I found that replacing

`u = return_policy_evaluation_linalg(p, r, T, gamma)`

with

`u = return_policy_evaluation(p, u, r, T, gamma)`

in `main_linalg()` turns the implementation into a modified policy iteration algorithm that uses iterative policy evaluation. With this change the program terminates after 4 to 5 iterations, but it returns a different policy than expected.
I made these changes because my initial thought was that the linear algebra and iterative approaches were supposed to return the same utility values for each state. Do you know if this is actually the case?
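For what it's worth, my understanding is that for a fixed policy both approaches solve the same Bellman equation u = r + gamma * T_pi * u, so they should agree up to the iteration's stopping tolerance. Here is a minimal sketch of the comparison (the 3-state chain, `T_pi`, `r`, and `gamma` below are made-up illustrative values, not taken from policy_iteration.py):

```python
import numpy as np

# Made-up 3-state chain under a fixed policy (illustrative values only).
T_pi = np.array([[0.9, 0.1, 0.0],
                 [0.0, 0.8, 0.2],
                 [0.0, 0.0, 1.0]])
r = np.array([-0.04, -0.04, 1.0])
gamma = 0.9

# Linear algebra evaluation: solve (I - gamma * T_pi) u = r exactly.
u_lin = np.linalg.solve(np.eye(3) - gamma * T_pi, r)

# Iterative evaluation: repeat the Bellman backup u <- r + gamma * T_pi u.
u_it = np.zeros(3)
while True:
    u_new = r + gamma * T_pi @ u_it
    if np.max(np.abs(u_new - u_it)) < 1e-10:
        break
    u_it = u_new

print(u_lin)  # the two utility vectors match up to the tolerance
print(u_it)
```

If the two evaluations disagree on your grid world, that points at a mismatch between the transition matrix the linear solver sees and the dynamics the iterative backup uses.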
I also found another GitHub repository, https://github.com/SparkShen02/MDP-with-Value-Iteration-and-Policy-Iteration, that implements the modified policy iteration algorithm with iterative policy evaluation.
Although you use padding in your transition matrix generator to account for boundary collisions, I suspect the linear algebra approach fails to detect wall collisions, which causes the optimal action to flip between the correct action and one that collides with a wall.
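To make that suspicion concrete, here is how I would expect wall collisions to be handled (a hypothetical sketch of my own, not your generator's actual code; `transition_row`, its arguments, and the 0.8/0.1/0.1 slip model are assumptions based on the classic 4x3 grid world):

```python
import numpy as np

def transition_row(rows, cols, state, move, p_intended=0.8):
    """Hypothetical helper: build one row of T for one action in a grid
    world. 'move' is (dr, dc); the agent moves as intended with
    probability p_intended and slips perpendicular otherwise."""
    row_vec = np.zeros(rows * cols)
    r, c = divmod(state, cols)
    outcomes = [(move, p_intended),
                ((move[1], move[0]), (1.0 - p_intended) / 2),
                ((-move[1], -move[0]), (1.0 - p_intended) / 2)]
    for (dr, dc), prob in outcomes:
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            row_vec[nr * cols + nc] += prob
        else:
            row_vec[state] += prob  # wall collision: mass folds back onto the state
    return row_vec
```

Every row built this way sums to 1. If the padded generator ever drops or misroutes the folded-back mass, the linear solve and the iterative backup see different dynamics and will return different utilities, which could explain the flipping behaviour.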
I am not sure how to proceed. Please look into this for a possible fix. Thank you.