Part.1 Modified Policy Iteration with Simplified Bellman Equation and Linear Algebra Policy Evaluation Infinite Loop #20

Open
CesarAndresRojas opened this issue Mar 16, 2022 · 1 comment

CesarAndresRojas commented Mar 16, 2022

Hello,

I am attempting to run the function "main_linalg()" in policy_iteration.py but the program fails to terminate.

Standard policy iteration with iterative policy evaluation returns the correct policy.

Screen Shot 2022-03-16 at 1 38 34 PM

After some investigation, I found that the program terminates if, in the function main_linalg, you replace

u = return_policy_evaluation_linalg(p, r, T, gamma)

with

u = return_policy_evaluation(p, u, r, T, gamma)

This turns the implementation into a modified policy iteration algorithm that uses iterative policy evaluation.
With this change the program terminates after 4 to 5 iterations.
However, it returns a policy that differs from the expected one.

Screen Shot 2022-03-16 at 1 39 03 PM
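
For reference, modified policy iteration alternates a fixed number of iterative evaluation sweeps with a greedy improvement step. The following is a minimal sketch on a hypothetical four-state corridor, not the repository's grid world; every name in it (`step`, `evaluate`, `improve`, `modified_policy_iteration`) is an assumption made for illustration. The improvement step keeps the current action unless another one is strictly better, which is one common way to make the "policy unchanged" stopping test reliable when two actions tie.

```python
GAMMA = 0.9
N_STATES = 4            # hypothetical corridor: states 0..3
TERMINAL = 3            # state 3 is terminal
ACTIONS = (-1, +1)      # move left / move right
REWARD = [-0.04, -0.04, -0.04, 1.0]


def step(s, a):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    return min(max(s + a, 0), N_STATES - 1)


def evaluate(policy, u, sweeps):
    """Iterative policy evaluation: a fixed number of Bellman backups."""
    for _ in range(sweeps):
        new_u = list(u)
        for s in range(N_STATES):
            if s == TERMINAL:
                new_u[s] = REWARD[s]
            else:
                new_u[s] = REWARD[s] + GAMMA * u[step(s, policy[s])]
        u = new_u
    return u


def improve(policy, u):
    """Greedy improvement; switch only if another action is strictly better."""
    new_policy = list(policy)
    for s in range(N_STATES):
        if s == TERMINAL:
            continue
        best = policy[s]
        for a in ACTIONS:
            if u[step(s, a)] > u[step(s, best)] + 1e-12:
                best = a
        new_policy[s] = best
    return new_policy


def modified_policy_iteration(sweeps=5, max_iters=100):
    policy = [-1] * N_STATES    # start by always moving left
    u = [0.0] * N_STATES
    for i in range(1, max_iters + 1):
        u = evaluate(policy, u, sweeps)
        new_policy = improve(policy, u)
        if new_policy == policy:    # stable policy: stop
            return policy, u, i
        policy = new_policy
    raise RuntimeError("policy did not stabilise")


policy, u, iterations = modified_policy_iteration()
print(policy[:TERMINAL], iterations)
```

The entry of `policy` for the terminal state is never used. The cap `max_iters` is a safety net so that a non-converging run raises an error instead of looping forever.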

I made this change because I initially assumed that the linear algebra and iterative approaches should return the same utility values for each state. Do you know whether that is actually the case?
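
As background to this question: for a fixed policy and a discount factor below 1, the Bellman equation u = r + gamma * P u has a unique solution, so an exact linear solve and an iterative evaluation run to convergence should agree. An evaluation truncated after a few sweeps will generally differ. The sketch below checks this on a hypothetical three-state chain; the matrix `P`, the rewards `R` and all function names are illustrative assumptions, and the small hand-written solver stands in for a library routine such as `numpy.linalg.solve`.

```python
GAMMA = 0.9

# Hypothetical chain induced by a fixed policy: P[s][t] = Pr(next = t | s).
P = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.0, 1.0],
]
R = [-0.04, -0.04, 1.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):    # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def evaluate_exact(p, r, gamma):
    """Policy evaluation by solving (I - gamma P) u = r."""
    n = len(r)
    a = [[(1.0 if i == j else 0.0) - gamma * p[i][j] for j in range(n)]
         for i in range(n)]
    return solve(a, r)


def evaluate_iterative(p, r, gamma, tol=1e-12):
    """Repeat u <- r + gamma P u until the largest change is below tol."""
    n = len(r)
    u = [0.0] * n
    while True:
        new_u = [r[s] + gamma * sum(p[s][t] * u[t] for t in range(n))
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(new_u, u)) < tol:
            return new_u
        u = new_u


exact = evaluate_exact(P, R, GAMMA)
approx = evaluate_iterative(P, R, GAMMA)
print(max(abs(x - y) for x, y in zip(exact, approx)))
```

If the two evaluations disagree by more than the tolerance on the same policy and transition model, that points to a difference in how the two functions are set up rather than to the methods themselves.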

I found another GitHub repository, https://github.com/SparkShen02/MDP-with-Value-Iteration-and-Policy-Iteration,
that implements the modified policy iteration algorithm with iterative policy evaluation.

Screen Shot 2022-03-16 at 1 51 42 PM

Although your transition matrix generator uses padding to account for boundary collisions, I suspect the linear algebra approach fails to detect wall collisions, so the selected action keeps switching between the optimal action and one that runs into a wall.
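
One way to test a suspicion like this is to inspect the transition model directly: a move into a wall or obstacle should leave the agent where it is, and the outgoing probabilities for every state and action should sum to 1. The sketch below builds a deterministic model for a hypothetical 3x4 grid with one blocked cell and lists any rows that violate this. The grid size, the obstacle position and the names `build_transitions` and `bad_rows` are assumptions for illustration and do not come from the repository.

```python
ROWS, COLS = 3, 4
OBSTACLES = {(1, 1)}    # hypothetical blocked cell
MOVES = {"up": (-1, 0), "right": (0, 1), "down": (1, 0), "left": (0, -1)}


def index(row, col):
    """Flatten a (row, col) cell into a state index."""
    return row * COLS + col


def blocked(row, col):
    """True if the cell lies outside the grid or is an obstacle."""
    return not (0 <= row < ROWS and 0 <= col < COLS) or (row, col) in OBSTACLES


def build_transitions():
    """Return T[action][s][s2] for deterministic moves."""
    n = ROWS * COLS
    T = {a: [[0.0] * n for _ in range(n)] for a in MOVES}
    for a, (dr, dc) in MOVES.items():
        for row in range(ROWS):
            for col in range(COLS):
                if (row, col) in OBSTACLES:
                    continue    # unreachable cell: its row stays empty
                nr, nc = row + dr, col + dc
                if blocked(nr, nc):
                    nr, nc = row, col    # collision: stay in place
                T[a][index(row, col)][index(nr, nc)] = 1.0
    return T


def bad_rows(T, tol=1e-9):
    """List (action, state) pairs whose probabilities do not sum to 1."""
    skip = {index(r, c) for r, c in OBSTACLES}
    return [(a, s) for a, mat in T.items() for s, row in enumerate(mat)
            if s not in skip and abs(sum(row) - 1.0) > tol]


T = build_transitions()
print(bad_rows(T))
```

An empty list means every reachable state has a valid distribution for every action. Running the same kind of check on the matrix actually passed to the evaluation function would show whether collisions are being lost somewhere.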

I am not sure how to proceed. Please look into this for a possible fix. Thank you.

mpatacchiola (Owner) commented

Thank you for pointing this out. I will need to look at it when I have some time. I will keep the issue open until solved.
