
---
layout: page
permalink: /mdp/
title: Markov decision process
---

Hello, traveler!
This site acts as a Markov decision process (MDP): human agents pick posts and collect rewards, as in a typical reinforcement learning (RL) setting. Can you figure out the optimal policy? Hint: you might want to explore the site first and then commit to an answer.

Afterwards, you can check the optimal policy and value function for this site here.

- diagram of automaton (with rewards etc.)
- value function
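
If you would like to see how an optimal policy and value function can be computed in principle, here is a minimal value-iteration sketch. It does not describe this site: the page names, links, rewards, and discount factor `gamma` are all made up for illustration, and transitions are assumed to be deterministic (following a link always lands on that page).

```python
# Hypothetical toy MDP, NOT the actual structure of this site.
# States are pages, actions are links.
# TRANSITIONS[state][action] = (next_state, reward); deterministic by assumption.
TRANSITIONS = {
    "home":   {"post_a": ("post_a", 1.0), "post_b": ("post_b", 0.0)},
    "post_a": {"home": ("home", 0.0)},
    "post_b": {"post_c": ("post_c", 5.0), "home": ("home", 0.0)},
    "post_c": {"home": ("home", 0.0)},
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return (values, policy) for a deterministic MDP with discount gamma."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            # Bellman optimality backup: best immediate reward plus
            # discounted value of the page we land on.
            best = max(r + gamma * values[nxt] for nxt, r in actions.values())
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        state: max(
            actions,
            key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
        )
        for state, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(TRANSITIONS)
for state in TRANSITIONS:
    print(f"{state}: V = {values[state]:.2f}, go to {policy[state]}")
```

In this toy example the greedy-looking click (`post_a`, immediate reward 1) is not optimal from `home`: going through `post_b` pays nothing at first but unlocks the larger reward at `post_c`. That is the reason for the hint about exploring before committing.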

Reset MDP.