| layout | permalink | title |
|---|---|---|
| page | /mdp/ | Markov decision process |

Hello, traveler!
This site acts as a Markov decision process (MDP): human agents pick posts and collect rewards, as in a typical reinforcement learning (RL) setting. Can you figure out what the optimal policy is?
Hint: You might want to explore the site first and then commit to an answer.
Afterwards, you can check the optimal policy and value function for this site here.
- diagram of automaton (with rewards etc.)
- value function
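
If you want to check your answer by computation rather than by clicking around, value iteration is one standard way to get both the value function and the optimal policy of a small MDP. The sketch below is only an illustration: the page names, links, rewards, and discount factor are all made up and do not describe this site's actual structure. It assumes deterministic transitions, where following a link always lands on the linked page.

```python
# Hypothetical toy MDP: states are pages, actions are links to follow.
# transitions[state][action] = (next_state, reward). All values are invented.
transitions = {
    "home": {"post_a": ("post_a", 1.0), "post_b": ("post_b", 0.0)},
    "post_a": {"home": ("home", 0.0)},
    "post_b": {"post_a": ("post_a", 5.0), "home": ("home", 0.0)},
}


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return (values, policy) for a deterministic MDP via value iteration."""
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            # Bellman optimality backup: best one-step reward plus discounted value.
            best = max(
                reward + gamma * values[next_state]
                for next_state, reward in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break

    # The greedy policy picks, in each state, the action with the highest backup.
    policy = {
        state: max(
            actions,
            key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
        )
        for state, actions in transitions.items()
    }
    return values, policy


values, policy = value_iteration(transitions)
print(policy)
```

In this toy example the greedy choice from `home` would be `post_a` (immediate reward 1), but the optimal policy goes through `post_b` first to collect the larger reward on the following step, which is why exploring before committing pays off.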