Hey @jmuchovej, this is an intriguing idea. Creating a softmax policy should be fairly straightforward, and it would not require any changes to `stepthrough` to simulate. But it sounds like your goal is to output something other than the history. Can you describe what the inputs and outputs of the function you are proposing would be? If you want the likelihood of each action under the policy, you could return it from the policy. I am generally happy to add optional outputs.
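For example, a rough sketch of what I have in mind (the names `SoftmaxQPolicy` and `action_distribution` are made up for illustration, not existing API, and this assumes the wrapped policy implements `value(p, s, a)`, i.e., an offline planner with Q-estimates):

```julia
using POMDPs
using POMDPTools: SparseCat
using Random

# Hypothetical softmax (Boltzmann) wrapper around a policy with Q-estimates.
struct SoftmaxQPolicy{M, P<:Policy} <: Policy
    m::M                   # the (PO)MDP, needed for actions(m, s)
    base::P                # underlying policy implementing value(base, s, a)
    temperature::Float64   # τ: higher = more random, lower = closer to argmax
end

# Likelihood of each action in state (or belief) s, returned as a categorical distribution.
function action_distribution(p::SoftmaxQPolicy, s)
    as = collect(actions(p.m, s))
    qs = [value(p.base, s, a) for a in as]
    ws = exp.((qs .- maximum(qs)) ./ p.temperature)   # subtract max for numerical stability
    return SparseCat(as, ws ./ sum(ws))
end

# Sampling an action just draws from that distribution.
POMDPs.action(p::SoftmaxQPolicy, s) = rand(Random.default_rng(), action_distribution(p, s))
```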
---
In computational social cognition (an area of cognitive psychology that builds computational models of "theory of mind", i.e., inferring others' beliefs and rewards from action sequences), we use [PO]MDPs quite frequently. A common task is to compute the likelihood of an action sequence under a given reward function (and/or beliefs, if using POMDPs).
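Concretely, the quantity I want to compute is something like (my notation):

$$
P(a_{1:T} \mid R) = \prod_{t=1}^{T} \pi_\tau(a_t \mid s_t; R),
\qquad
\pi_\tau(a \mid s; R) = \frac{\exp\big(Q_R(s, a) / \tau\big)}{\sum_{a'} \exp\big(Q_R(s, a') / \tau\big)}
$$

where $Q_R$ is the optimal Q-function under the candidate reward function $R$, $\tau$ is the temperature, and beliefs $b_t$ take the place of states $s_t$ for POMDPs.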
To do this, the general process is to do everything in `stepthrough`, except act according to a probabilistic variant of the optimal policy: we convert it into a stochastic policy via a softmax over Q-values with some temperature (τ or β) reflecting the "rationality" of the agent. I believe there is a pretty strong assumption that offline planners are used (so all state/belief-action pairs have Q-estimates), but it could also be that I've just never seen/used online planners for this.

I've implemented this pretty often, but it's a fair bit of boilerplate code that I have to repeat (a rough sketch of it is at the end of this comment), so I think it would be cool to have in `POMDPs.jl`.

What do you think? (I'm happy to implement this and submit a PR, but figured it would be best to discuss before opening an issue/PR.)
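For reference, the boilerplate I keep rewriting looks roughly like this (just a sketch with made-up names; it assumes the solved policy implements `value(p, s, a)`, i.e., exposes Q-estimates):

```julia
using POMDPs

# Boilerplate: log-likelihood of an observed (state, action) sequence under a
# softmax policy built from the Q-estimates of an already-solved policy.
function trajectory_loglikelihood(m, policy::Policy, trajectory; temperature=1.0)
    ll = 0.0
    for (s, a) in trajectory
        as = collect(actions(m, s))
        qs = [value(policy, s, ai) for ai in as] ./ temperature
        qmax = maximum(qs)
        logZ = qmax + log(sum(exp.(qs .- qmax)))    # stable log-sum-exp over actions
        ll += qs[findfirst(isequal(a), as)] - logZ  # log softmax probability of the observed action
    end
    return ll
end
```

In practice I call something like this once per candidate reward function (with `policy = solve(solver, m)` from an offline solver whose policy supports `value(p, s, a)`) and compare the resulting log-likelihoods.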