Implementing custom rollout strategies for JuliaPOMDP

Hello all,

I've been trying to wrap my head around the JuliaPOMDP packages for a while now and I need some help and/or advice. Please keep in mind that I've only just started using Julia, which is likely the cause of most, if not all, of my frustrations.

Goal

At the moment I only want to create a rollout strategy for BasicPOMCP in which multiple random rollouts are run and the average reward across all of them is returned.

Findings

I first looked at the main file. Here I found that you can set the estimation value of the POMCPSolver using the

I'm guessing the RolloutEstimator gets translated into a SolvedRolloutEstimator and then

This brings me to the

If anyone has any advice on how to do this, or can tell me whether I got anything wrong, please let me know. I know this is messy, but it's the best I could do with my current understanding.
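Thanks for asking! Most languages these days are very object-oriented. Julia allows for object-oriented programming, but it is often easier to do things in a more functional style. The easiest way to accomplish your initial task of making a Monte Carlo estimate is by creating a function:

```julia
using POMDPs, POMDPTools, BasicPOMCP  # RolloutSimulator and RandomPolicy live in POMDPTools in current releases
using Statistics: mean

# Monte Carlo value estimate: average the return of 10 random rollouts.
function mc_estimate(pomdp, s, h, steps)
    sim = RolloutSimulator(max_steps=steps)
    policy = RandomPolicy(pomdp)
    # Pass `s` explicitly so each rollout starts from the leaf state
    # instead of a fresh draw from the initial state distribution.
    return mean(simulate(sim, pomdp, policy, updater(policy), initialstate(pomdp), s) for i in 1:10)
end

solver = POMCPSolver(
    estimate_value = mc_estimate,
    # ...
)
```

Now you have complete control of anything that happens within that function.

(All of your reasoning about RolloutEstimator is correct, but I wouldn't recommend emulating that design pattern. It is from our early days.)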
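If you also want the number of rollouts to be configurable, one option in the same functional style is to return the estimator from a closure. The following is only a sketch under some assumptions: `make_mc_estimate` and `n_rollouts` are hypothetical names made up for this example, `TigerPOMDP` from POMDPModels is used purely as a stand-in problem, and the package layout (POMDPTools providing `RolloutSimulator` and `RandomPolicy`) is assumed to match current releases.

```julia
using POMDPs, POMDPTools, POMDPModels, BasicPOMCP
using Statistics: mean

# Hypothetical helper (not part of BasicPOMCP): builds an estimator that
# averages the return of `n_rollouts` random rollouts.
function make_mc_estimate(n_rollouts::Int)
    return function (pomdp, s, h, steps)
        sim = RolloutSimulator(max_steps=steps)
        policy = RandomPolicy(pomdp)   # its default updater tracks no belief
        up = updater(policy)
        # Start every rollout from the leaf state `s`.
        return mean(simulate(sim, pomdp, policy, up, initialstate(pomdp), s)
                    for _ in 1:n_rollouts)
    end
end

# Stand-in problem, assumed here only for illustration.
pomdp = TigerPOMDP()
solver = POMCPSolver(estimate_value=make_mc_estimate(20), tree_queries=200)
planner = solve(solver, pomdp)
a = action(planner, initialstate(pomdp))
```

Because the estimator is just a function of `(pomdp, s, h, steps)`, swapping in a different rollout policy or a different averaging rule only requires changing the body of the closure.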
Hi @loslapleo,
Sorry for the delayed reply - I am just coming up for air after a very busy semester.
If my understanding is correct, what Ross et al. are referring to as "belief nodes" are nodes in the search tree, which would not include beliefs encountered on the rollout. To count these, you would do something like: