Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for custom initial belief (e.g., solving many POMDPs with varying beliefs) #20

Open
jmuchovej opened this issue Jul 9, 2024 · 11 comments · May be fixed by #21
Open

Support for custom initial belief (e.g., solving many POMDPs with varying beliefs) #20

jmuchovej opened this issue Jul 9, 2024 · 11 comments · May be fixed by #21

Comments

@jmuchovej
Copy link

Currently, _vectorized_initialstate takes the pomdp and ordered_states then uses the initialstate(::POMDP) to initialize b0; however, SARSOP is able to begin from any point in the belief-space, so it makes sense to allow users to provide an initial belief (perhaps in sparse form?) from which to commence SARSOP.

I'd be happy to write/submit a PR for this, but wanted to open an issue to see if it was something y'all would be willing to accept. 🙂

Changes that I think that would be necessary to support this. I could probably tackle this in the next week or two, if an acceptable PR.
# solver.jl
+ function POMDPTools.solve_info(solver::SARSOPSolver, pomdp::POMDP; b0=initialstate(pomdp))
+     tree = SARSOPTree(solver, pomdp; b0)
- function POMDPTools.solve_info(solver::SARSOPSolver, pomdp::POMDP)
-     tree = SARSOPTree(Solver, pomdp)
    # the rest of the code ...
    return pol, (; ...)
end

+ function POMDPs.solve(solver::SARSOPSolver, pomdp::POMDP; b0=initialstate(pomdp)) =
- function POMDPs.solve(solver::SARSOPSolver, pomdp::POMDP) =
    fist(solve_info(solver, pomdp; b0))

# tree.jl
+ function SARSOPTree(solver, pomdp::POMDP; b0=initialstate(pomdp))
- function SARSOPTree(solver, pomdp::POMDP)
+    sparse_pomdp = ModifiedSparseTabular(pomdp, b0)
-    sparse_pomdp = ModifiedSparseTabular(pomdp)
    # the rest of the codebase ...
    return insert_root!(...)
end

# sparse_tabular.jl
+ function ModifiedSparseTabular(pomdp::POMDP, b0)
- function ModifiedSparseTabular(pomdp::POMDP)
    S = ordered_states(pomdp)
    # the rest of the codebase ...
+     b0 = _vectorized_initialstate(pomdp, S, b0)
-     b0 = _vectorized_initialstate(pomdp, S)
    return ModifiedSparseTabular(T, R, O, terminal, b0, discount(pomdp))
end

+ function _vectorized_initialstate(pomdp, S, b0)
+ function _vectorized_initialstate(pomdp, S)
    b0_vec = Vector{Float64}(undef, length(S))
    @inbounds for i ∈ eachindex(S, b0_vec)
        b0_vec[i] = pdf(b0, S[i])
    end
    return sparse(b0_vec)
end

So long as b0 is guaranteed to be compatible with pdf(b0, S[i]), these are all the changes necessary. Otherwise, there would need to be some handling to ensure that b0 is either compatible with vectorizing or allow folks to specify how to achieve the equivalent of pdf(b0, S[i]).


Perhaps if there were functions in POMDPTools that support this kinda interface, that could be interesting but is way out of scope for this issue/PR. (e.g. going from marginal beliefs to relevant SparseCat and the like.)

@zsunberg
Copy link
Member

@jmuchovej Yes, I agree we should add this! But, we should make it a solver constructor argument rather than a keyword argument for solve. I suggest the name root_belief. @WhiffleFish what do you think?

@WhiffleFish
Copy link
Member

@zsunberg How would you see the solver constructor argument being handled? As far as I understand, solvers are problem-independent. So, it would have to be something like a function, the default for which would be something like root_belief = pomdp -> initialstate(pomdp), or?

@zsunberg
Copy link
Member

Yeah, that seems like a good way to handle it.

@jmuchovej
Copy link
Author

Hmm... is "problem", in SARSOP's case, defined as the reward function only or the (reward function, initial belief) pair?

  • If only the reward function (as it is now), then you could argue that SARSOPSolver is actually belief-dependent (since it defaults to the initialstate).
  • If the pair, then I see two clear routes forward:
    1. Encode the initial belief into the incoming POMDP's struct (have a non-uniform belief? encode it in the MyPOMDP <: POMDP struct, then use it in POMDPs.initialstate(::MyPOMDP), etc.)
    2. Consume the initial belief in solve, much like you do with solve(::SARSOPSolver, ::POMDP, ...)

If the aim is to define an initial belief using the POMDPs interface – I definitely agree this would be ideal! However, it strikes me as potentially odd that either:

  1. The initial belief must be part of MyPOMDP, or...
  2. The initial belief must be non-reproducibly sampled (i.e., it'll either rely on global variables, or worse require reinstantiation anytime the initial belief changes).

FWIW: I'm not tied to the solve approach I outlined! I just figured it was most consistent with the (reward function, initial belief) pair as the "problem".

When I read "solvers are problem-independent"1, that strikes me as: "instantiate the solver once and reuse to your heart's content". So, if I had 200 different initial beliefs and 200 different MyPOMDP definitions, I could use the same SARSOPSolver instance to compute all 40K policies.

Footnotes

  1. This too was my understanding of any Solver!

@zsunberg
Copy link
Member

In POMPDs.jl, the initial belief is part of the problem definition via initialstate. SARSOP uses a tree to decide what parts of the belief space to explore and guarantees that the policy is within epsilon of optimal if started at the belief at the root of this tree. This root belief MAY be the initial belief of the POMDP, but it could be something else. @bkraske has recently talked to me about solving a problem he was working on at other beliefs.

@WhiffleFish 's solution should accommodate all use cases elegantly. root_belief can be a function of the POMDP. So, by default, at the beginning of solve, when SARSOP is introduced to the POMDP, it will call root_belief(pomdp) and get the output of initialstate. Therefore, the same instance of SARSOPSolver can be used for many different POMDPs. But if the user wants to solve with a different root belief, they can override it without needing to mess with the POMDP definition.

@jmuchovej
Copy link
Author

jmuchovej commented Jul 12, 2024

Thanks for clarifying! 🙂 So, I see what you mean, but I don't think root_belief should go in the solver. Adding root_belief to the solver would make SARSOPSolver problem dependent. As it currently is, SARSOPSolver is problem independent. Like I understood and @WhiffleFish also understood – a Solver should be problem independent, no?

If this is the case: perhaps you meant that root_belief should be stored in SARSOPTree? If so, then I totally agree! (This makes sense as otherwise it's unclear where the solver started from.)

If you actually meant that root_belief should be part of the solver...

I agree that root_belief(pomdp) could accommodate many use cases.

When I say "problem dependent", I mean that I couldn't use the same SARSOPSolver instance across 15 different problems, unless they had exactly the same initial beliefs (or beliefs were unreproducibly sampled) – even though I can change the rest of the problem (by creating another instance of MyPOMDP) otherwise.

The only way root_belief(pomdp) could accommodate all use cases is if:

  1. Users refer to non-local variables.
  2. MyPOMDP contains all relevant features for sampling/specifying alternate root beliefs.

e.g., in my case, I have 200 beliefs all of which are affiliated with some Random.seed!. root_belief(pomdp) would require that seed (or similar) is part of the POMDP definition – but there's no need for the seed otherwise in the POMDP.

I'm not sure if (1) is a deal-breaker – personally I think it is, as referencing non-local variables make it hard to reason about code execution. Another possibility (in my case) is to Random.seed! right before SARSOPSolver, but at that point – I'm using a new SARSOPSolver for every belief-reward pair.

For root_belief(pomdp) to truly accommodate all cases, anything relevant to sampling a different root_belief without using non-local variables demands that the relevant information is stored in the POMDP. If someone wants to use an existing POMDP, say RockSamplePOMDP, they would have to engage in (1) or full reimplement it to support (2).

Perhaps there's something I'm overlooking. If so, I'd be happy to hear about it 'cause right now integrating root_belief into SARSOPSolver wouldn't actually resolve what prompted me to open this issue. 😅

Essentially, I was looking for a way to manually enforce the SARSOP begin from a particular root belief, versus something like starting from the uniform belief or an arbitrarily sampled one.

@zsunberg
Copy link
Member

zsunberg commented Jul 12, 2024

e.g., in my case, I have 200 beliefs all of which are affiliated with some Random.seed!. root_belief(pomdp) would require that seed (or similar) is part of the POMDP definition – but there's no need for the seed otherwise in the POMDP.

I think you will be able to do this:

for _ in 1:200
    current_belief = gen_belief()
    solver = SARSOPSolver(root_belief = current_belief)
    solve(solver, pomdp)
end

Or, if you wanted to avoid reconstructing the solver, you could "capture" the belief variable like this:

current_belief = nothing
solver = SARSOPSolver(root_belief = pomdp -> current_belief)
for _ in 1:200
    current_belief = gen_belief()
    solve(solver, pomdp)
end

Does that make more sense? (make sure you understand what the -> is doing)

@jmuchovej
Copy link
Author

jmuchovej commented Jul 13, 2024

Yep! I know about this – this is what I was referring to with "non-local variables" (option 1 in my "making root_belief work in all cases").

@mykelk
Copy link
Member

mykelk commented Jul 13, 2024

The concept of root_belief is not a property of the problem---it is a detail of the solver. After all, some solvers do not involve the concept of a root belief and do not involve any kind of tree search at all. Of course, depending on the solver, you may want to give it some hints on how to better solve a particular problem instance (e.g., a different root distribution, different parameters that affect branching, different number of iterations).

@jmuchovej
Copy link
Author

@mykelk I see. Thanks for elaborating.

@zsunberg @WhiffleFish I can work on this over the coming week, will update #21 once I've done so.

Do we want tests for this? I suspect so, but I suspect things like the Tiger and Baby POMDPs will be trivial – I was thinking of using RockSample and possible LaserTag? I'm not too familiar with the LaserTag POMDP though.

@zsunberg
Copy link
Member

Thanks, @jmuchovej ! Yes, we should have tests for it. I think Baby and Tiger tests will actually be sufficient. As long as the tests exercise all of the lines of code in the project, it is not that important how large the problems are.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants