From e6a163ff73cea26086e2cee2a439f73bd4f281b5 Mon Sep 17 00:00:00 2001 From: Dylan Asmar <91484811+dylan-asmar@users.noreply.github.com> Date: Fri, 1 Mar 2024 16:12:01 -0800 Subject: [PATCH] Update docs to incorporate POMDPExamples and POMDPGallery (#539) * Added examples to documentation (followed outline of POMDPExamples.jl) * Added Grid World MDP Tutorial * Removed todo statement that was being redered in the docs * added gallery section (similar to POMDPGallery.jl) * updated references to POMDPExamples.jl and POMDPGallery.jl * Changed to internal reference using Documenter. * Remove RoombaPOMDPs (unregistered and is added during documentation build) * Removed RoombaPOMDPs after the example in attempt to fix GR_jll conflict later with TagPOMDPProblem * added Plots to support Tag gallery example * Removed Tag generation and changed to uploaded gif due to GR backend issues and Github actions * clearup of roomba example outputs * Gallery: Working example for TagPOMDP (#540) * Removed Tag note * updated using statements in gallery examples * Got backend GR to work for Tag by adding Plots in setup * Removed TagPOMDP.gif based on creating it at documentation generation * Fixed some typos --- README.md | 6 +- docs/Project.toml | 17 + docs/make.jl | 10 + docs/src/POMDPTools/simulators.md | 14 +- docs/src/example_defining_problems.md | 314 ++++++++++++ docs/src/example_gridworld_mdp.md | 592 ++++++++++++++++++++++ docs/src/example_simulations.md | 174 +++++++ docs/src/example_solvers.md | 108 ++++ docs/src/examples.md | 12 + docs/src/examples/crying_baby_examples.jl | 230 +++++++++ docs/src/examples/crying_baby_solvers.jl | 24 + docs/src/examples/grid_world_overview.gif | Bin 0 -> 8958 bytes docs/src/gallery.md | 264 ++++++++++ docs/src/get_started.md | 2 +- docs/src/index.md | 8 +- docs/src/run_simulation.md | 2 +- 16 files changed, 1758 insertions(+), 19 deletions(-) create mode 100644 docs/src/example_defining_problems.md create mode 100644 docs/src/example_gridworld_mdp.md create mode 100644 docs/src/example_simulations.md create mode 100644 docs/src/example_solvers.md create mode 100644 docs/src/examples.md create mode 100644 docs/src/examples/crying_baby_examples.jl create mode 100644 docs/src/examples/crying_baby_solvers.jl create mode 100644 docs/src/examples/grid_world_overview.gif create mode 100644 docs/src/gallery.md diff --git a/README.md b/README.md index ee853b98..9e6c7365 100644 --- a/README.md +++ b/README.md @@ -89,17 +89,15 @@ end println("Undiscounted reward was $rsum.") ``` -For more examples with visualization, see the documentation below and [POMDPGallery.jl](https://github.com/JuliaPOMDP/POMDPGallery.jl). +For more examples and examples with visualizations, reference the [Examples](https://JuliaPOMDP.github.io/POMDPs.jl/latest/examples) and [Gallery of POMDPs.jl Problems](https://JuliaPOMDP.github.io/POMDPs.jl/latest/gallery) sections of the documentaiton. ## Documentation and Tutorials -In addition to the above-mentioned [Julia Academy course](https://juliaacademy.com/p/decision-making-under-uncertainty-with-pomdps-jl), detailed documentation can be found [here](http://juliapomdp.github.io/POMDPs.jl/stable/). +In addition to the above-mentioned [Julia Academy course](https://juliaacademy.com/p/decision-making-under-uncertainty-with-pomdps-jl), detailed documentation and examples can be found [here](http://juliapomdp.github.io/POMDPs.jl/stable/). 
[![Docs](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaPOMDP.github.io/POMDPs.jl/stable) [![Docs](https://img.shields.io/badge/docs-latest-blue.svg)](https://JuliaPOMDP.github.io/POMDPs.jl/latest) -Several tutorials are hosted in the [POMDPExamples repository](https://github.com/JuliaPOMDP/POMDPExamples.jl). - ## Supported Packages diff --git a/docs/Project.toml b/docs/Project.toml index f30f6ffe..d714848f 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,12 +1,29 @@ [deps] +BasicPOMCP = "d721219e-3fc6-5570-a8ef-e5402f47c49e" +Cairo = "159f3aea-2a34-519c-b102-8c37f9878175" +Compose = "a81c6b42-2e10-5240-aca2-a61377ecd94b" +DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" +DiscreteValueIteration = "4b033969-44f6-5439-a48b-c11fa3648068" Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f" Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" +DroneSurveillance = "63556450-714a-11e9-08e1-e368b701e279" +Fontconfig = "186bb1d3-e1f7-5a2c-a377-96d770f13627" LightGraphs = "093fc24a-ae57-5d10-9952-331d41423f4d" +MCTS = "e12ccd36-dcad-5f33-8774-9175229e7b33" NamedTupleTools = "d9ec5142-1e00-5aa0-9d6a-321866360f50" +NativeSARSOP = "a07c76ea-660d-4c9a-8028-2e6dbd212cb8" +POMDPGifs = "7f35509c-0cb9-11e9-0708-2928828cdbb7" +POMDPLinter = "f3bd98c0-eb40-45e2-9eb1-f2763262d755" POMDPModels = "355abbd5-f08e-5560-ac9e-8b5f2592a0ca" POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7" POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d" +ParticleFilters = "c8b314e2-9260-5cf8-ae76-3be7461ca6d0" +QMDP = "3aa3ecc9-5a5d-57c8-8188-3e47bd8068d2" QuickPOMDPs = "8af83fb2-a731-493c-9049-9e19dbce6165" +RockSample = "de008ff0-c357-11e8-3329-7fe746fe836e" +StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91" +TagPOMDPProblem = "8a653263-a1cc-4cf9-849f-f530f6ffc800" +UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228" [compat] Documenter = "1" diff --git a/docs/make.jl b/docs/make.jl index 6c1e2571..3f5e2a52 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -37,6 +37,15 @@ makedocs( "run_simulation.md", "policy_interaction.md" ], + + "Examples and Gallery" => [ + "examples.md", + "example_defining_problems.md", + "example_solvers.md", + "example_simulations.md", + "example_gridworld_mdp.md", + "gallery.md" + ], "POMDPTools" => [ "POMDPTools/index.md", @@ -59,4 +68,5 @@ makedocs( deploydocs( repo = "github.com/JuliaPOMDP/POMDPs.jl.git", + push_preview=true ) diff --git a/docs/src/POMDPTools/simulators.md b/docs/src/POMDPTools/simulators.md index cfa7ccb8..cf23173b 100644 --- a/docs/src/POMDPTools/simulators.md +++ b/docs/src/POMDPTools/simulators.md @@ -2,7 +2,7 @@ POMDPTools contains a collection of POMDPs.jl simulators. -Usage examples can be found in the [simulation tutorial in the POMDPExamples package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb). +Usage examples can be found in the [Simulations Examples](@ref) section. If you are just getting started, probably the easiest way to begin is the [`stepthrough` function](@ref Stepping-through). Otherwise, consult the [Which Simulator Should I Use?](@ref which_simulator) guide below: @@ -51,8 +51,6 @@ for (s, a, o, r) in stepthrough(pomdp, policy, "s,a,o,r", max_steps=10) end ``` -More examples can be found in the [POMDPExamples Package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb). 
-
 ```@docs
 stepthrough
 ```
@@ -77,8 +75,6 @@ policy = RandomPolicy(mdp)
 r = simulate(rs, mdp, policy)
 ```
 
-More examples can be found in the [POMDPExamples Package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb)
-
 ```@docs
 RolloutSimulator
 ```
@@ -95,8 +91,6 @@ policy = RandomPolicy(pomdp)
 h = simulate(hr, pomdp, policy)
 ```
 
-More examples can be found in the [POMDPExamples Package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb).
-
 ```@docs
 HistoryRecorder
 ```
@@ -116,10 +110,6 @@ This allows a flexible and general way to interact with a POMDP environment with
 
 In the POMDP case, an updater can optionally be supplied as an additional positional argument if the policy function works with beliefs rather than directly with observations.
 
-More examples can be found in the [POMDPExamples Package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb)
-
-More examples can be found in the [POMDPExamples Package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb)
-
 ```@docs
 sim
 ```
@@ -197,7 +187,7 @@ POMDPTools contains a utility for running many Monte Carlo simulations in parall
 
 ### Example
 
-An example can be found in the [POMDPExamples Package](https://github.com/JuliaPOMDP/POMDPExamples.jl/blob/master/notebooks/Running-Simulations.ipynb).
+An example can be found in the [Parallel Simulations](@ref) section.
 
 ### Sim objects
diff --git a/docs/src/example_defining_problems.md b/docs/src/example_defining_problems.md
new file mode 100644
index 00000000..85eb3019
--- /dev/null
+++ b/docs/src/example_defining_problems.md
@@ -0,0 +1,314 @@
+# Defining a POMDP
+As mentioned in the [Defining POMDPs and MDPs](@ref defining_pomdps) section, there are various ways to define a POMDP using POMDPs.jl. In this section, we provide more examples of how to define a POMDP using the different interfaces.
+
+There is a large variety of problems that can be expressed as MDPs and POMDPs, and different solvers require different components of the POMDPs.jl interface to be defined. Therefore, these examples are not intended to cover all possible use cases. When developing a problem, if you have an idea of what solver(s) you would like to use, it is recommended to use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help you determine what components of the POMDPs.jl interface need to be defined. Reference the [Checking Requirements](@ref) section for an example of using POMDPLinter.
+
+## CryingBaby Problem Definition
+For the examples, we will use the CryingBaby problem from [Algorithms for Decision Making](https://algorithmsbook.com/) by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray.
+
+!!! note
+    This crying baby problem follows the description in Algorithms for Decision Making and is different from the `BabyPOMDP` defined in [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl).
+
+From [Appendix F](https://algorithmsbook.com/files/appendix-f.pdf) of Algorithms for Decision Making:
+> The crying baby problem is a simple POMDP with two states, three actions, and two observations. Our goal is to care for a baby, and we do so by choosing at each time step whether to feed the baby, sing to the baby, or ignore the baby.
+>
+> The baby becomes hungry over time. We do not directly observe whether the baby is hungry; instead, we receive a noisy observation in the form of whether the baby is crying. 
The state, action, and observation spaces are as follows: +> ```math +> \begin{align*} +> \mathcal{S} &= \{\text{sated}, \text{hungry} \}\\ +> \mathcal{A} &= \{\text{feed}, \text{sing}, \text{ignore} \} \\ +> \mathcal{O} &= \{\text{crying}, \text{quiet} \} +> \end{align*} +> ``` +> +> Feeding will always sate the baby. Ignoring the baby risks a sated baby becoming hungry, and ensures that a hungry baby remains hungry. Singing to the baby is an information-gathering action with the same transition dynamics as ignoring, but without the potential for crying when sated (not hungry) and with an increased chance of crying when hungry. +> +> The transition dynamics are as follows: +> ```math +> \begin{align*} +> & T(\text{sated} \mid \text{hungry}, \text{feed}) = 100\% \\ +> & T(\text{hungry} \mid \text{hungry}, \text{sing}) = 100\% \\ +> & T(\text{hungry} \mid \text{hungry}, \text{ignore}) = 100\% \\ +> & T(\text{sated} \mid \text{sated}, \text{feed}) = 100\% \\ +> & T(\text{hungry} \mid \text{sated}, \text{sing}) = 10\% \\ +> & T(\text{hungry} \mid \text{sated}, \text{ignore}) = 10\% +> \end{align*} +> ``` +> +> The observation dynamics are as follows: +> ```math +> \begin{align*} +> & O(\text{crying} \mid \text{feed}, \text{hungry}) = 80\% \\ +> & O(\text{crying} \mid \text{sing}, \text{hungry}) = 90\% \\ +> & O(\text{crying} \mid \text{ignore}, \text{hungry}) = 80\% \\ +> & O(\text{crying} \mid \text{feed}, \text{sated}) = 10\% \\ +> & O(\text{crying} \mid \text{sing}, \text{sated}) = 0\% \\ +> & O(\text{crying} \mid \text{ignore}, \text{sated}) = 10\% +> \end{align*} +> ``` +> +> The reward function assigns ``−10`` reward if the baby is hungry, independent of the action taken. The effort of feeding the baby adds a further ``−5`` reward, whereas singing adds ``−0.5`` reward. As baby caregivers, we seek the optimal infinite-horizon policy with discount factor ``\gamma = 0.9``. 
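+
+As a quick worked example (ours, not from the book): if we are certain the baby is sated, ignore the baby, and then observe crying, the updated belief that the baby is hungry is
+```math
+b'(\text{hungry}) = \frac{O(\text{crying} \mid \text{ignore}, \text{hungry}) \, T(\text{hungry} \mid \text{sated}, \text{ignore})}{\sum_{s'} O(\text{crying} \mid \text{ignore}, s') \, T(s' \mid \text{sated}, \text{ignore})} = \frac{0.8 \times 0.1}{0.8 \times 0.1 + 0.1 \times 0.9} \approx 0.47.
+```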
+ +## [QuickPOMDP Interface](@id quick_crying) +```julia +using POMDPs +using POMDPTools +using QuickPOMDPs + +quick_crying_baby_pomdp = QuickPOMDP( + states = [:sated, :hungry], + actions = [:feed, :sing, :ignore], + observations = [:quiet, :crying], + initialstate = Deterministic(:sated), + discount = 0.9, + transition = function (s, a) + if a == :feed + return Deterministic(:sated) + elseif s == :sated # :sated and a != :feed + return SparseCat([:sated, :hungry], [0.9, 0.1]) + else # s == :hungry and a != :feed + return Deterministic(:hungry) + end + end, + observation = function (a, sp) + if sp == :hungry + if a == :sing + return SparseCat([:crying, :quiet], [0.9, 0.1]) + else # a == :ignore || a == :feed + return SparseCat([:crying, :quiet], [0.8, 0.2]) + end + else # sp = :sated + if a == :sing + return Deterministic(:quiet) + else # a == :ignore || a == :feed + return SparseCat([:crying, :quiet], [0.1, 0.9]) + end + + end + end, + reward = function (s, a) + r = 0.0 + if s == :hungry + r += -10.0 + end + if a == :feed + r += -5.0 + elseif a == :sing + r+= -0.5 + end + return r + end +) +``` + +## [Explicit Interface](@id explicit_crying) +```julia +using POMDPs +using POMDPTools + +struct CryingBabyState + hungry::Bool +end + +struct CryingBabyPOMDP <: POMDP{CryingBabyState, Symbol, Symbol} + p_sated_to_hungry::Float64 + p_cry_feed_hungry::Float64 + p_cry_sing_hungry::Float64 + p_cry_ignore_hungry::Float64 + p_cry_feed_sated::Float64 + p_cry_sing_sated::Float64 + p_cry_ignore_sated::Float64 + reward_hungry::Float64 + reward_feed::Float64 + reward_sing::Float64 + discount_factor::Float64 +end + +function CryingBabyPOMDP(; + p_sated_to_hungry=0.1, + p_cry_feed_hungry=0.8, + p_cry_sing_hungry=0.9, + p_cry_ignore_hungry=0.8, + p_cry_feed_sated=0.1, + p_cry_sing_sated=0.0, + p_cry_ignore_sated=0.1, + reward_hungry=-10.0, + reward_feed=-5.0, + reward_sing=-0.5, + discount_factor=0.9 +) + return CryingBabyPOMDP(p_sated_to_hungry, p_cry_feed_hungry, + p_cry_sing_hungry, p_cry_ignore_hungry, p_cry_feed_sated, + p_cry_sing_sated, p_cry_ignore_sated, reward_hungry, + reward_feed, reward_sing, discount_factor) +end + +POMDPs.actions(::CryingBabyPOMDP) = [:feed, :sing, :ignore] +POMDPs.states(::CryingBabyPOMDP) = [CryingBabyState(false), CryingBabyState(true)] +POMDPs.observations(::CryingBabyPOMDP) = [:crying, :quiet] +POMDPs.stateindex(::CryingBabyPOMDP, s::CryingBabyState) = s.hungry ? 2 : 1 +POMDPs.obsindex(::CryingBabyPOMDP, o::Symbol) = o == :crying ? 1 : 2 +POMDPs.actionindex(::CryingBabyPOMDP, a::Symbol) = a == :feed ? 1 : a == :sing ? 
2 : 3
+
+function POMDPs.transition(pomdp::CryingBabyPOMDP, s::CryingBabyState, a::Symbol)
+    if a == :feed
+        return Deterministic(CryingBabyState(false))
+    elseif !s.hungry # sated and a != :feed
+        return SparseCat([CryingBabyState(false), CryingBabyState(true)], [1 - pomdp.p_sated_to_hungry, pomdp.p_sated_to_hungry])
+    else # hungry and a != :feed
+        return Deterministic(CryingBabyState(true))
+    end
+end
+
+function POMDPs.observation(pomdp::CryingBabyPOMDP, a::Symbol, sp::CryingBabyState)
+    if sp.hungry
+        if a == :sing
+            return SparseCat([:crying, :quiet], [pomdp.p_cry_sing_hungry, 1 - pomdp.p_cry_sing_hungry])
+        elseif a == :ignore
+            return SparseCat([:crying, :quiet], [pomdp.p_cry_ignore_hungry, 1 - pomdp.p_cry_ignore_hungry])
+        else # a == :feed
+            return SparseCat([:crying, :quiet], [pomdp.p_cry_feed_hungry, 1 - pomdp.p_cry_feed_hungry])
+        end
+    else # sated
+        if a == :sing
+            return SparseCat([:crying, :quiet], [pomdp.p_cry_sing_sated, 1 - pomdp.p_cry_sing_sated])
+        elseif a == :ignore
+            return SparseCat([:crying, :quiet], [pomdp.p_cry_ignore_sated, 1 - pomdp.p_cry_ignore_sated])
+        else # a == :feed
+            return SparseCat([:crying, :quiet], [pomdp.p_cry_feed_sated, 1 - pomdp.p_cry_feed_sated])
+        end
+    end
+end
+
+function POMDPs.reward(pomdp::CryingBabyPOMDP, s::CryingBabyState, a::Symbol)
+    r = 0.0
+    if s.hungry
+        r += pomdp.reward_hungry
+    end
+    if a == :feed
+        r += pomdp.reward_feed
+    elseif a == :sing
+        r += pomdp.reward_sing
+    end
+    return r
+end
+
+POMDPs.discount(pomdp::CryingBabyPOMDP) = pomdp.discount_factor
+
+POMDPs.initialstate(::CryingBabyPOMDP) = Deterministic(CryingBabyState(false))
+
+explicit_crying_baby_pomdp = CryingBabyPOMDP()
+```
+
+## [Generative Interface](@id gen_crying)
+This crying baby problem should not be implemented using the generative interface. However, this example is provided for pedagogical purposes.
+
+```julia
+using POMDPs
+using POMDPTools
+using Random
+
+struct GenCryingBabyState
+    hungry::Bool
+end
+
+struct GenCryingBabyPOMDP <: POMDP{GenCryingBabyState, Symbol, Symbol}
+    p_sated_to_hungry::Float64
+    p_cry_feed_hungry::Float64
+    p_cry_sing_hungry::Float64
+    p_cry_ignore_hungry::Float64
+    p_cry_feed_sated::Float64
+    p_cry_sing_sated::Float64
+    p_cry_ignore_sated::Float64
+    reward_hungry::Float64
+    reward_feed::Float64
+    reward_sing::Float64
+    discount_factor::Float64
+
+    GenCryingBabyPOMDP() = new(0.1, 0.8, 0.9, 0.8, 0.1, 0.0, 0.1, -10.0, -5.0, -0.5, 0.9)
+end
+
+function POMDPs.gen(pomdp::GenCryingBabyPOMDP, s::GenCryingBabyState, a::Symbol, rng::AbstractRNG)
+
+    if a == :feed
+        sp = GenCryingBabyState(false)
+    elseif s.hungry # a hungry baby stays hungry unless fed
+        sp = GenCryingBabyState(true)
+    else
+        sp = rand(rng) < pomdp.p_sated_to_hungry ? GenCryingBabyState(true) : GenCryingBabyState(false)
+    end
+
+    if sp.hungry
+        if a == :sing
+            o = rand(rng) < pomdp.p_cry_sing_hungry ? :crying : :quiet
+        elseif a == :ignore
+            o = rand(rng) < pomdp.p_cry_ignore_hungry ? :crying : :quiet
+        else # a == :feed
+            o = rand(rng) < pomdp.p_cry_feed_hungry ? :crying : :quiet
+        end
+    else # sated
+        if a == :sing
+            o = rand(rng) < pomdp.p_cry_sing_sated ? :crying : :quiet
+        elseif a == :ignore
+            o = rand(rng) < pomdp.p_cry_ignore_sated ? :crying : :quiet
+        else # a == :feed
+            o = rand(rng) < pomdp.p_cry_feed_sated ? 
:crying : :quiet
+        end
+    end
+
+    r = 0.0
+    if sp.hungry
+        r += pomdp.reward_hungry
+    end
+    if a == :feed
+        r += pomdp.reward_feed
+    elseif a == :sing
+        r += pomdp.reward_sing
+    end
+
+    return (sp=sp, o=o, r=r)
+end
+
+POMDPs.initialstate(::GenCryingBabyPOMDP) = Deterministic(GenCryingBabyState(false))
+
+gen_crying_baby_pomdp = GenCryingBabyPOMDP()
+```
+
+## [Probability Tables](@id tab_crying)
+For this implementation we will use the following indexes:
+- States
+    - `:sated` = 1
+    - `:hungry` = 2
+- Actions
+    - `:feed` = 1
+    - `:sing` = 2
+    - `:ignore` = 3
+- Observations
+    - `:crying` = 1
+    - `:quiet` = 2
+
+```julia
+using POMDPModels
+
+T = zeros(2, 3, 2) # |S| x |A| x |S'|, T[sp, a, s] = p(sp | a, s)
+T[:, 1, :] = [1.0 1.0;
+              0.0 0.0]
+T[:, 2, :] = [0.9 0.0;
+              0.1 1.0]
+T[:, 3, :] = [0.9 0.0;
+              0.1 1.0]
+
+O = zeros(2, 3, 2) # |O| x |A| x |S'|, O[o, a, sp] = p(o | a, sp)
+O[:, 1, :] = [0.1 0.8;
+              0.9 0.2]
+O[:, 2, :] = [0.0 0.9;
+              1.0 0.1]
+O[:, 3, :] = [0.1 0.8;
+              0.9 0.2]
+
+R = zeros(2, 3) # |S| x |A|, R[s, a]
+R = [-5.0   -0.5    0.0;
+     -15.0 -10.5  -10.0]
+
+discount = 0.9
+
+tabular_crying_baby_pomdp = TabularPOMDP(T, R, O, discount)
+```
\ No newline at end of file
diff --git a/docs/src/example_gridworld_mdp.md b/docs/src/example_gridworld_mdp.md
new file mode 100644
index 00000000..4dfa4e48
--- /dev/null
+++ b/docs/src/example_gridworld_mdp.md
@@ -0,0 +1,592 @@
+# GridWorld MDP Tutorial
+
+In this tutorial, we provide a simple example of how to define a Markov decision process (MDP) using the POMDPs.jl interface. We will then solve the MDP using value iteration and Monte Carlo tree search (MCTS). We will walk through constructing the MDP using the explicit interface, which involves defining a new type for the MDP and then extending different components of the POMDPs.jl interface for that type.
+
+## Dependencies
+
+We need a few packages in order to run this example. All of the packages can be added by running the following command in the Julia REPL:
+
+```julia
+using Pkg
+
+Pkg.add("POMDPs")
+Pkg.add("POMDPTools")
+Pkg.add("DiscreteValueIteration")
+Pkg.add("MCTS")
+```
+
+If you already had the packages installed, it is prudent to update them to the latest version:
+
+```julia
+Pkg.update()
+```
+
+Now that we have the packages installed, we can load them into our workspace:
+
+```@example gridworld_mdp
+using POMDPs
+using POMDPTools
+using DiscreteValueIteration
+using MCTS
+```
+
+## Problem Overview
+
+In Grid World, we are trying to control an agent who has trouble moving in the desired direction. In our problem, we have four reward states within the grid. Each position on the grid represents a state, and the positive reward states are terminal (the agent stops receiving reward after reaching them and performing an action from that state). The agent has four actions to choose from: up, down, left, right. The agent moves in the desired direction with a probability of $0.7$, and with a probability of $0.1$ in each of the remaining three directions. If the agent bumps into the outside wall, there is a penalty of $1$ (i.e. reward of $-1$). The problem has the following form:
+
+![Grid World](examples/grid_world_overview.gif)
+
+## Defining the Grid World MDP Type
+
+In POMDPs.jl, an MDP is defined by creating a subtype of the `MDP` abstract type. The types of the states and actions for the MDP are declared as [parameters](https://docs.julialang.org/en/v1/manual/types/#Parametric-Types-1) of the MDP type. 
For example, if our states and actions are both represented by integers, we can define our MDP type as follows: + +```julia +struct MyMDP <: MDP{Int64, Int64} # MDP{StateType, ActionType} + # fields go here +end +``` + +In our grid world problem, we will represent the states using a custom type that designates the `x` and `y` coordinate within the grid. The actions will by represented by a symbol. + +### GridWorldState +There are numerous ways to represent the state of the agent in a grid world. We will use a custom type that designates the `x` and `y` coordinate within the grid. + +```@example gridworld_mdp +struct GridWorldState + x::Int64 + y::Int64 +end +``` + +To help us later, let's extend the `==` for our `GridWorldStat`: + +```@example gridworld_mdp +function Base.:(==)(s1::GridWorldState, s2::GridWorldState) + return s1.x == s2.x && s1.y == s2.y +end +``` + +### GridWorld Actions +Since our action is the direction the agent chooses to go (i.e. up, down, left, right), we can use a Symbol to represent it. Note that in this case, we are not defining a custom type for our action, instead we represent it directly with a symbol. Our actions will be `:up`, `:down`, `:left`, and `:right`. + +### GridWorldMDP +Now that we have defined our types for states and actions, we can define our MDP type. We will call it `GridWorldMDP` and it will be a subtype of `MDP{GridWorldState, Symbol}`. + +```@example gridworld_mdp +struct GridWorldMDP <: MDP{GridWorldState, Symbol} + size_x::Int64 # x size of the grid + size_y::Int64 # y size of the grid + reward_states_values::Dict{GridWorldState, Float64} # Dictionary mapping reward states to their values + hit_wall_reward::Float64 # reward for hitting a wall + tprob::Float64 # probability of transitioning to the desired state + discount_factor::Float64 # disocunt factor +end +``` + +We can define a constructor for our `GridWorldMDP` to make it easier to create instances of our MDP. + +```@example gridworld_mdp +function GridWorldMDP(; + size_x::Int64=10, + size_y::Int64=10, + reward_states_values::Dict{GridWorldState, Float64}=Dict( + GridWorldState(4, 3) => -10.0, + GridWorldState(4, 6) => -5.0, + GridWorldState(9, 3) => 10.0, + GridWorldState(8, 8) => 3.0), + hit_wall_reward::Float64=-1.0, + tprob::Float64=0.7, + discount_factor::Float64=0.9) + return GridWorldMDP(size_x, size_y, reward_states_values, hit_wall_reward, tprob, discount_factor) +end +``` + +To help us visualize our MDP, we can extend `show` for our `GridWorldMDP` type: + +```@example gridworld_mdp +function Base.show(io::IO, mdp::GridWorldMDP) + println(io, "Grid World MDP") + println(io, "\tSize x: $(mdp.size_x)") + println(io, "\tSize y: $(mdp.size_y)") + println(io, "\tReward states:") + for (key, value) in mdp.reward_states_values + println(io, "\t\t$key => $value") + end + println(io, "\tHit wall reward: $(mdp.hit_wall_reward)") + println(io, "\tTransition probability: $(mdp.tprob)") + println(io, "\tDiscount: $(mdp.discount_factor)") +end +``` + +Now lets create an instance of our `GridWorldMDP`: + +```@example gridworld_mdp +mdp = GridWorldMDP() + +``` + +!!! note + In this definition of the problem, our coordiates start in the bottom left of the grid. That is GridState(1, 1) is the bottom left of the grid and GridState(10, 10) would be on the right of the grid with a grid size of 10 by 10. + +## Grid World State Space +The state space in an MDP represents all the states in the problem. There are two primary functionalities that we want our spaces to support. 
We want to be able to iterate over the state space (for Value Iteration for example), and sometimes we want to be able to sample form the state space (used in some POMDP solvers). In this notebook, we will only look at iterable state spaces. + +Since we can iterate over elements of an array, and our problem is small, we can store all of our states in an array. We also have a terminal state based on the definition of our problem. We can represent that as a location outside of the grid (i.e. `(-1, -1)`). + +```@example gridworld_mdp +function POMDPs.states(mdp::GridWorldMDP) + states_array = GridWorldState[] + for x in 1:mdp.size_x + for y in 1:mdp.size_y + push!(states_array, GridWorldState(x, y)) + end + end + push!(states_array, GridWorldState(-1, -1)) # Adding the terminal state + return states_array +end +``` + +Let's view some of the states in our state space: + +```@example gridworld_mdp +@show states(mdp)[1:5] + +``` + +We also need a other functions related to the state space. + +```@example gridworld_mdp +# Check if a state is the terminal state +POMDPs.isterminal(mdp::GridWorldMDP, s::GridWorldState) = s == GridWorldState(-1, -1) + +# Define the initial state distribution (always start in the bottom left) +POMDPs.initialstate(mdp::GridWorldMDP) = Deterministic(GridWorldState(1, 1)) + +# Function that returns the index of a state in the state space +function POMDPs.stateindex(mdp::GridWorldMDP, s::GridWorldState) + if isterminal(mdp, s) + return length(states(mdp)) + end + + @assert 1 <= s.x <= mdp.size_x "Invalid state" + @assert 1 <= s.y <= mdp.size_y "Invalid state" + + si = (s.x - 1) * mdp.size_y + s.y + return si +end + +``` + + +### Large State Spaces +If your problem is very large we probably do not want to store all of our states in an array. We can create an iterator using indexing functions to help us out. One way of doing this is to define a function that returns a state from an index and then construct an iterator. This is an example of how we can do that for the Grid World problem. + +!!! note + If you run this section, you will redefine the `states(::GridWorldMDP)` that we just defined in the previous section. 
+ +```@example gridworld_mdp + + # Define the length of the state space, number of grid locations plus the terminal state + Base.length(mdp::GridWorldMDP) = mdp.size_x * mdp.size_y + 1 + + # `states` now returns the mdp, which we will construct our iterator from + POMDPs.states(mdp::GridWorldMDP) = mdp + + function Base.getindex(mdp::GridWorldMDP, si::Int) # Enables mdp[si] + @assert si <= length(mdp) "Index out of bounds" + @assert si > 0 "Index out of bounds" + + # First check if we are in the terminal state (which we define as the last state) + if si == length(mdp) + return GridWorldState(-1, -1) + end + + # Otherwise, we need to calculate the x and y coordinates + y = (si - 1) % mdp.size_y + 1 + x = div((si - 1), mdp.size_y) + 1 + return GridWorldState(x, y) + end + + function Base.getindex(mdp::GridWorldMDP, si_range::UnitRange{Int}) # Enables mdp[1:5] + return [getindex(mdp, si) for si in si_range] + end + + Base.firstindex(mdp::GridWorldMDP) = 1 # Enables mdp[begin] + Base.lastindex(mdp::GridWorldMDP) = length(mdp) # Enables mdp[end] + + # We can now construct an iterator + function Base.iterate(mdp::GridWorldMDP, ii::Int=1) + if ii > length(mdp) + return nothing + end + s = getindex(mdp, ii) + return (s, ii + 1) + end + + +``` + +Similar to above, let's iterate over a few of the states in our state space: + +```@example gridworld_mdp +@show states(mdp)[1:5] +@show mdp[begin] +@show mdp[end] + +``` + +## Grid World Action Space +The action space is the set of all actions availiable to the agent. In the grid world problem the action space consists of up, down, left, and right. We can define the action space by implementing a new method of the actions function. + +```@example gridworld_mdp +POMDPs.actions(mdp::GridWorldMDP) = [:up, :down, :left, :right] +``` + +Similar to the state space, we need a function that returns an index given an action. + +```@example gridworld_mdp +function POMDPs.actionindex(mdp::GridWorldMDP, a::Symbol) + @assert in(a, actions(mdp)) "Invalid action" + return findfirst(x -> x == a, actions(mdp)) +end + +``` + +## Grid World Transition Function +MDPs often define the transition function as $T(s^{\prime} \mid s, a)$, which is the probability of transitioning to state $s^{\prime}$ given that we are in state $s$ and take action $a$. For the POMDPs.jl interface, we define the transition function as a distribution over the next states. That is, we want $T(\cdot \mid s, a)$ which is a function that takes in a state and an action and returns a distribution over the next states. + +For our grid world example, there are only a few states to which the agent can transition and thus only a few states with nonzero probaility in $T(\cdot \mid s, a)$. We can use the `SparseCat` distribution to represent this. The `SparseCat` distribution is a categorical distribution that only stores the nonzero probabilities. 
We can define our transition function as follows: + +```@example gridworld_mdp +function POMDPs.transition(mdp::GridWorldMDP, s::GridWorldState, a::Symbol) + # If we are in the terminal state, we stay in the terminal state + if isterminal(mdp, s) + return SparseCat([s], [1.0]) + end + + # If we are in a positive reward state, we transition to the terminal state + if s in keys(mdp.reward_states_values) && mdp.reward_states_values[s] > 0 + return SparseCat([GridWorldState(-1, -1)], [1.0]) + end + + # Probability of going in a direction other than the desired direction + tprob_other = (1 - mdp.tprob) / 3 + + new_state_up = GridWorldState(s.x, min(s.y + 1, mdp.size_y)) + new_state_down = GridWorldState(s.x, max(s.y - 1, 1)) + new_state_left = GridWorldState(max(s.x - 1, 1), s.y) + new_state_right = GridWorldState(min(s.x + 1, mdp.size_x), s.y) + + new_state_vector = [new_state_up, new_state_down, new_state_left, new_state_right] + t_prob_vector = fill(tprob_other, 4) + + if a == :up + t_prob_vector[1] = mdp.tprob + elseif a == :down + t_prob_vector[2] = mdp.tprob + elseif a == :left + t_prob_vector[3] = mdp.tprob + elseif a == :right + t_prob_vector[4] = mdp.tprob + else + error("Invalid action") + end + + # Combine probabilities for states that are the same + for i in 1:4 + for j in (i + 1):4 + if new_state_vector[i] == new_state_vector[j] + t_prob_vector[i] += t_prob_vector[j] + t_prob_vector[j] = 0.0 + end + end + end + + # Remove states with zero probability + new_state_vector = new_state_vector[t_prob_vector .> 0] + t_prob_vector = t_prob_vector[t_prob_vector .> 0] + + return SparseCat(new_state_vector, t_prob_vector) +end + +``` + +Let's examline a few transitions: + +```@example gridworld_mdp +@show transition(mdp, GridWorldState(1, 1), :up) + +``` + +```@example gridworld_mdp +@show transition(mdp, GridWorldState(1, 1), :left) + +``` + +```@example gridworld_mdp +@show transition(mdp, GridWorldState(9, 3), :right) + +``` + +```@example gridworld_mdp +@show transition(mdp, GridWorldState(-1, -1), :down) + +``` + +## Grid World Reward Function + +In our problem, we have a reward function that depends on the next state as well (i.e. if we hit a wall, we stay in the same state and get a reward of $-1$). We can still construct a reward function that only depends on the current state and action by using expectation over the next state. That is, we can define our reward function as $R(s, a) = \mathbb{E}_{s^{\prime} \sim T(\cdot \mid s, a)}[R(s, a, s^{\prime})]$. + +```@example gridworld_mdp +# First, let's define the reward function given the state, action, and next state +function POMDPs.reward(mdp::GridWorldMDP, s::GridWorldState, a::Symbol, sp::GridWorldState) + # If we are in the terminal state, we get a reward of 0 + if isterminal(mdp, s) + return 0.0 + end + + # If we are in a positive reward state, we get the reward of that state + # For a positive reward, we transition to the terminal state, so we don't have + # to worry about the next state (i.g. 
hitting a wall) + if s in keys(mdp.reward_states_values) && mdp.reward_states_values[s] > 0 + return mdp.reward_states_values[s] + end + + # If we are in a negative reward state, we get the reward of that state + # If the negative reward state is on the edge of the grid, we can also be in this state + # and hit a wall, so we need to check for that + r = 0.0 + if s in keys(mdp.reward_states_values) && mdp.reward_states_values[s] < 0 + r += mdp.reward_states_values[s] + end + + # If we hit a wall, we get a reward of -1 + if s == sp + r += mdp.hit_wall_reward + end + + return r +end + +# Now we can define the reward function given the state and action +function POMDPs.reward(mdp::GridWorldMDP, s::GridWorldState, a::Symbol) + r = 0.0 + for (sp, p) in transition(mdp, s, a) + r += p * reward(mdp, s, a, sp) + end + return r +end + +``` + +Let's examine a few rewards: + +```@example gridworld_mdp +@show reward(mdp, GridWorldState(1, 1), :up) + +``` + +```@example gridworld_mdp +@show reward(mdp, GridWorldState(1, 1), :left) + +``` + +```@example gridworld_mdp +@show reward(mdp, GridWorldState(9, 3), :right) + +``` + +```@example gridworld_mdp +@show reward(mdp, GridWorldState(-1, -1), :down) + +``` + +```@example gridworld_mdp +@show reward(mdp, GridWorldState(2, 3), :up) + +``` + +## Grid World Remaining Functions +We are almost done! We still need to define `discount`. Let's first use `POMDPLinter` to check if we have defined all the functions we need for DiscreteValueIteration: + +```@example gridworld_mdp +using POMDPLinter + +@show_requirements POMDPs.solve(ValueIterationSolver(), mdp) + +``` +As we expected, we need to define `discount`. + +```@example gridworld_mdp +function POMDPs.discount(mdp::GridWorldMDP) + return mdp.discount_factor +end + +``` + +Let's check again: + +```@example gridworld_mdp +@show_requirements POMDPs.solve(ValueIterationSolver(), mdp) + +``` + +## Solving the Grid World MDP (Value Iteration) +Now that we have defined our MDP, we can solve it using Value Iteration. We will use the `ValueIterationSolver` from the [DiscreteValueIteration](https://github.com/JuliaPOMDP/DiscreteValueIteration.jl) package. First, we construct the a Solver type which contains the solver parameters. Then we call `POMDPs.solve` to solve the MDP and return a policy. + +```@example gridworld_mdp +# Initialize the problem (we have already done this, but just calling it again for completeness in the example) +mdp = GridWorldMDP() + +# Initialize the solver with desired parameters +solver = ValueIterationSolver(; max_iterations=100, belres=1e-3, verbose=true) + +# Solve for an optimal policy +vi_policy = POMDPs.solve(solver, mdp) +nothing # hide + +``` + +We can now use the policy to compute the optimal action for a given state: + +```@example gridworld_mdp +s = GridWorldState(9, 2) +@show action(vi_policy, s) + +``` + +```@example gridworld_mdp +s = GridWorldState(8, 3) +@show action(vi_policy, s) + +``` + +## Solving the Grid World MDP (MCTS) +Similar to the process with Value Iteration, we can solve the MDP using MCTS. We will use the `MCTSSolver` from the [MCTS](https://github.com/JuliaPOMDP/MCTS.jl) package. + +```@example gridworld_mdp +# Initialize the problem (we have already done this, but just calling it again for completeness in the example) +mdp = GridWorldMDP() + +# Initialize the solver with desired parameters +solver = MCTSSolver(n_iterations=1000, depth=20, exploration_constant=10.0) + +# Now we construct a planner by calling POMDPs.solve. 
For online planners, the computation for the +# optimal action occurs in the call to `action`. +mcts_planner = POMDPs.solve(solver, mdp) +nothing # hide + +``` + +Similar to the value iteration policy, we can use the policy to compute the action for a given state: + +```@example gridworld_mdp +s = GridWorldState(9, 2) +@show action(mcts_planner, s) + +``` + +```@example gridworld_mdp +s = GridWorldState(8, 3) +@show action(mcts_planner, s) + +``` + +## Visualizing the Value Iteration Policy +We can visualize the value iteration policy by plotting the value function and the policy. We can use numerous plotting packages to do this, but we will use [UnicodePlots](https://github.com/JuliaPlots/UnicodePlots.jl) for this example. + +```@example gridworld_mdp +using UnicodePlots +using Printf +``` + +### Value Function as a Heatmap +We can plot the value function as a heatmap. The value function is a function over the state space, so we need to iterate over the state space and store the value at each state. We can use the `value` function to evaluate the value function at a given state. + +```@example gridworld_mdp +# Initialize the value function array +value_function = zeros(mdp.size_y, mdp.size_x) + +# Iterate over the state space and store the value at each state +for s in states(mdp) + if isterminal(mdp, s) + continue + end + value_function[s.y, s.x] = value(vi_policy, s) +end + +# Plot the value function +heatmap(value_function; + title="GridWorld VI Value Function", + xlabel="x position", + ylabel="y position", + colormap=:inferno +) + +``` + +!!! note + Rendering of unicode plots in the documentation is not optimal. For a better image, run this locally in a REPL. + +### Visualizing the Value Iteration Policy +One way to visualize the policy is to plot the action that the policy takes at each state. + +```@example gridworld_mdp +# Initialize the policy array +policy_array = fill(:up, mdp.size_x, mdp.size_y) + +# Iterate over the state space and store the action at each state +for s in states(mdp) + if isterminal(mdp, s) + continue + end + policy_array[s.x, s.y] = action(vi_policy, s) +end + +# Let's define a mapping from symbols to unicode arrows +arrow_map = Dict( + :up => " ↑ ", + :down => " ↓ ", + :left => " ← ", + :right => " → " +) + +# Plot the policy to the terminal, with the origin in the bottom left +@printf(" GridWorld VI Policy \n") +for y in mdp.size_y+1:-1:0 + if y == mdp.size_y+1 || y == 0 + for xi in 0:10 + if xi == 0 + print(" ") + elseif y == mdp.size_y+1 + print("___") + else + print("---") + end + end + else + for x in 0:mdp.size_x+1 + if x == 0 + @printf("%2d |", y) + elseif x == mdp.size_x + 1 + print("|") + else + print(arrow_map[policy_array[x, y]]) + end + end + end + println() + if y == 0 + for xi in 0:10 + if xi == 0 + print(" ") + else + print(" $xi ") + end + end + end +end +``` + +## Seeing a Policy In Action +Another useful tool is to view the policy in action by creating a gif of a simulation. To accomplish this, we could use [POMDPGifs](https://github.com/JuliaPOMDP/POMDPGifs.jl). To use POMDPGifs, we need to extend the [`POMDPTools.render`](@ref) function to `GridWorldMDP`. Please reference [Gallery of POMDPs.jl Problems](@ref) for examples of this process. 
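+
+As a rough illustration of what that could look like (this is an untested sketch, not part of the tutorial above: it assumes Plots.jl is available for drawing and that POMDPGifs' `makegif` accepts `filename` and `fps` keyword arguments; check those packages' documentation for the exact interface), one possible `render` method and gif call is:
+
+```julia
+using Plots
+using POMDPGifs
+
+# Sketch of a render method: draw the grid, label the reward states, and mark
+# the agent's current position from the step named tuple.
+function POMDPTools.render(mdp::GridWorldMDP, step)
+    plt = plot(; xlims=(0.5, mdp.size_x + 0.5), ylims=(0.5, mdp.size_y + 0.5),
+               aspect_ratio=:equal, legend=false, grid=true)
+    for (rs, rv) in mdp.reward_states_values
+        annotate!(plt, rs.x, rs.y, text(string(rv), 8))
+    end
+    if haskey(step, :s) && !isterminal(mdp, step.s)
+        scatter!(plt, [step.s.x], [step.s.y]; color=:red, markersize=8)
+    end
+    return plt
+end
+
+# Record a history with the value iteration policy and turn it into a gif
+hist = simulate(HistoryRecorder(max_steps=20), mdp, vi_policy)
+makegif(mdp, hist; filename="grid_world_vi.gif", fps=2)
+```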
\ No newline at end of file diff --git a/docs/src/example_simulations.md b/docs/src/example_simulations.md new file mode 100644 index 00000000..cd6b5e95 --- /dev/null +++ b/docs/src/example_simulations.md @@ -0,0 +1,174 @@ + +# Simulations Examples + +In these simulation examples, we will use the crying baby POMDPs defined in the [Defining a POMDP](@ref) section (i.e. [`quick_crying_baby_pomdp`](@ref quick_crying), [`explicit_crying_baby_pomdp`](@ref explicit_crying), [`gen_crying_baby_pomdp`](@ref gen_crying), and [`tabular_crying_baby_pomdp`](@ref tab_crying)). + +```@setup crying_sim +include("examples/crying_baby_examples.jl") +include("examples/crying_baby_solvers.jl") +``` + +## Stepthrough +The stepthrough simulater provides a window into the simulation with a for-loop syntax. + +Within the body of the for loop, we have access to the belief, the action, the observation, and the reward, in each step. We also calculate the sum of the rewards in this example, but note that this is _not_ the _discounted reward_. + +```@example crying_sim +function run_step_through_simulation() # hide +policy = RandomPolicy(quick_crying_baby_pomdp) +r_sum = 0.0 +step = 0 +for (b, s, a, o, r) in stepthrough(quick_crying_baby_pomdp, policy, DiscreteUpdater(quick_crying_baby_pomdp), "b,s,a,o,r"; max_steps=4) + step += 1 + println("Step $step") + println("b = sated => $(b.b[1]), hungry => $(b.b[2])") + @show s + @show a + @show o + @show r + r_sum += r + @show r_sum + println() +end +end #hide + +run_step_through_simulation() # hide +``` + +## Rollout Simulations +While stepthrough is a flexible and convenient tool for many user-facing demonstrations, it is often less error-prone to use the standard simulate function with a `Simulator` object. The simplest Simulator is the `RolloutSimulator`. It simply runs a simulation and returns the discounted reward. + +```@example crying_sim +function run_rollout_simulation() # hide +policy = RandomPolicy(explicit_crying_baby_pomdp) +sim = RolloutSimulator(max_steps=10) +r_sum = simulate(sim, explicit_crying_baby_pomdp, policy) +println("Total discounted reward: $r_sum") +end # hide +run_rollout_simulation() # hide +``` + +## Recording Histories +Sometimes it is important to record the entire history of a simulation for further examination. This can be accomplished with a `HistoryRecorder`. + +```@example crying_sim +policy = RandomPolicy(tabular_crying_baby_pomdp) +hr = HistoryRecorder(max_steps=5) +history = simulate(hr, tabular_crying_baby_pomdp, policy, DiscreteUpdater(tabular_crying_baby_pomdp), Deterministic(1)) +nothing # hide +``` + +The history object produced by a `HistoryRecorder` is a `SimHistory`, documented in the POMDPTools simulater section [Histories](@ref). The information in this object can be accessed in several ways. For example, there is a function: +```@example crying_sim +discounted_reward(history) +``` +Accessor functions like `state_hist` and `action_hist` can also be used to access parts of the history: +```@example crying_sim +state_hist(history) +``` +``` @example crying_sim +collect(action_hist(history)) +``` + +Keeping track of which states, actions, and observations belong together can be tricky (for example, since there is a starting state, and ending state, but no action is taken from the ending state, the list of actions has a different length than the list of states). It is often better to think of histories in terms of steps that include both starting and ending states. 
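+
+For instance, the length difference can be seen directly from the history recorded above:
+
+```julia
+length(state_hist(history))  # one more entry than the action list (includes the final state)
+length(action_hist(history)) # no action is taken from the final state
+```
+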
+ +The most powerful function for accessing the information in a `SimHistory` is the `eachstep` function which returns an iterator through named tuples representing each step in the history. The `eachstep` function is similar to the `stepthrough` function above except that it iterates through the immutable steps of a previously simulated history instead of conducting the simulation as the for loop is being carried out. + +```@example crying_sim +function demo_eachstep(sim_history) # hide +r_sum = 0.0 +step = 0 +for step_i in eachstep(sim_history, "b,s,a,o,r") + step += 1 + println("Step $step") + println("step_i.b = sated => $(step_i.b.b[1]), hungry => $(step_i.b.b[2])") + @show step_i.s + @show step_i.a + @show step_i.o + @show step_i.r + r_sum += step_i.r + @show r_sum + println() +end +end # hide +demo_eachstep(history) # hide +``` + +## Parallel Simulations +It is often useful to evaluate a policy by running many simulations. The parallel simulator is the most effective tool for this. To use the parallel simulator, first create a list of `Sim` objects, each of which contains all of the information needed to run a simulation. Then then run the simulations using `run_parallel`, which will return a `DataFrame` with the results. + +In this example, we will compare the performance of the polcies we computed in the [Using Different Solvers](@ref) section (i.e. `sarsop_policy`, `pomcp_planner`, and `heuristic_policy`). To evaluate the policies, we will run 100 simulations for each policy. We can do this by adding 100 `Sim` objects of each policy to the list. + +```@example crying_sim +using DataFrames +using StatsBase: std + +# Defining paramters for the simulations +number_of_sim_to_run = 100 +max_steps = 20 +starting_seed = 1 + +# We will also compare against a random policy +rand_policy = RandomPolicy(quick_crying_baby_pomdp, rng=MersenneTwister(1)) + +# Create the list of Sim objects +sim_list = [] + +# Add 100 Sim objects of each policy to the list. +for sim_number in 1:number_of_sim_to_run + seed = starting_seed + sim_number + + # Add the SARSOP policy + push!(sim_list, Sim( + quick_crying_baby_pomdp, + rng=MersenneTwister(seed), + sarsop_policy, + max_steps=max_steps, + metadata=Dict(:policy => "sarsop", :seed => seed)) + ) + + # Add the POMCP policy + push!(sim_list, Sim( + quick_crying_baby_pomdp, + rng=MersenneTwister(seed), + pomcp_planner, + max_steps=max_steps, + metadata=Dict(:policy => "pomcp", :seed => seed)) + ) + + # Add the heuristic policy + push!(sim_list, Sim( + quick_crying_baby_pomdp, + rng=MersenneTwister(seed), + heuristic_policy, + max_steps=max_steps, + metadata=Dict(:policy => "heuristic", :seed => seed)) + ) + + # Add the random policy + push!(sim_list, Sim( + quick_crying_baby_pomdp, + rng=MersenneTwister(seed), + rand_policy, + max_steps=max_steps, + metadata=Dict(:policy => "random", :seed => seed)) + ) +end + +# Run the simulations in parallel +data = run_parallel(sim_list) + +# Define a function to calculate the mean and confidence interval +function mean_and_ci(x) + m = mean(x) + ci = 1.96 * std(x) / sqrt(length(x)) # 95% confidence interval + return (mean = m, ci = ci) +end + +# Calculate the mean and confidence interval for each policy +grouped_df = groupby(data, :policy) +result = combine(grouped_df, :reward => mean_and_ci => AsTable) + +``` + +By default, the parallel simulator only returns the reward from each simulation, but more information can be gathered by specifying a function to analyze the `Sim`-history pair and record additional statistics. 
Reference the POMDPTools simulator section for more information ([Specifying information to be recorded](@ref)).
\ No newline at end of file
diff --git a/docs/src/example_solvers.md b/docs/src/example_solvers.md
new file mode 100644
index 00000000..069053a7
--- /dev/null
+++ b/docs/src/example_solvers.md
@@ -0,0 +1,108 @@
+# Using Different Solvers
+There are various solvers implemented for use out-of-the-box. Please reference the repository README for a list of [MDP Solvers](https://github.com/JuliaPOMDP/POMDPs.jl?tab=readme-ov-file#mdp-solvers) and [POMDP Solvers](https://github.com/JuliaPOMDP/POMDPs.jl?tab=readme-ov-file#pomdp-solvers) implemented and maintained by the JuliaPOMDP community. We provide a few examples of how to use a small subset of these solvers.
+
+```@setup crying_sim
+include("examples/crying_baby_examples.jl")
+```
+
+## Checking Requirements
+Before using a solver, it is prudent to ensure the problem meets the requirements of the solver. Please reference the solver documentation for detailed information about the requirements of each solver.
+
+We can use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help us determine if we have all of the required components defined for a particular solver. However, not all solvers have the requirements implemented. If/when you encounter a solver that does not have the requirements implemented, please open an issue on the solver's repository.
+
+Let's check if we have all of the required components of our problems for the QMDP solver.
+
+```@example crying_sim
+using POMDPLinter
+using QMDP
+
+qmdp_solver = QMDPSolver()
+
+println("Quick Crying Baby POMDP")
+@show_requirements POMDPs.solve(qmdp_solver, quick_crying_baby_pomdp)
+
+println("\nExplicit Crying Baby POMDP")
+@show_requirements POMDPs.solve(qmdp_solver, explicit_crying_baby_pomdp)
+
+println("\nTabular Crying Baby POMDP")
+@show_requirements POMDPs.solve(qmdp_solver, tabular_crying_baby_pomdp)
+
+println("\nGen Crying Baby POMDP")
+# We don't have an actions(::GenCryingBabyPOMDP) implemented
+try
+    @show_requirements POMDPs.solve(qmdp_solver, gen_crying_baby_pomdp)
+catch err_msg
+    println(err_msg)
+end
+```
+
+## Offline (SARSOP)
+In this example, we will use the [NativeSARSOP](https://github.com/JuliaPOMDP/NativeSARSOP.jl) solver. The process for generating offline policies is similar for all offline solvers. First, we define the solver with the desired parameters. Then, we call `POMDPs.solve` with the solver and the problem. We can query the policy using the `action` function.
+
+```@example crying_sim
+using NativeSARSOP
+
+# Define the solver with the desired parameters
+sarsop_solver = SARSOPSolver(; max_time=10.0)
+
+# Solve the problem by calling POMDPs.solve. SARSOP will compute the policy and return an `AlphaVectorPolicy`
+sarsop_policy = POMDPs.solve(sarsop_solver, quick_crying_baby_pomdp)
+
+# We can query the policy using the `action` function
+b = initialstate(quick_crying_baby_pomdp)
+a = action(sarsop_policy, b)
+
+@show a
+
+```
+
+## Online (POMCP)
+For the online solver, we will use Partially Observable Monte Carlo Planning ([POMCP](https://github.com/JuliaPOMDP/BasicPOMCP.jl)). For online solvers, we first define the solver similar to offline solvers. However, when we call `POMDPs.solve`, we are returned an online planner. Similar to the offline solver, we can query the policy using the `action` function, and that is when the online solver computes the action.
+
+```@example crying_sim
+using BasicPOMCP
+
+pomcp_solver = POMCPSolver(; c=5.0, tree_queries=1000, rng=MersenneTwister(1))
+pomcp_planner = POMDPs.solve(pomcp_solver, quick_crying_baby_pomdp)
+
+b = initialstate(quick_crying_baby_pomdp)
+a = action(pomcp_planner, b)
+
+@show a
+
+```
+
+## Heuristic Policy
+While we often want to use a solver to compute a policy, sometimes we might want to use a heuristic policy. For example, we may want to use a heuristic policy during our rollouts for online solvers or to use as a baseline. In this example, we will define a simple heuristic policy that feeds the baby if our belief that the baby is hungry is greater than 50%; otherwise, we will randomly ignore or sing to the baby.
+
+```@example crying_sim
+struct HeuristicFeedPolicy{P<:POMDP} <: Policy
+    pomdp::P
+end
+
+# We need to implement the action function for our policy
+function POMDPs.action(policy::HeuristicFeedPolicy, b)
+    if pdf(b, :hungry) > 0.5
+        return :feed
+    else
+        return rand([:ignore, :sing])
+    end
+end
+
+# Let's also define the default updater for our policy
+function POMDPs.updater(policy::HeuristicFeedPolicy)
+    return DiscreteUpdater(policy.pomdp)
+end
+
+heuristic_policy = HeuristicFeedPolicy(quick_crying_baby_pomdp)
+
+# Let's query the policy a few times
+b = SparseCat([:sated, :hungry], [0.1, 0.9])
+a1 = action(heuristic_policy, b)
+
+b = SparseCat([:sated, :hungry], [0.9, 0.1])
+a2 = action(heuristic_policy, b)
+
+@show [a1, a2]
+
+```
\ No newline at end of file
diff --git a/docs/src/examples.md b/docs/src/examples.md
new file mode 100644
index 00000000..b725ba7b
--- /dev/null
+++ b/docs/src/examples.md
@@ -0,0 +1,12 @@
+# [Examples](@id examples_section)
+
+This section contains examples of how to use POMDPs.jl. For specific information about the interface and functions used in the examples, please reference the corresponding area in the documentation or the [API Documentation](@ref).
+
+The examples are organized by topic and are designed to build on each other. First, we have to define a POMDP. Then we need to solve the POMDP to get a policy. Finally, we can simulate the policy to see how it performs (a rough sketch of this flow is shown below). The examples are designed to be executed in order. For example, the examples in [Simulations Examples](@ref) assume that the POMDPs defined in the [Defining a POMDP](@ref) section have been defined and we have a policy we would like to simulate that we computed in the [Using Different Solvers](@ref) section.
+
+The [GridWorld MDP Tutorial](@ref) section is a standalone example that does not require any of the other examples.
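+
+The overall flow looks roughly like this (the names below are placeholders standing in for the concrete problems, solvers, and simulators used in the example sections):
+
+```julia
+pomdp = MyPOMDP()                  # 1. define the problem (Defining a POMDP)
+policy = solve(MySolver(), pomdp)  # 2. solve it to obtain a policy (Using Different Solvers)
+r = simulate(RolloutSimulator(max_steps=10), pomdp, policy)  # 3. simulate the policy (Simulations Examples)
+```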
+ +## Outline +```@contents +Pages = ["example_defining_problems.md", "example_solvers.md", "example_simulations.md", "example_gridworld_mdp.md"] +``` \ No newline at end of file diff --git a/docs/src/examples/crying_baby_examples.jl b/docs/src/examples/crying_baby_examples.jl new file mode 100644 index 00000000..212b6f26 --- /dev/null +++ b/docs/src/examples/crying_baby_examples.jl @@ -0,0 +1,230 @@ +using POMDPs +using POMDPTools +using POMDPModels +using QuickPOMDPs +using Random + +quick_crying_baby_pomdp = QuickPOMDP( + states = [:sated, :hungry], + actions = [:feed, :sing, :ignore], + observations = [:quiet, :crying], + initialstate = Deterministic(:sated), + discount = 0.9, + transition = function (s, a) + if a == :feed + return Deterministic(:sated) + elseif s == :sated # :sated and a != :feed + return SparseCat([:sated, :hungry], [0.9, 0.1]) + else # s == :hungry and a != :feed + return Deterministic(:hungry) + end + end, + observation = function (a, sp) + if sp == :hungry + if a == :sing + return SparseCat([:crying, :quiet], [0.9, 0.1]) + else # a == :ignore || a == :feed + return SparseCat([:crying, :quiet], [0.8, 0.2]) + end + else # sp = :sated + if a == :sing + return Deterministic(:quiet) + else # a == :ignore || a == :feed + return SparseCat([:crying, :quiet], [0.1, 0.9]) + end + + end + end, + reward = function (s, a) + r = 0.0 + if s == :hungry + r += -10.0 + end + if a == :feed + r += -5.0 + elseif a == :sing + r+= -0.5 + end + return r + end +) + +struct CryingBabyState + hungry::Bool +end + +struct CryingBabyPOMDP <: POMDP{CryingBabyState, Symbol, Symbol} + p_sated_to_hungry::Float64 + p_cry_feed_hungry::Float64 + p_cry_sing_hungry::Float64 + p_cry_ignore_hungry::Float64 + p_cry_feed_sated::Float64 + p_cry_sing_sated::Float64 + p_cry_ignore_sated::Float64 + reward_hungry::Float64 + reward_feed::Float64 + reward_sing::Float64 + discount_factor::Float64 +end + +function CryingBabyPOMDP(; + p_sated_to_hungry=0.1, + p_cry_feed_hungry=0.8, + p_cry_sing_hungry=0.9, + p_cry_ignore_hungry=0.8, + p_cry_feed_sated=0.1, + p_cry_sing_sated=0.0, + p_cry_ignore_sated=0.1, + reward_hungry=-10.0, + reward_feed=-5.0, + reward_sing=-0.5, + discount_factor=0.9 +) + return CryingBabyPOMDP(p_sated_to_hungry, p_cry_feed_hungry, + p_cry_sing_hungry, p_cry_ignore_hungry, p_cry_feed_sated, + p_cry_sing_sated, p_cry_ignore_sated, reward_hungry, + reward_feed, reward_sing, discount_factor) +end + +POMDPs.actions(::CryingBabyPOMDP) = [:feed, :sing, :ignore] +POMDPs.states(::CryingBabyPOMDP) = [CryingBabyState(false), CryingBabyState(true)] +POMDPs.observations(::CryingBabyPOMDP) = [:crying, :quiet] +POMDPs.stateindex(::CryingBabyPOMDP, s::CryingBabyState) = s.hungry ? 2 : 1 +POMDPs.obsindex(::CryingBabyPOMDP, o::Symbol) = o == :crying ? 1 : 2 +POMDPs.actionindex(::CryingBabyPOMDP, a::Symbol) = a == :feed ? 1 : a == :sing ? 
2 : 3 + +function POMDPs.transition(pomdp::CryingBabyPOMDP, s::CryingBabyState, a::Symbol) + if a == :feed + return Deterministic(CryingBabyState(false)) + elseif s == :sated # :sated and a != :feed + return SparseCat([CryingBabyState(false), CryingBabyState(true)], [1 - pomdp.p_sated_to_hungry, pomdp.p_sated_to_hungry]) + else # s == :hungry and a != :feed + return Deterministic(CryingBabyState(true)) + end +end + +function POMDPs.observation(pomdp::CryingBabyPOMDP, a::Symbol, sp::CryingBabyState) + if sp.hungry + if a == :sing + return SparseCat([:crying, :quiet], [pomdp.p_cry_sing_hungry, 1 - pomdp.p_cry_sing_hungry]) + elseif a== :ignore + return SparseCat([:crying, :quiet], [pomdp.p_cry_ignore_hungry, 1 - pomdp.p_cry_ignore_hungry]) + else # a == :feed + return SparseCat([:crying, :quiet], [pomdp.p_cry_feed_hungry, 1 - pomdp.p_cry_feed_hungry]) + end + else # sated + if a == :sing + return SparseCat([:crying, :quiet], [pomdp.p_cry_sing_sated, 1 - pomdp.p_cry_sing_sated]) + elseif a== :ignore + return SparseCat([:crying, :quiet], [pomdp.p_cry_ignore_sated, 1 - pomdp.p_cry_ignore_sated]) + else # a == :feed + return SparseCat([:crying, :quiet], [pomdp.p_cry_feed_sated, 1 - pomdp.p_cry_feed_sated]) + end + end +end + +function POMDPs.reward(pomdp::CryingBabyPOMDP, s::CryingBabyState, a::Symbol) + r = 0.0 + if s.hungry + r += pomdp.reward_hungry + end + if a == :feed + r += pomdp.reward_feed + elseif a == :sing + r += pomdp.reward_sing + end + return r +end + +POMDPs.discount(pomdp::CryingBabyPOMDP) = pomdp.discount_factor + +POMDPs.initialstate(::CryingBabyPOMDP) = Deterministic(CryingBabyState(false)) + +explicit_crying_baby_pomdp = CryingBabyPOMDP() + +struct GenCryingBabyState + hungry::Bool +end + +struct GenCryingBabyPOMDP <: POMDP{CryingBabyState, Symbol, Symbol} + p_sated_to_hungry::Float64 + p_cry_feed_hungry::Float64 + p_cry_sing_hungry::Float64 + p_cry_ignore_hungry::Float64 + p_cry_feed_sated::Float64 + p_cry_sing_sated::Float64 + p_cry_ignore_sated::Float64 + reward_hungry::Float64 + reward_feed::Float64 + reward_sing::Float64 + discount_factor::Float64 + + GenCryingBabyPOMDP() = new(0.1, 0.8, 0.9, 0.8, 0.1, 0.0, 0.1, -10.0, -5.0, -0.5, 0.9) +end + +function POMDPs.gen(pomdp::GenCryingBabyPOMDP, s::CryingBabyState, a::Symbol, rng::AbstractRNG) + + if a == :feed + sp = GenCryingBabyState(false) + else + sp = rand(rng) < pomdp.p_sated_to_hungry ? GenCryingBabyState(true) : GenCryingBabyState(false) + end + + if sp.hungry + if a == :sing + o = rand(rng) < pomdp.p_cry_sing_hungry ? :crying : :quiet + elseif a== :ignore + o = rand(rng) < pomdp.p_cry_ignore_hungry ? :crying : :quiet + else # a == :feed + o = rand(rng) < pomdp.p_cry_feed_hungry ? :crying : :quiet + end + else # sated + if a == :sing + o = rand(rng) < pomdp.p_cry_sing_sated ? :crying : :quiet + elseif a== :ignore + o = rand(rng) < pomdp.p_cry_ignore_sated ? :crying : :quiet + else # a == :feed + o = rand(rng) < pomdp.p_cry_feed_sated ? 
+
+struct GenCryingBabyState
+    hungry::Bool
+end
+
+struct GenCryingBabyPOMDP <: POMDP{GenCryingBabyState, Symbol, Symbol}
+    p_sated_to_hungry::Float64
+    p_cry_feed_hungry::Float64
+    p_cry_sing_hungry::Float64
+    p_cry_ignore_hungry::Float64
+    p_cry_feed_sated::Float64
+    p_cry_sing_sated::Float64
+    p_cry_ignore_sated::Float64
+    reward_hungry::Float64
+    reward_feed::Float64
+    reward_sing::Float64
+    discount_factor::Float64
+
+    GenCryingBabyPOMDP() = new(0.1, 0.8, 0.9, 0.8, 0.1, 0.0, 0.1, -10.0, -5.0, -0.5, 0.9)
+end
+
+function POMDPs.gen(pomdp::GenCryingBabyPOMDP, s::GenCryingBabyState, a::Symbol, rng::AbstractRNG)
+
+    if a == :feed
+        sp = GenCryingBabyState(false)
+    elseif s.hungry # hungry and not fed: the baby stays hungry
+        sp = GenCryingBabyState(true)
+    else # sated and not fed
+        sp = rand(rng) < pomdp.p_sated_to_hungry ? GenCryingBabyState(true) : GenCryingBabyState(false)
+    end
+
+    if sp.hungry
+        if a == :sing
+            o = rand(rng) < pomdp.p_cry_sing_hungry ? :crying : :quiet
+        elseif a == :ignore
+            o = rand(rng) < pomdp.p_cry_ignore_hungry ? :crying : :quiet
+        else # a == :feed
+            o = rand(rng) < pomdp.p_cry_feed_hungry ? :crying : :quiet
+        end
+    else # sated
+        if a == :sing
+            o = rand(rng) < pomdp.p_cry_sing_sated ? :crying : :quiet
+        elseif a == :ignore
+            o = rand(rng) < pomdp.p_cry_ignore_sated ? :crying : :quiet
+        else # a == :feed
+            o = rand(rng) < pomdp.p_cry_feed_sated ? :crying : :quiet
+        end
+    end
+
+    r = 0.0
+    if s.hungry # reward depends on the current state, matching the other models
+        r += pomdp.reward_hungry
+    end
+    if a == :feed
+        r += pomdp.reward_feed
+    elseif a == :sing
+        r += pomdp.reward_sing
+    end
+
+    return (sp=sp, o=o, r=r)
+end
+
+POMDPs.initialstate(::GenCryingBabyPOMDP) = Deterministic(GenCryingBabyState(false))
+
+gen_crying_baby_pomdp = GenCryingBabyPOMDP()
+
+T = zeros(2, 3, 2) # |S'| x |A| x |S|, T[sp, a, s] = p(sp | a, s)
+T[:, 1, :] = [1.0 1.0;
+              0.0 0.0]
+T[:, 2, :] = [0.9 0.0;
+              0.1 1.0]
+T[:, 3, :] = [0.9 0.0;
+              0.1 1.0]
+
+O = zeros(2, 3, 2) # |O| x |A| x |S'|, O[o, a, sp] = p(o | a, sp)
+O[:, 1, :] = [0.1 0.8;
+              0.9 0.2]
+O[:, 2, :] = [0.0 0.9;
+              1.0 0.1]
+O[:, 3, :] = [0.1 0.8;
+              0.9 0.2]
+
+R = zeros(2, 3) # |S| x |A|
+R = [-5.0   -0.5   0.0;
+     -15.0 -10.5 -10.0]
+
+discount = 0.9
+
+tabular_crying_baby_pomdp = TabularPOMDP(T, R, O, discount)
diff --git a/docs/src/examples/crying_baby_solvers.jl b/docs/src/examples/crying_baby_solvers.jl
new file mode 100644
index 00000000..5c1115c3
--- /dev/null
+++ b/docs/src/examples/crying_baby_solvers.jl
@@ -0,0 +1,24 @@
+using BasicPOMCP
+using NativeSARSOP
+
+sarsop_solver = SARSOPSolver(; max_time=10.0)
+sarsop_policy = POMDPs.solve(sarsop_solver, quick_crying_baby_pomdp)
+
+pomcp_solver = POMCPSolver(; c=5.0, tree_queries=1000, rng=MersenneTwister(1))
+pomcp_planner = POMDPs.solve(pomcp_solver, quick_crying_baby_pomdp)
+
+struct HeuristicFeedPolicy{P<:POMDP} <: Policy
+    pomdp::P
+end
+function POMDPs.updater(policy::HeuristicFeedPolicy)
+    return DiscreteUpdater(policy.pomdp)
+end
+function POMDPs.action(policy::HeuristicFeedPolicy, b)
+    if pdf(b, :hungry) > 0.5
+        return :feed
+    else
+        return rand([:ignore, :sing])
+    end
+end
+
+heuristic_policy = HeuristicFeedPolicy(quick_crying_baby_pomdp)
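+
+# Illustrative usage (not part of the original example): build the initial belief with the
+# heuristic policy's updater and query an action from each offline policy. The crying baby
+# models and POMDPTools are assumed to already be loaded from crying_baby_examples.jl.
+b0 = initialize_belief(updater(heuristic_policy), initialstate(quick_crying_baby_pomdp))
+a_sarsop = action(sarsop_policy, b0)       # alpha-vector policy evaluated on the discrete belief
+a_heuristic = action(heuristic_policy, b0) # feeds only when the belief in :hungry exceeds 0.5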
diff --git a/docs/src/examples/grid_world_overview.gif b/docs/src/examples/grid_world_overview.gif
new file mode 100644
index 0000000000000000000000000000000000000000..bf94bba1045e640fa48f6e4a1ca316704ebf774d
GIT binary patch
literal 8958
[binary GIF data omitted]
diff --git a/docs/src/gallery.md b/docs/src/gallery.md
new file mode 100644
--- /dev/null
+++ b/docs/src/gallery.md
@@ -0,0 +1,264 @@
+            if weight_sp > 0.0
+                push!(ps, s)
+                push!(pm, sp)
+                push!(wm, weight_sp)
+            end
+        end
+    end
+
+    while length(pm) < up.n_init
+        a_pert = RoombaAct(a.v + (up.v_noise_coeff * (rand(up.rng) - 0.5)), a.omega + (up.om_noise_coeff * (rand(up.rng) - 0.5)))
+        s = isempty(ps) ?
rand(up.rng, b) : rand(up.rng, ps) + sp = @gen(:sp)(up.model, s, a_pert, up.rng) + weight_sp = obs_weight(up.model, s, a_pert, sp, o) + if weight_sp > 0.0 + push!(pm, sp) + push!(wm, weight_sp) + end + end + + # if all particles are terminal, issue an error + if all_terminal + error("Particle filter update error: all states in the particle collection were terminal.") + end + + # return ParticleFilters.ParticleCollection(deepcopy(pm)) + return ParticleFilters.resample(up.resampler, + WeightedParticleBelief(pm, wm, sum(wm), nothing), + up.rng) +end + +solver = POMCPSolver(; + tree_queries=20000, + max_depth=150, + c = 10.0, + rng=MersenneTwister(1) +) + +planner = solve(solver, pomdp) + +sim = GifSimulator(; + filename="examples/EscapeRoomba.gif", + max_steps=100, + rng=MersenneTwister(3), + show_progress=false, + fps=5) +saved_gif = simulate(sim, pomdp, planner, belief_updater) + +println("gif saved to: $(saved_gif.filename)") +``` + +```@setup EscapeRoomba +Pkg.rm("RoombaPOMDPs") +``` + +## [DroneSurveillance](https://github.com/JuliaPOMDP/DroneSurveillance.jl) +Drone surveillance POMDP from M. Svoreňová, M. Chmelík, K. Leahy, H. F. Eniser, K. Chatterjee, I. Černá, C. Belta, "Temporal logic motion planning using POMDPs with parity objectives: case study paper", International Conference on Hybrid Systems: Computation and Control (HSCC), 2015. + +In this problem, the UAV must go from one corner to the other while avoiding a ground agent. It can only detect the ground agent within its field of view (in blue). + +![DroneSurveillance](examples/DroneSurveillance.gif) + +```@example +using POMDPs +using POMDPTools +using POMDPGifs +using NativeSARSOP +using Random +using DroneSurveillance +import Cairo, Fontconfig + +pomdp = DroneSurveillancePOMDP() +solver = SARSOPSolver(; precision=0.1, max_time=10.0) +policy = solve(solver, pomdp) + +sim = GifSimulator(; filename="examples/DroneSurveillance.gif", max_steps=30, rng=MersenneTwister(1), show_progress=false) +saved_gif = simulate(sim, pomdp, policy) + +println("gif saved to: $(saved_gif.filename)") +``` + +## [QuickMountainCar](https://github.com/JuliaPOMDP/QuickPOMDPs.jl) +An implementation of the classic Mountain Car RL problem using the QuickPOMDPs interface. + +![QuickMountainCar](examples/QuickMountainCar.gif) + +```@example +using POMDPs +using POMDPTools +using POMDPGifs +using Random +using QuickPOMDPs +using Compose +import Cairo + +mountaincar = QuickMDP( + function (s, a, rng) + x, v = s + vp = clamp(v + a*0.001 + cos(3*x)*-0.0025, -0.07, 0.07) + xp = x + vp + if xp > 0.5 + r = 100.0 + else + r = -1.0 + end + return (sp=(xp, vp), r=r) + end, + actions = [-1., 0., 1.], + initialstate = Deterministic((-0.5, 0.0)), + discount = 0.95, + isterminal = s -> s[1] > 0.5, + + render = function (step) + cx = step.s[1] + cy = 0.45*sin(3*cx)+0.5 + car = (context(), Compose.circle(cx, cy+0.035, 0.035), fill("blue")) + track = (context(), line([(x, 0.45*sin(3*x)+0.5) for x in -1.2:0.01:0.6]), Compose.stroke("black")) + goal = (context(), star(0.5, 1.0, -0.035, 5), fill("gold"), Compose.stroke("black")) + bg = (context(), Compose.rectangle(), fill("white")) + ctx = context(0.7, 0.05, 0.6, 0.9, mirror=Mirror(0, 0, 0.5)) + return compose(context(), (ctx, car, track, goal), bg) + end +) + +energize = FunctionPolicy(s->s[2] < 0.0 ? 
-1.0 : 1.0)
+sim = GifSimulator(; filename="examples/QuickMountainCar.gif", max_steps=200, fps=20, rng=MersenneTwister(1), show_progress=false)
+saved_gif = simulate(sim, mountaincar, energize)
+
+println("gif saved to: $(saved_gif.filename)")
+```
+
+## [RockSample](https://github.com/JuliaPOMDP/RockSample.jl)
+The RockSample problem from T. Smith and R. Simmons, "Heuristic Search Value Iteration for POMDPs", Association for Uncertainty in Artificial Intelligence (UAI), 2004.
+
+The robot must navigate and sample good rocks (green) and then arrive at an exit area. The robot can only sense the rocks with an imperfect sensor whose performance depends on the distance to the rock.
+
+![RockSample](examples/RockSample.gif)
+
+```@example
+using POMDPs
+using POMDPTools
+using POMDPGifs
+using NativeSARSOP
+using Random
+using RockSample
+using Cairo
+
+pomdp = RockSamplePOMDP(rocks_positions=[(2,3), (4,4), (4,2)],
+                        sensor_efficiency=20.0,
+                        discount_factor=0.95,
+                        good_rock_reward = 20.0)
+
+solver = SARSOPSolver(; precision=1e-3, max_time=10.0)
+policy = solve(solver, pomdp)
+
+sim = GifSimulator(; filename="examples/RockSample.gif", max_steps=30, rng=MersenneTwister(1), show_progress=false)
+saved_gif = simulate(sim, pomdp, policy)
+
+println("gif saved to: $(saved_gif.filename)")
+```
+
+## [TagPOMDPProblem](https://github.com/JuliaPOMDP/TagPOMDPProblem.jl)
+The Tag problem from J. Pineau, G. Gordon, and S. Thrun, "Point-based value iteration: An anytime algorithm for POMDPs", International Joint Conference on Artificial Intelligence (IJCAI), 2003.
+
+The orange agent is the pursuer and the red agent is the evader. The pursuer must "tag" the evader by being in the same grid cell as the evader. However, the pursuer can only see the evader when they occupy the same grid cell. The evader moves stochastically "away" from the pursuer.
+
+![TagPOMDPProblem](examples/TagPOMDP.gif)
+
+```@setup TagPOMDP
+using Pkg
+Pkg.add("Plots")
+using Plots
+```
+
+```@example TagPOMDP
+using POMDPs
+using POMDPTools
+using POMDPGifs
+using NativeSARSOP
+using Random
+using TagPOMDPProblem
+
+pomdp = TagPOMDP()
+solver = SARSOPSolver(; max_time=20.0)
+policy = solve(solver, pomdp)
+sim = GifSimulator(; filename="examples/TagPOMDP.gif", max_steps=50, rng=MersenneTwister(1), show_progress=false)
+saved_gif = simulate(sim, pomdp, policy)
+
+println("gif saved to: $(saved_gif.filename)")
+```
+
+```@setup TagPOMDP
+using Pkg
+Pkg.rm("Plots")
+```
+
+## Adding New Gallery Examples
+To add new examples, please submit a pull request to the POMDPs.jl repository that modifies the `gallery.md` file in `docs/src/`. Please include the creation of a gif in the code snippet: the gif should be generated during the documentation build using `@eval` and saved in the `docs/src/examples/` directory. The gif should be named `problem_name.gif`, where `problem_name` is the name of the problem, and can then be included with `![problem_name](examples/problem_name.gif)`.
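+
+As a rough sketch of the pattern (the `NewProblem` package, its `NewProblemPOMDP` constructor, and the random policy below are placeholders, not a real gallery entry; in the gallery itself this would be an `@example` block so that it runs when the documentation is built):
+
+```julia
+using POMDPs
+using POMDPTools
+using POMDPGifs
+using Random
+using NewProblem # hypothetical problem package
+
+pomdp = NewProblemPOMDP() # hypothetical constructor
+policy = RandomPolicy(pomdp) # replace with a policy from your preferred solver
+
+sim = GifSimulator(; filename="examples/NewProblem.gif", max_steps=30, rng=MersenneTwister(1), show_progress=false)
+saved_gif = simulate(sim, pomdp, policy)
+
+println("gif saved to: $(saved_gif.filename)")
+```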
\ No newline at end of file
diff --git a/docs/src/get_started.md b/docs/src/get_started.md
index a4b670ef..14c40811 100644
--- a/docs/src/get_started.md
+++ b/docs/src/get_started.md
@@ -29,4 +29,4 @@ POMDP policy. Lastly, we evaluate the results.
 There are a few things to mention here. First, the TigerPOMDP type implements all the functions
 required by QMDPSolver to compute a policy. Second, each policy has a default updater (essentially
 a filter used to update the
-belief of the POMDP). To learn more about Updaters check out the [Concepts](http://juliapomdp.github.io/POMDPs.jl/latest/concepts/) section.
+belief of the POMDP). To learn more about Updaters check out the [Concepts and Architecture](@ref) section.
diff --git a/docs/src/index.md b/docs/src/index.md
index e9b82e32..7d162e36 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -18,7 +18,7 @@ The list of solver and support packages maintained by the [JuliaPOMDP](https://g
 Documentation comes in three forms:
 1. An explanatory guide is available in the sections outlined below.
-2. How-to examples are available in pages in this document with "Example" in the title and in the [POMDPExamples package](https://github.com/JuliaPOMDP/POMDPExamples.jl).
+2. How-to examples are available throughout this documentation with specific examples in [Examples](@ref examples_section) and [Gallery of POMDPs.jl Problems](@ref).
 3. Reference docstrings for the entire POMDPs.jl interface are available in the [API Documentation](@ref) section.
 
 !!! note
@@ -49,6 +49,12 @@ Pages = [ "def_solver.md", "offline_solver.md", "online_solver.md", "def_updater
 Pages = [ "simulation.md", "run_simulation.md", "policy_interaction.md" ]
 ```
 
+### Examples and Gallery
+
+```@contents
+Pages = [ "examples.md", "example_defining_problems.md", "example_solvers.md", "example_simulations.md", "example_gridworld_mdp.md", "gallery.md"]
+```
+
 ### POMDPTools - the standard library for POMDPs.jl
 
 ```@contents
diff --git a/docs/src/run_simulation.md b/docs/src/run_simulation.md
index 7a849a44..082ecf71 100644
--- a/docs/src/run_simulation.md
+++ b/docs/src/run_simulation.md
@@ -9,4 +9,4 @@ r = simulate(sim, m, p)
 
 More inputs, such as a belief updater, initial state, initial belief, etc. may be specified as arguments to [`simulate`](@ref). See the docstring for [`simulate`](@ref) and the appropriate "Input" sections in the [Simulation Standard](@ref) page for more information.
 
-More examples can be found in the [POMDPExamples package](https://github.com/JuliaPOMDP/POMDPExamples.jl). A variety of simulators that return more information and interact in different ways can be found in [POMDPTools](@ref pomdptools_section).
+More examples can be found in the [Simulations Examples](@ref) section. A variety of simulators that return more information and interact in different ways can be found in [POMDPTools](@ref pomdptools_section).