diff --git a/docs/src/example_defining_problems.md b/docs/src/example_defining_problems.md
index e7f2d173..be243fe4 100644
--- a/docs/src/example_defining_problems.md
+++ b/docs/src/example_defining_problems.md
@@ -1,13 +1,13 @@
 # Defining a POMDP
-As mentioned in the [Defining POMDPs and MDPs](@ref defining_pomdps) section, there are verious ways to define a POMDP using POMDPs.jl. In this section, we provide more examples of how to define a POMDP using the different interfaces.
+As mentioned in the [Defining POMDPs and MDPs](@ref defining_pomdps) section, there are various ways to define a POMDP using POMDPs.jl. In this section, we provide more examples of how to define a POMDP using the different interfaces.

-There is a large variety of problems that can be expressed as MDPs and POMDPs and different solvers require different components of the POMDPs.jl interface to be defined. Therefore, these examples are not intended to cover all possible use cases. When deeloping a problem and you have an idea of what solver(s) you would like to use, it is recommended to use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help you to determine what components of the POMDPs.jl interface need to be defined. Reference the [Checking Requirements](@ref) section for an example of using POMDPLinter.
+There is a large variety of problems that can be expressed as MDPs and POMDPs, and different solvers require different components of the POMDPs.jl interface to be defined. Therefore, these examples are not intended to cover all possible use cases. When developing a problem and you have an idea of what solver(s) you would like to use, it is recommended to use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help you determine what components of the POMDPs.jl interface need to be defined. Reference the [Checking Requirements](@ref) section for an example of using POMDPLinter.

 ## CryingBaby Problem Definition
 For the examples, we will use the CryingBaby problem from [Algorithms for Decision Making](https://algorithmsbook.com/) by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray.

 !!! note
-    This craying baby problem follows the description in Algorithms for Decision Making and is different than `BabyPOMDP` defined in [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl).
+    This crying baby problem follows the description in Algorithms for Decision Making and is different from `BabyPOMDP` defined in [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl).

 From [Appendix F](https://algorithmsbook.com/files/appendix-f.pdf) of Algorithms for Decision Making:
 > The crying baby problem is a simple POMDP with two states, three actions, and two observations. Our goal is to care for a baby, and we do so by choosing at each time step whether to feed the baby, sing to the baby, or ignore the baby.
@@ -201,7 +201,7 @@ explicit_crying_baby_pomdp = CryingBabyPOMDP()
 ```

 ## [Generative Interface](@id gen_crying)
-This crying baby problem should not be implemented using the generative interface. However, this exmple is provided for pedagogical purposes.
+This crying baby problem should not be implemented using the generative interface. However, this example is provided for pedagogical purposes.
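+
+Before diving into the full implementation, it may help to see the shape of the generative interface in isolation: the only required method is `POMDPs.gen`, which samples a next state, observation, and reward and returns them in a `NamedTuple`. The following is a minimal, self-contained sketch using a made-up `ToyPOMDP` type; the dynamics and reward here are placeholders, not the CryingBaby model defined below.
+
+```julia
+using POMDPs
+using Random
+
+# A toy two-state problem used only to illustrate the shape of `gen`
+struct ToyPOMDP <: POMDP{Symbol, Symbol, Symbol} end
+
+function POMDPs.gen(::ToyPOMDP, s::Symbol, a::Symbol, rng::AbstractRNG)
+    sp = rand(rng) < 0.5 ? :sated : :hungry                       # sample the next state
+    o = rand(rng) < 0.8 ? sp : (sp == :sated ? :hungry : :sated)  # noisy observation of the next state
+    r = a == :feed ? -5.0 : 0.0                                   # placeholder reward
+    return (sp=sp, o=o, r=r)                                      # NamedTuple with next state, observation, reward
+end
+```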
 
 ```julia
 using POMDPs
@@ -273,7 +273,7 @@ gen_crying_baby_pomdp = GenCryingBabyPOMDP()
 ```

 ## [Probability Tables](@id tab_crying)
-For this implementaion we will use the following indexes:
+For this implementation, we will use the following indices:
 - States
     - `:sated` = 1
     - `:hungry` = 2
diff --git a/docs/src/example_gridworld_mdp.md b/docs/src/example_gridworld_mdp.md
index 4dfa4e48..b8ea4968 100644
--- a/docs/src/example_gridworld_mdp.md
+++ b/docs/src/example_gridworld_mdp.md
@@ -1,6 +1,6 @@
 # GridWorld MDP Tutorial

-In this tutorial, we provide a simple example of how to define a Markov decision process (MDP) using the POMDPS.jl interface. We will then solve the MDP using value iteration and Monte Carlo tree search (MCTS). We will walk through constructing the MDP using the explicit interface which invovles defining a new type for the MDP and then extending different components of the POMDPs.jl interface for that type.
+In this tutorial, we provide a simple example of how to define a Markov decision process (MDP) using the POMDPs.jl interface. We will then solve the MDP using value iteration and Monte Carlo tree search (MCTS). We will walk through constructing the MDP using the explicit interface, which involves defining a new type for the MDP and then extending different components of the POMDPs.jl interface for that type.

 ## Dependencies

@@ -32,7 +32,7 @@ using MCTS

 ## Problem Overview

-In Grid World, we are trying to control an agent who has trouble moving in the desired direction. In our problem, we have four reward states within the a grid. Each position on the grid represents a state, and the positive reward states are terminal (the agent stops recieving reward after reaching them and performing an action from that state). The agent has four actions to choose from: up, down, left, right. The agent moves in the desired direction with a probability of $0.7$, and with a probability of $0.1$ in each of the remaining three directions. If the agent bumps into the outside wall, there is a penalty of $1$ (i.e. reward of $-1$). The problem has the following form:
+In Grid World, we are trying to control an agent who has trouble moving in the desired direction. In our problem, we have four reward states within the grid. Each position on the grid represents a state, and the positive reward states are terminal (the agent stops receiving reward after reaching them and performing an action from that state). The agent has four actions to choose from: up, down, left, right. The agent moves in the desired direction with a probability of $0.7$, and with a probability of $0.1$ in each of the remaining three directions. If the agent bumps into the outside wall, there is a penalty of $1$ (i.e., a reward of $-1$). The problem has the following form:

 ![Grid World](examples/grid_world_overview.gif)

@@ -79,7 +79,7 @@ struct GridWorldMDP <: MDP{GridWorldState, Symbol}
     reward_states_values::Dict{GridWorldState, Float64} # Dictionary mapping reward states to their values
     hit_wall_reward::Float64 # reward for hitting a wall
     tprob::Float64 # probability of transitioning to the desired state
-    discount_factor::Float64 # disocunt factor
+    discount_factor::Float64 # discount factor
 end
 ```

@@ -126,7 +126,7 @@ mdp = GridWorldMDP()
 ```

 !!! note
-    In this definition of the problem, our coordiates start in the bottom left of the grid. That is GridState(1, 1) is the bottom left of the grid and GridState(10, 10) would be on the right of the grid with a grid size of 10 by 10.
+    In this definition of the problem, our coordinates start in the bottom left of the grid. That is, `GridWorldState(1, 1)` is the bottom left of the grid and `GridWorldState(10, 10)` would be the top right of the grid with a grid size of 10 by 10.

 ## Grid World State Space
 The state space in an MDP represents all the states in the problem. There are two primary functionalities that we want our spaces to support. We want to be able to iterate over the state space (for Value Iteration for example), and sometimes we want to be able to sample from the state space (used in some POMDP solvers). In this notebook, we will only look at iterable state spaces.
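+
+As a quick preview of those two functionalities (this sketch assumes `POMDPs.states` returns an iterable collection such as a vector, which is how it is implemented later in this tutorial):
+
+```julia
+# Iterate over every state, as solvers like value iteration do
+for s in states(mdp)
+    # inspect or process s here
+end
+
+# Sample a single state uniformly at random, as some sampling-based solvers do
+s = rand(states(mdp))
+```
+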
@@ -236,7 +236,7 @@ Similar to above, let's iterate over a few of the states in our state space:
 ```

 ## Grid World Action Space
-The action space is the set of all actions availiable to the agent. In the grid world problem the action space consists of up, down, left, and right. We can define the action space by implementing a new method of the actions function.
+The action space is the set of all actions available to the agent. In the grid world problem, the action space consists of up, down, left, and right. We can define the action space by implementing a new method of the `actions` function.

 ```@example gridworld_mdp
 POMDPs.actions(mdp::GridWorldMDP) = [:up, :down, :left, :right]
@@ -255,7 +255,7 @@ end
 ## Grid World Transition Function
 MDPs often define the transition function as $T(s^{\prime} \mid s, a)$, which is the probability of transitioning to state $s^{\prime}$ given that we are in state $s$ and take action $a$. For the POMDPs.jl interface, we define the transition function as a distribution over the next states. That is, we want $T(\cdot \mid s, a)$ which is a function that takes in a state and an action and returns a distribution over the next states.

-For our grid world example, there are only a few states to which the agent can transition and thus only a few states with nonzero probaility in $T(\cdot \mid s, a)$. We can use the `SparseCat` distribution to represent this. The `SparseCat` distribution is a categorical distribution that only stores the nonzero probabilities. We can define our transition function as follows:
+For our grid world example, there are only a few states to which the agent can transition and thus only a few states with nonzero probability in $T(\cdot \mid s, a)$. We can use the `SparseCat` distribution to represent this. The `SparseCat` distribution is a categorical distribution that only stores the nonzero probabilities. We can define our transition function as follows:

 ```@example gridworld_mdp
 function POMDPs.transition(mdp::GridWorldMDP, s::GridWorldState, a::Symbol)
@@ -413,7 +413,7 @@ We are almost done! We still need to define `discount`. Let's first use `POMDPLi
 using POMDPLinter

 @show_requirements POMDPs.solve(ValueIterationSolver(), mdp)
-
+nothing # hide
 ```

 As we expected, we need to define `discount`.
@@ -428,7 +428,7 @@ Let's check again:

 ```@example gridworld_mdp
 @show_requirements POMDPs.solve(ValueIterationSolver(), mdp)
-
+nothing # hide
 ```

 ## Solving the Grid World MDP (Value Iteration)
diff --git a/docs/src/example_simulations.md b/docs/src/example_simulations.md
index cd6b5e95..c1cc0d0a 100644
--- a/docs/src/example_simulations.md
+++ b/docs/src/example_simulations.md
@@ -9,7 +9,7 @@ include("examples/crying_baby_solvers.jl")
 ```

 ## Stepthrough
-The stepthrough simulater provides a window into the simulation with a for-loop syntax.
+The stepthrough simulator provides a window into the simulation with a for-loop syntax.
 Within the body of the for loop, we have access to the belief, the action, the observation, and the reward, in each step.
 We also calculate the sum of the rewards in this example, but note that this is _not_ the _discounted reward_.

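+For reference, the general shape of such a loop looks roughly like the sketch below; the problem, policy, and updater names follow the objects used in this section, and the exact positional-argument order should be checked against the POMDPTools `stepthrough` documentation:
+
+```julia
+using POMDPTools
+
+rsum = 0.0
+for (b, a, o, r) in stepthrough(tabular_crying_baby_pomdp, policy, DiscreteUpdater(tabular_crying_baby_pomdp), "b,a,o,r", max_steps=10)
+    @show a, o, r        # the belief b is also available at every step
+    global rsum += r     # running (undiscounted) sum of rewards
+end
+```
+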
@@ -58,7 +58,7 @@ history = simulate(hr, tabular_crying_baby_pomdp, policy, DiscreteUpdater(tabula
 nothing # hide
 ```

-The history object produced by a `HistoryRecorder` is a `SimHistory`, documented in the POMDPTools simulater section [Histories](@ref). The information in this object can be accessed in several ways. For example, there is a function:
+The history object produced by a `HistoryRecorder` is a `SimHistory`, documented in the POMDPTools simulator section [Histories](@ref). The information in this object can be accessed in several ways. For example, there is a function:
 ```@example crying_sim
 discounted_reward(history)
 ```
@@ -97,7 +97,7 @@ demo_eachstep(history) # hide

 ## Parallel Simulations
 It is often useful to evaluate a policy by running many simulations. The parallel simulator is the most effective tool for this. To use the parallel simulator, first create a list of `Sim` objects, each of which contains all of the information needed to run a simulation. Then run the simulations using `run_parallel`, which will return a `DataFrame` with the results.
-In this example, we will compare the performance of the polcies we computed in the [Using Different Solvers](@ref) section (i.e. `sarsop_policy`, `pomcp_planner`, and `heuristic_policy`). To evaluate the policies, we will run 100 simulations for each policy. We can do this by adding 100 `Sim` objects of each policy to the list.
+In this example, we will compare the performance of the policies we computed in the [Using Different Solvers](@ref) section (i.e., `sarsop_policy`, `pomcp_planner`, and `heuristic_policy`). To evaluate the policies, we will run 100 simulations for each policy. We can do this by adding 100 `Sim` objects for each policy to the list, as sketched below.
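+
+A minimal sketch of that pattern is shown here; `pomdp` and `updater` stand in for the problem and belief updater used in this section, and keyword arguments such as `max_steps` and `metadata` are only the commonly used ones (see the POMDPTools documentation for the full `Sim` signature):
+
+```julia
+using POMDPTools
+
+# One `Sim` object per simulation run; `metadata` records which policy produced each result
+sims = [Sim(pomdp, policy, updater; max_steps=50, metadata=Dict(:policy => name))
+        for (policy, name) in [(sarsop_policy, "sarsop"), (pomcp_planner, "pomcp"), (heuristic_policy, "heuristic")]
+        for _ in 1:100]
+
+df = run_parallel(sims)  # returns a DataFrame with one row per simulation
+```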
 
 ```@example crying_sim
 using DataFrames
diff --git a/docs/src/example_solvers.md b/docs/src/example_solvers.md
index 069053a7..cee92985 100644
--- a/docs/src/example_solvers.md
+++ b/docs/src/example_solvers.md
@@ -37,12 +37,12 @@ end
 ```

 ## Offline (SARSOP)
-In this example, we will use the [NativeSARSOP](https://github.com/JuliaPOMDP/NativeSARSOP.jl) solver. The process for generating offline polcies is similar for all offline solvers. First, we define the solver with the desired parameters. Then, we call `POMDPs.solve` with the solver and the problem. We can query the policy using the `action` function.
+In this example, we will use the [NativeSARSOP](https://github.com/JuliaPOMDP/NativeSARSOP.jl) solver. The process for generating offline policies is similar for all offline solvers. First, we define the solver with the desired parameters. Then, we call `POMDPs.solve` with the solver and the problem. We can query the policy using the `action` function.

 ```@example crying_sim
 using NativeSARSOP

-# Define the solver with the desired paramters
+# Define the solver with the desired parameters
 sarsop_solver = SARSOPSolver(; max_time=10.0)

 # Solve the problem by calling POMDPs.solve. SARSOP will compute the policy and return an `AlphaVectorPolicy`
@@ -57,7 +57,7 @@ a = action(sarsop_policy, b)
 ```

 ## Online (POMCP)
-For the online solver, we will use Particle Monte Carlo Planning ([POMCP](https://github.com/JuliaPOMDP/BasicPOMCP.jl)). For online solvers, we first define the solver similar to offline solvers. However, when we call `POMDPs.solve`, we are returned an online plannner. Similar to the offline solver, we can query the policy using the `action` function and that is when the online solver will compute the action.
+For the online solver, we will use Partially Observable Monte Carlo Planning ([POMCP](https://github.com/JuliaPOMDP/BasicPOMCP.jl)). For online solvers, we first define the solver similarly to offline solvers. However, when we call `POMDPs.solve`, we are returned an online planner. Similar to the offline solver, we can query the policy using the `action` function, and that is when the online solver will compute the action.

 ```@example crying_sim
 using BasicPOMCP
@@ -73,7 +73,7 @@ a = action(pomcp_planner, b)
 ```

 ## Heuristic Policy
-While we often want to use a solver to compute a policy, sometimes we might want to use a heuristic policy. For example, we may want to use a heuristic policy during our rollouts for online solvers or to use as a baseline. In this example, we will define a simple heuristic policy that feeds the baby if our belief of the baby being hungry is greater than 50%, otherwise we will randomly ignore or sing to the baby.
+While we often want to use a solver to compute a policy, sometimes we might want to use a heuristic policy. For example, we may want to use a heuristic policy during rollouts for online solvers or as a baseline. In this example, we will define a simple heuristic policy that feeds the baby if our belief that the baby is hungry is greater than 50%; otherwise, we will randomly ignore or sing to the baby.

 ```@example crying_sim
 struct HeuristicFeedPolicy{P<:POMDP} <: Policy
diff --git a/docs/src/gallery.md b/docs/src/gallery.md
index 39997852..6823b717 100644
--- a/docs/src/gallery.md
+++ b/docs/src/gallery.md
@@ -196,7 +196,7 @@ println("gif saved to: $(saved_gif.filename)")
 ```

 ## [RockSample](https://github.com/JuliaPOMDP/RockSample.jl)
-The RockSample problem problem from T. Smith, R. Simmons, "Heuristic Search Value Iteration for POMDPs", Association for Uncertainty in Artificial Intelligence (UAI), 2004.
+The RockSample problem is from T. Smith, R. Simmons, "Heuristic Search Value Iteration for POMDPs", Association for Uncertainty in Artificial Intelligence (UAI), 2004.

 The robot must navigate and sample good rocks (green) and then arrive at an exit area. The robot can only sense the rocks with an imperfect sensor whose performance depends on the distance to the rock.

@@ -261,4 +261,4 @@ Pkg.rm("Plots")
 ```

 ## Adding New Gallery Examples
-To add new examples, please submit a pull request to the POMDPs.jl repository with changes made to the `gallery.md` file in `docs/src/`. Please include the creation of a gif in the code snippet. The gif should be generated during the creation of the documenation using `@eval` and saved in the `docs/src/examples/` directory. The gif should be named `problem_name.gif` where `problem_name` is the name of the problem. The gif can then be included using `![problem_name](examples/problem_name.gif)`.
\ No newline at end of file
+To add new examples, please submit a pull request to the POMDPs.jl repository with changes made to the `gallery.md` file in `docs/src/`. Please include the creation of a gif in the code snippet. The gif should be generated during the creation of the documentation using `@eval` and saved in the `docs/src/examples/` directory. The gif should be named `problem_name.gif` where `problem_name` is the name of the problem. The gif can then be included using `![problem_name](examples/problem_name.gif)`.
\ No newline at end of file