fixed typos. (#557)
* fixed the typos mentioned and a few others.

* removing history

* Fixed linter output by adding a hidden nothing to force stdout
MaderDash authored Jul 22, 2024
1 parent 6ffa564 commit 4d0b542
Showing 5 changed files with 22 additions and 22 deletions.
10 changes: 5 additions & 5 deletions docs/src/example_defining_problems.md
@@ -1,13 +1,13 @@
# Defining a POMDP
-As mentioned in the [Defining POMDPs and MDPs](@ref defining_pomdps) section, there are verious ways to define a POMDP using POMDPs.jl. In this section, we provide more examples of how to define a POMDP using the different interfaces.
+As mentioned in the [Defining POMDPs and MDPs](@ref defining_pomdps) section, there are various ways to define a POMDP using POMDPs.jl. In this section, we provide more examples of how to define a POMDP using the different interfaces.

-There is a large variety of problems that can be expressed as MDPs and POMDPs and different solvers require different components of the POMDPs.jl interface to be defined. Therefore, these examples are not intended to cover all possible use cases. When deeloping a problem and you have an idea of what solver(s) you would like to use, it is recommended to use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help you to determine what components of the POMDPs.jl interface need to be defined. Reference the [Checking Requirements](@ref) section for an example of using POMDPLinter.
+There is a large variety of problems that can be expressed as MDPs and POMDPs and different solvers require different components of the POMDPs.jl interface to be defined. Therefore, these examples are not intended to cover all possible use cases. When developing a problem and you have an idea of what solver(s) you would like to use, it is recommended to use [POMDPLinter](https://github.com/JuliaPOMDP/POMDPLinter.jl) to help you to determine what components of the POMDPs.jl interface need to be defined. Reference the [Checking Requirements](@ref) section for an example of using POMDPLinter.
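A typical requirements check looks something like the sketch below; the solver and problem names are placeholders for whatever you are working with.

```julia
using POMDPs
using POMDPLinter
using DiscreteValueIteration  # any solver of interest

# List the POMDPs.jl methods that this solver needs the problem to implement
# (`my_mdp` is a placeholder for your problem instance).
@show_requirements POMDPs.solve(ValueIterationSolver(), my_mdp)
```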

## CryingBaby Problem Definition
For the examples, we will use the CryingBaby problem from [Algorithms for Decision Making](https://algorithmsbook.com/) by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray.

!!! note
-This craying baby problem follows the description in Algorithms for Decision Making and is different than `BabyPOMDP` defined in [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl).
+This crying baby problem follows the description in Algorithms for Decision Making and is different than `BabyPOMDP` defined in [POMDPModels.jl](https://github.com/JuliaPOMDP/POMDPModels.jl).

From [Appendix F](https://algorithmsbook.com/files/appendix-f.pdf) of Algorithms for Decision Making:
> The crying baby problem is a simple POMDP with two states, three actions, and two observations. Our goal is to care for a baby, and we do so by choosing at each time step whether to feed the baby, sing to the baby, or ignore the baby.
@@ -201,7 +201,7 @@ explicit_crying_baby_pomdp = CryingBabyPOMDP()
```

## [Generative Interface](@id gen_crying)
-This crying baby problem should not be implemented using the generative interface. However, this exmple is provided for pedagogical purposes.
+This crying baby problem should not be implemented using the generative interface. However, this example is provided for pedagogical purposes.

```julia
using POMDPs
@@ -273,7 +273,7 @@ gen_crying_baby_pomdp = GenCryingBabyPOMDP()
```
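For reference, a minimal self-contained sketch of a generative definition is shown below; the type name, field names, and probability values are illustrative and the dynamics are simplified, so treat it as a pattern rather than the exact model used here.

```julia
using POMDPs
using Random

# Illustrative generative crying-baby-style model (names and numbers are assumptions).
Base.@kwdef struct SimpleCryingBabyGen <: POMDP{Symbol, Symbol, Symbol}
    p_become_hungry::Float64 = 0.1
    p_cry_when_hungry::Float64 = 0.8
    p_cry_when_not_hungry::Float64 = 0.1
    r_hungry::Float64 = -10.0
    r_feed::Float64 = -5.0
end

function POMDPs.gen(pomdp::SimpleCryingBabyGen, s::Symbol, a::Symbol, rng::AbstractRNG)
    # Next state: feeding sates the baby; otherwise a sated baby may become hungry.
    if a == :feed
        sp = :sated
    elseif s == :hungry
        sp = :hungry
    else
        sp = rand(rng) < pomdp.p_become_hungry ? :hungry : :sated
    end

    # Observation: crying is more likely when the baby is hungry.
    p_cry = sp == :hungry ? pomdp.p_cry_when_hungry : pomdp.p_cry_when_not_hungry
    o = rand(rng) < p_cry ? :crying : :quiet

    # Reward: a penalty while the baby is hungry plus a cost for feeding.
    r = (sp == :hungry ? pomdp.r_hungry : 0.0) + (a == :feed ? pomdp.r_feed : 0.0)

    return (sp=sp, o=o, r=r)
end

POMDPs.discount(::SimpleCryingBabyGen) = 0.9
```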

## [Probability Tables](@id tab_crying)
-For this implementaion we will use the following indexes:
+For this implementation we will use the following indexes:
- States
- `:sated` = 1
- `:hungry` = 2
16 changes: 8 additions & 8 deletions docs/src/example_gridworld_mdp.md
@@ -1,6 +1,6 @@
# GridWorld MDP Tutorial

-In this tutorial, we provide a simple example of how to define a Markov decision process (MDP) using the POMDPS.jl interface. We will then solve the MDP using value iteration and Monte Carlo tree search (MCTS). We will walk through constructing the MDP using the explicit interface which invovles defining a new type for the MDP and then extending different components of the POMDPs.jl interface for that type.
+In this tutorial, we provide a simple example of how to define a Markov decision process (MDP) using the POMDPS.jl interface. We will then solve the MDP using value iteration and Monte Carlo tree search (MCTS). We will walk through constructing the MDP using the explicit interface which involves defining a new type for the MDP and then extending different components of the POMDPs.jl interface for that type.

## Dependencies

@@ -32,7 +32,7 @@ using MCTS

## Problem Overview

-In Grid World, we are trying to control an agent who has trouble moving in the desired direction. In our problem, we have four reward states within the a grid. Each position on the grid represents a state, and the positive reward states are terminal (the agent stops recieving reward after reaching them and performing an action from that state). The agent has four actions to choose from: up, down, left, right. The agent moves in the desired direction with a probability of $0.7$, and with a probability of $0.1$ in each of the remaining three directions. If the agent bumps into the outside wall, there is a penalty of $1$ (i.e. reward of $-1$). The problem has the following form:
+In Grid World, we are trying to control an agent who has trouble moving in the desired direction. In our problem, we have four reward states within the a grid. Each position on the grid represents a state, and the positive reward states are terminal (the agent stops receiving reward after reaching them and performing an action from that state). The agent has four actions to choose from: up, down, left, right. The agent moves in the desired direction with a probability of $0.7$, and with a probability of $0.1$ in each of the remaining three directions. If the agent bumps into the outside wall, there is a penalty of $1$ (i.e. reward of $-1$). The problem has the following form:

![Grid World](examples/grid_world_overview.gif)

@@ -79,7 +79,7 @@ struct GridWorldMDP <: MDP{GridWorldState, Symbol}
reward_states_values::Dict{GridWorldState, Float64} # Dictionary mapping reward states to their values
hit_wall_reward::Float64 # reward for hitting a wall
tprob::Float64 # probability of transitioning to the desired state
-    discount_factor::Float64 # disocunt factor
+    discount_factor::Float64 # discount factor
end
```

@@ -126,7 +126,7 @@ mdp = GridWorldMDP()
```

!!! note
-In this definition of the problem, our coordiates start in the bottom left of the grid. That is GridState(1, 1) is the bottom left of the grid and GridState(10, 10) would be on the right of the grid with a grid size of 10 by 10.
+In this definition of the problem, our coordinates start in the bottom left of the grid. That is GridState(1, 1) is the bottom left of the grid and GridState(10, 10) would be on the right of the grid with a grid size of 10 by 10.

## Grid World State Space
The state space in an MDP represents all the states in the problem. There are two primary functionalities that we want our spaces to support. We want to be able to iterate over the state space (for Value Iteration for example), and sometimes we want to be able to sample form the state space (used in some POMDP solvers). In this notebook, we will only look at iterable state spaces.
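As a sketch of what an iterable state space can look like for this problem (assuming the `GridWorldState` type above and hypothetical `size_x`/`size_y` fields on the MDP struct), the whole grid can simply be collected into a vector:

```julia
# Sketch: enumerate every grid cell as a state (the size_x/size_y fields are assumed).
function POMDPs.states(mdp::GridWorldMDP)
    return vec([GridWorldState(x, y) for x in 1:mdp.size_x, y in 1:mdp.size_y])
end
```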
@@ -236,7 +236,7 @@ Similar to above, let's iterate over a few of the states in our state space:
```

## Grid World Action Space
-The action space is the set of all actions availiable to the agent. In the grid world problem the action space consists of up, down, left, and right. We can define the action space by implementing a new method of the actions function.
+The action space is the set of all actions available to the agent. In the grid world problem the action space consists of up, down, left, and right. We can define the action space by implementing a new method of the actions function.

```@example gridworld_mdp
POMDPs.actions(mdp::GridWorldMDP) = [:up, :down, :left, :right]
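
# A companion method that many solvers also need is `actionindex`; this is a
# sketch assuming the action ordering above, not necessarily the original definition:
POMDPs.actionindex(mdp::GridWorldMDP, a::Symbol) = findfirst(==(a), POMDPs.actions(mdp))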
@@ -255,7 +255,7 @@ end
## Grid World Transition Function
MDPs often define the transition function as $T(s^{\prime} \mid s, a)$, which is the probability of transitioning to state $s^{\prime}$ given that we are in state $s$ and take action $a$. For the POMDPs.jl interface, we define the transition function as a distribution over the next states. That is, we want $T(\cdot \mid s, a)$ which is a function that takes in a state and an action and returns a distribution over the next states.

-For our grid world example, there are only a few states to which the agent can transition and thus only a few states with nonzero probaility in $T(\cdot \mid s, a)$. We can use the `SparseCat` distribution to represent this. The `SparseCat` distribution is a categorical distribution that only stores the nonzero probabilities. We can define our transition function as follows:
+For our grid world example, there are only a few states to which the agent can transition and thus only a few states with nonzero probability in $T(\cdot \mid s, a)$. We can use the `SparseCat` distribution to represent this. The `SparseCat` distribution is a categorical distribution that only stores the nonzero probabilities. We can define our transition function as follows:

```@example gridworld_mdp
function POMDPs.transition(mdp::GridWorldMDP, s::GridWorldState, a::Symbol)
@@ -413,7 +413,7 @@ We are almost done! We still need to define `discount`. Let's first use `POMDPLi
using POMDPLinter
@show_requirements POMDPs.solve(ValueIterationSolver(), mdp)
+nothing # hide
```
As we expected, we need to define `discount`.
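
Given the `discount_factor` field stored in the `GridWorldMDP` struct, the missing method is likely just a one-liner along these lines (a sketch, not necessarily the exact original definition):

```julia
POMDPs.discount(mdp::GridWorldMDP) = mdp.discount_factor
```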

@@ -428,7 +428,7 @@ Let's check again:

```@example gridworld_mdp
@show_requirements POMDPs.solve(ValueIterationSolver(), mdp)
+nothing # hide
```

## Solving the Grid World MDP (Value Iteration)
6 changes: 3 additions & 3 deletions docs/src/example_simulations.md
@@ -9,7 +9,7 @@ include("examples/crying_baby_solvers.jl")
```

## Stepthrough
-The stepthrough simulater provides a window into the simulation with a for-loop syntax.
+The stepthrough simulator provides a window into the simulation with a for-loop syntax.

Within the body of the for loop, we have access to the belief, the action, the observation, and the reward, in each step. We also calculate the sum of the rewards in this example, but note that this is _not_ the _discounted reward_.
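
A minimal sketch of that pattern looks like the following, where `pomdp`, `policy`, and `updater` stand in for the objects built in the setup above:

```julia
using POMDPTools

rsum = 0.0
for (b, a, o, r) in stepthrough(pomdp, policy, updater, "b,a,o,r"; max_steps=10)
    println("action: $a, observation: $o, reward: $r")
    rsum += r  # undiscounted running total
end
println("sum of rewards: $rsum")
```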

@@ -58,7 +58,7 @@ history = simulate(hr, tabular_crying_baby_pomdp, policy, DiscreteUpdater(tabula
nothing # hide
```

-The history object produced by a `HistoryRecorder` is a `SimHistory`, documented in the POMDPTools simulater section [Histories](@ref). The information in this object can be accessed in several ways. For example, there is a function:
+The history object produced by a `HistoryRecorder` is a `SimHistory`, documented in the POMDPTools simulator section [Histories](@ref). The information in this object can be accessed in several ways. For example, there is a function:
```@example crying_sim
discounted_reward(history)
```
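Another option is to iterate over selected quantities with `eachstep`; a sketch (the specification string can be adjusted as needed):

```julia
for (b, a, o, r) in eachstep(history, "(b, a, o, r)")
    println("took action $a, observed $o, received reward $r")
end
```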
@@ -97,7 +97,7 @@ demo_eachstep(history) # hide
## Parallel Simulations
It is often useful to evaluate a policy by running many simulations. The parallel simulator is the most effective tool for this. To use the parallel simulator, first create a list of `Sim` objects, each of which contains all of the information needed to run a simulation. Then then run the simulations using `run_parallel`, which will return a `DataFrame` with the results.

-In this example, we will compare the performance of the polcies we computed in the [Using Different Solvers](@ref) section (i.e. `sarsop_policy`, `pomcp_planner`, and `heuristic_policy`). To evaluate the policies, we will run 100 simulations for each policy. We can do this by adding 100 `Sim` objects of each policy to the list.
+In this example, we will compare the performance of the policies we computed in the [Using Different Solvers](@ref) section (i.e. `sarsop_policy`, `pomcp_planner`, and `heuristic_policy`). To evaluate the policies, we will run 100 simulations for each policy. We can do this by adding 100 `Sim` objects of each policy to the list.
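A sketch of that pattern is shown below; the queue pairs each policy with some metadata so the results can be grouped afterwards (the `max_steps` value is illustrative):

```julia
using POMDPTools

sims = Sim[]
for (name, policy) in [("sarsop", sarsop_policy),
                       ("pomcp", pomcp_planner),
                       ("heuristic", heuristic_policy)]
    for i in 1:100
        push!(sims, Sim(tabular_crying_baby_pomdp, policy,
                        DiscreteUpdater(tabular_crying_baby_pomdp);
                        max_steps=20, metadata=Dict(:policy => name)))
    end
end
results = run_parallel(sims)  # one row per simulation, including the metadata columns
```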

```@example crying_sim
using DataFrames
8 changes: 4 additions & 4 deletions docs/src/example_solvers.md
@@ -37,12 +37,12 @@ end
```

## Offline (SARSOP)
-In this example, we will use the [NativeSARSOP](https://github.com/JuliaPOMDP/NativeSARSOP.jl) solver. The process for generating offline polcies is similar for all offline solvers. First, we define the solver with the desired parameters. Then, we call `POMDPs.solve` with the solver and the problem. We can query the policy using the `action` function.
+In this example, we will use the [NativeSARSOP](https://github.com/JuliaPOMDP/NativeSARSOP.jl) solver. The process for generating offline polices is similar for all offline solvers. First, we define the solver with the desired parameters. Then, we call `POMDPs.solve` with the solver and the problem. We can query the policy using the `action` function.

```@example crying_sim
using NativeSARSOP
-# Define the solver with the desired paramters
+# Define the solver with the desired parameters
sarsop_solver = SARSOPSolver(; max_time=10.0)
# Solve the problem by calling POMDPs.solve. SARSOP will compute the policy and return an `AlphaVectorPolicy`
@@ -57,7 +57,7 @@ a = action(sarsop_policy, b)
```

## Online (POMCP)
-For the online solver, we will use Particle Monte Carlo Planning ([POMCP](https://github.com/JuliaPOMDP/BasicPOMCP.jl)). For online solvers, we first define the solver similar to offline solvers. However, when we call `POMDPs.solve`, we are returned an online plannner. Similar to the offline solver, we can query the policy using the `action` function and that is when the online solver will compute the action.
+For the online solver, we will use Particle Monte Carlo Planning ([POMCP](https://github.com/JuliaPOMDP/BasicPOMCP.jl)). For online solvers, we first define the solver similar to offline solvers. However, when we call `POMDPs.solve`, we are returned an online planner. Similar to the offline solver, we can query the policy using the `action` function and that is when the online solver will compute the action.

```@example crying_sim
using BasicPOMCP
@@ -73,7 +73,7 @@ a = action(pomcp_planner, b)
```
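For completeness, a representative construction of the planner looks like the sketch below; the parameter values are illustrative, and `explicit_crying_baby_pomdp` stands for the crying baby problem defined earlier (adjust to whichever definition you are using):

```julia
using POMDPs
using BasicPOMCP

pomcp_solver = POMCPSolver(; tree_queries=1000, c=5.0)  # illustrative parameters
pomcp_planner = solve(pomcp_solver, explicit_crying_baby_pomdp)

b = initialstate(explicit_crying_baby_pomdp)
a = action(pomcp_planner, b)  # the tree search runs here, when the action is requested
```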

## Heuristic Policy
-While we often want to use a solver to compute a policy, sometimes we might want to use a heuristic policy. For example, we may want to use a heuristic policy during our rollouts for online solvers or to use as a baseline. In this example, we will define a simple heuristic policy that feeds the baby if our belief of the baby being hungry is greater than 50%, otherwise we will randomly ignore or sing to the baby.
+While we often want to use a solver to compute a policy, sometimes we might want to use a heuristic policy. For example, we may want to use a heuristic policy during our rollout for online solvers or to use as a baseline. In this example, we will define a simple heuristic policy that feeds the baby if our belief of the baby being hungry is greater than 50%, otherwise we will randomly ignore or sing to the baby.

```@example crying_sim
struct HeuristicFeedPolicy{P<:POMDP} <: Policy
4 changes: 2 additions & 2 deletions docs/src/gallery.md
@@ -196,7 +196,7 @@ println("gif saved to: $(saved_gif.filename)")
```

## [RockSample](https://github.com/JuliaPOMDP/RockSample.jl)
-The RockSample problem problem from T. Smith, R. Simmons, "Heuristic Search Value Iteration for POMDPs", Association for Uncertainty in Artificial Intelligence (UAI), 2004.
+The RockSample problem from T. Smith, R. Simmons, "Heuristic Search Value Iteration for POMDPs", Association for Uncertainty in Artificial Intelligence (UAI), 2004.

The robot must navigate and sample good rocks (green) and then arrive at an exit area. The robot can only sense the rocks with an imperfect sensor that has performance that depends on the distance to the rock.
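
A small sketch of setting up and solving an instance (the constructor keywords and values here are illustrative):

```julia
using POMDPs
using RockSample
using NativeSARSOP

# Illustrative RockSample instance; adjust rock positions and sensor parameters as desired.
pomdp = RockSamplePOMDP(rocks_positions=[(2, 3), (4, 4), (4, 2)],
                        sensor_efficiency=20.0,
                        discount_factor=0.95,
                        good_rock_reward=20.0)
policy = solve(SARSOPSolver(; max_time=10.0), pomdp)
```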

@@ -261,4 +261,4 @@ Pkg.rm("Plots")
```

## Adding New Gallery Examples
-To add new examples, please submit a pull request to the POMDPs.jl repository with changes made to the `gallery.md` file in `docs/src/`. Please include the creation of a gif in the code snippet. The gif should be generated during the creation of the documenation using `@eval` and saved in the `docs/src/examples/` directory. The gif should be named `problem_name.gif` where `problem_name` is the name of the problem. The gif can then be included using `![problem_name](examples/problem_name.gif)`.
+To add new examples, please submit a pull request to the POMDPs.jl repository with changes made to the `gallery.md` file in `docs/src/`. Please include the creation of a gif in the code snippet. The gif should be generated during the creation of the documentation using `@eval` and saved in the `docs/src/examples/` directory. The gif should be named `problem_name.gif` where `problem_name` is the name of the problem. The gif can then be included using `![problem_name](examples/problem_name.gif)`.
