Merge pull request #73 from JuliaPOMDP/parametric
Parametric
zsunberg committed Apr 15, 2016
2 parents ea9614d + 548f2ed commit 8d3f3c6
Showing 22 changed files with 703 additions and 640 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
docs/build/
docs/site/
5 changes: 5 additions & 0 deletions .travis.yml
@@ -5,6 +5,11 @@ julia:
- release
notifications:
email: false
before_script:
- export PATH=$HOME/.local/bin:$PATH
script:
- if [[ -a .git/shallow ]]; then git fetch --unshallow; fi
- julia --check-bounds=yes -e 'Pkg.clone(pwd()); Pkg.test("POMDPs")'
after_success:
- julia -e 'Pkg.clone("https://github.com/MichaelHatherly/Documenter.jl")'
- julia -e 'cd(Pkg.dir("POMDPs")); include(joinpath("docs", "make.jl"))'
144 changes: 5 additions & 139 deletions README.md
@@ -4,12 +4,15 @@

This package provides a basic interface for working with partially observable Markov decision processes (POMDPs).

NEWS: We recently made a significant change to the interface, introducing parametric types (see issue #56). If you wish to continue using the old interface, the v0.1 release may be used, but we recommend that all projects update to the new version.
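
Under the new interface, a problem type declares its state, action, and observation types as parameters of `POMDP{S,A,O}`, so solvers can recover them by dispatch. A minimal sketch (the `MyPOMDP` type and its field are hypothetical, for illustration only):

```julia
using POMDPs

# Hypothetical problem type: states are Ints, actions are Symbols, and
# observations are Bools. Solvers can read these types off the POMDP{S,A,O}
# parameters instead of relying on abstract supertypes.
type MyPOMDP <: POMDP{Int, Symbol, Bool}
    discount::Float64
end

POMDPs.discount(p::MyPOMDP) = p.discount
```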

The goal is to provide a common programming vocabulary for researchers and students to use primarily for three tasks:

1. Expressing problems using the POMDP format.
2. Writing solver software.
3. Running simulations efficiently.

For problems and solvers that only use a generative model (rather than explicit transition and observation distributions), see also [GenerativeModels.jl](https://github.com/JuliaPOMDP/GenerativeModels.jl).

## Installation
```julia
@@ -35,7 +38,6 @@ using POMDPs
POMDPs.add("SARSOP")
```


## Tutorials

The following tutorials aim to get you up to speed with POMDPs.jl:
@@ -45,142 +47,6 @@ The following tutorials aim to get you up to speed with POMDPs.jl:
of using SARSOP and QMDP to solve the tiger problem


## Core Interface

The core interface provides tools to express problems, program solvers, and setup simulations.

**TODO** this list is not complete! There are some functions in `src` that are missing documentation and are not included here.


### Distributions

`AbstractDistribution` - Base type for a probability distribution

- `rand(rng::AbstractRNG, d::AbstractDistribution, sample::Any)` fill with random sample from distribution and return the sample
- `pdf(d::AbstractDistribution, x)` value of probability distribution function at x

**XXX** There are functions missing from this list that are included in `src/distribution.jl`
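
As a rough illustration of this pair of functions, a distribution over two integer-valued states might be implemented as below (a sketch only; `TwoStateDistribution` and its field are hypothetical, and because `Int` is immutable the `sample` argument is simply replaced rather than mutated):

```julia
using POMDPs
import POMDPs: pdf, rand

# Hypothetical distribution over the two states 1 and 2
type TwoStateDistribution <: AbstractDistribution
    p1::Float64                 # probability of state 1
end

pdf(d::TwoStateDistribution, s::Int) = (s == 1 ? d.p1 : 1.0 - d.p1)

# Fill `sample` with a draw from `d` and return it; the drawn Int is returned directly
rand(rng::AbstractRNG, d::TwoStateDistribution, sample::Int) = Base.rand(rng) < d.p1 ? 1 : 2
```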

### Problem Model

`POMDP` - Base type for a problem definition<br>
`AbstractSpace` - Base type for state, action, and observation spaces<br>
`State` - Base type for states<br>
`Action` - Base type for actions<br>
`Observation` - Base type for observations

- `states(pomdp::POMDP)` returns the complete state space
- `states(pomdp::POMDP, state::State, sts::AbstractSpace=states(pomdp))` modifies `sts` to the state space accessible from the given state and returns it
- `actions(pomdp::POMDP)` returns the complete action space
- `actions(pomdp::POMDP, state::State, aspace::AbstractSpace=actions(pomdp))` modifies `aspace` to the action space accessible from the given state and returns it
- `actions(pomdp::POMDP, belief::Belief, aspace::AbstractSpace=actions(pomdp))` modifies `aspace` to the action space accessible from the states with nonzero belief and returns it
- `observations(pomdp::POMDP)` returns the complete observation space
- `observations(pomdp::POMDP, state::State, ospace::AbstractSpace)` modifies `ospace` to the observation space accessible from the given state and returns it
- `reward(pomdp::POMDP, state::State, action::Action, statep::State)` returns the immediate reward for the s-a-s' triple
- `transition(pomdp::POMDP, state::State, action::Action, distribution=create_transition_distribution(pomdp))` modifies `distribution` to the transition distribution from the current state-action pair and returns it
- `observation(pomdp::POMDP, state::State, action::Action, statep::State, distribution=create_observation_distribution(pomdp))` modifies `distribution` to the observation distribution for the s-a-s' tuple (state, action, and next state) and returns it
- `observation(pomdp::POMDP, state::State, action::Action, distribution=create_observation_distribution(pomdp))` modifies `distribution` to the observation distribution for the s-a pair (state and action) and returns it
- `discount(pomdp::POMDP)` returns the discount factor
- `isterminal(pomdp::POMDP, state::State)` checks if a state is terminal
- `isterminal(pomdp::POMDP, observation::Observation)` checks if an observation is terminal. A terminal observation should be generated only upon transition to a terminal state.

**XXX** Missing functions such as `n_states`, `n_actions` (see `src/pomdp.jl`)
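
As a rough sketch of how a problem might implement a few of these functions, consider a hypothetical problem with Boolean states and actions (all of the type names, fields, and dynamics below are invented for illustration and do not correspond to any real package):

```julia
using POMDPs
import POMDPs: reward, transition, discount, create_transition_distribution

type LightSwitchPOMDP <: POMDP end      # hypothetical problem

type SwitchState <: State
    on::Bool
end

type SwitchAction <: Action
    flip::Bool
end

# Distribution object that transition fills in (hypothetical)
type SwitchDistribution <: AbstractDistribution
    p_on::Float64                       # probability the light is on in the next state
end

create_transition_distribution(p::LightSwitchPOMDP) = SwitchDistribution(0.5)

reward(p::LightSwitchPOMDP, s::SwitchState, a::SwitchAction, sp::SwitchState) = sp.on ? 1.0 : 0.0

# Modify `d` in place to the next-state distribution and return it
function transition(p::LightSwitchPOMDP, s::SwitchState, a::SwitchAction,
                    d::SwitchDistribution=create_transition_distribution(p))
    d.p_on = a.flip ? (s.on ? 0.1 : 0.9) : (s.on ? 0.9 : 0.1)
    return d
end

discount(p::LightSwitchPOMDP) = 0.95
```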

### Solvers and Policies

`Solver` - Base type for a solver<br>
`Policy` - Base type for a policy (a map from every possible belief, or more abstract policy state, to an optimal or suboptimal action)

- `solve(solver::Solver, pomdp::POMDP, policy::Policy=create_policy(solver, pomdp))` solves the POMDP, modifies `policy` to be the solution of `pomdp`, and returns it
- `action(policy::Policy, belief::Belief)` or `action(policy::Policy, belief::Belief, act::Action)` returns an action for the current belief given the policy (the method with three arguments modifies `act` and returns it)
- `action(policy::Policy, state::State)` or `action(policy::Policy, state::State, act::Action)` returns an action for the current state given the policy
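
A rough sketch of how a solver and policy plug into these functions (a hypothetical uniform-random policy; the `RandomSolver` and `RandomPolicy` names are invented):

```julia
using POMDPs
import POMDPs: solve, action, create_policy

type RandomSolver <: Solver
    rng::AbstractRNG
end

# A policy that ignores the belief and samples actions uniformly (illustrative only)
type RandomPolicy <: Policy
    rng::AbstractRNG
    pomdp::POMDP
end

create_policy(solver::RandomSolver, pomdp::POMDP) = RandomPolicy(solver.rng, pomdp)

# "Solving" is trivial here; a real solver would do its computation inside solve
solve(solver::RandomSolver, pomdp::POMDP, policy::RandomPolicy=create_policy(solver, pomdp)) = policy

# Return an action for the current belief by sampling from the action space
function action(policy::RandomPolicy, belief::Belief, act::Action=create_action(policy.pomdp))
    return rand(policy.rng, actions(policy.pomdp), act)
end
```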

### Belief

`Belief` - Base type for an object representing some knowledge about the state (often a probability distribution)<br>
`BeliefUpdater` - Base type for an object that defines how a belief should be updated

- `update(updater::BeliefUpdater, belief_old::Belief, action::Action, obs::Observation, belief_new::Belief=create_belief(updater))` modifies `belief_new` to reflect `belief_old` updated with the latest action and observation, and returns the updated belief.
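
A sketch of a simple discrete Bayes filter implementing `update` (the `DiscreteBelief` and `DiscreteUpdater` types and the use of `n_states`, `iterator`, `transition`, `observation`, and `pdf` below are illustrative assumptions about a finite problem, not part of the required interface):

```julia
using POMDPs
import POMDPs: update, create_belief

# Hypothetical discrete belief: b[i] is the probability of the i-th state
type DiscreteBelief <: Belief
    b::Vector{Float64}
end

type DiscreteUpdater <: BeliefUpdater
    pomdp::POMDP
end

create_belief(u::DiscreteUpdater) = DiscreteBelief(zeros(n_states(u.pomdp)))

# Discrete Bayes filter: b'(s') ∝ Σ_s O(o | s, a, s') T(s' | s, a) b(s)
function update(u::DiscreteUpdater, belief_old::DiscreteBelief, a::Action, o::Observation,
                belief_new::DiscreteBelief=create_belief(u))
    ss = collect(iterator(states(u.pomdp)))
    for (ip, sp) in enumerate(ss)
        total = 0.0
        for (i, s) in enumerate(ss)
            td = transition(u.pomdp, s, a)
            od = observation(u.pomdp, s, a, sp)
            total += pdf(od, o) * pdf(td, sp) * belief_old.b[i]
        end
        belief_new.b[ip] = total
    end
    z = sum(belief_new.b)
    for i in 1:length(belief_new.b)
        belief_new.b[i] /= z            # normalize so the belief sums to one
    end
    return belief_new
end
```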

### Simulation

`Simulator` - Base type for an object defining how a simulation should be carried out

- `simulate(simulator::Simulator, pomdp::POMDP, policy::Policy, updater::BeliefUpdater, initial_belief::Belief)` runs a simulation using the specified policy and returns the accumulated reward

## Minor Components
## Documentation

### Convenience Functions

Several convenience functions are also provided in the interface. They supply a standard vocabulary for common tasks and may be used by some solvers or in simulation, but they are not strictly necessary for expressing problems.

- `index(pomdp::POMDP, state::State)` returns the index of the given state for a discrete POMDP
- `initial_belief(pomdp::POMDP)` returns an example initial belief for the pomdp
- `iterator(space::AbstractSpace)` returns an iterator over a space or an iterable object containing the space (such as an array)
- `dimensions(s::AbstractSpace)` returns the number (integer) of dimensions in a space
- `lowerbound(s::AbstractSpace, i::Int)` returns the lower bound of dimension `i`
- `upperbound(s::AbstractSpace, i::Int)` returns the upper bound of dimension `i`
- `rand(rng::AbstractRNG, d::AbstractSpace, sample::Any)` fill with random sample from space and return the sample
- `value(policy::Policy, belief::Belief)` returns the utility value of the belief under `policy`
- `value(policy::Policy, state::State)` returns the utility value of the state under `policy`
- `convert_belief(updater::BeliefUpdater, b::Belief)` returns a belief with a distribution similar to `b` that can be updated using `updater` (this conversion may be lossy)
- `updater(p::Policy)` returns a default BeliefUpdater appropriate for the belief type that policy `p` uses
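
For instance, a solver that needs to enumerate a discrete space or sample from a continuous one might use these roughly as follows (a sketch; `pomdp` and `rng` are assumed to already exist):

```julia
# Enumerate a discrete state space
for s in iterator(states(pomdp))
    i = index(pomdp, s)                 # integer index of the state
    # ... e.g. fill in an entry of a value table
end

# Sample from a (possibly continuous) action space
aspace = actions(pomdp)
a = rand(rng, aspace, create_action(pomdp))

# Query the box bounds of each dimension of the space
for d in 1:dimensions(aspace)
    lo = lowerbound(aspace, d)
    hi = upperbound(aspace, d)
    # ... e.g. scale a sample into [lo, hi]
end
```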

### Object Creators

In many cases, it is more efficient to fill pre-allocated objects with new data rather than create new objects at each iteration of an algorithm or simulation. When a new object is needed, the following functions may be called. They should return an object of the appropriate type as efficiently as possible. The data in the object does not matter - it will be overwritten when the object is used.

- `create_state(pomdp::POMDP)` creates a single state object (for preallocation purposes)
- `create_observation(pomdp::POMDP)` creates a single observation object (for preallocation purposes)
- `create_transition_distribution(pomdp::POMDP)` returns a transition distribution
- `create_observation_distribution(pomdp::POMDP)` returns an observation distribution
- `create_policy(solver::Solver, pomdp::POMDP)` creates a policy object (for preallocation purposes)
- `create_action(pomdp::POMDP)` creates an action object (for preallocation purposes)
- `create_belief(updater::BeliefUpdater)` creates a belief object of the type used by `updater` (for preallocation purposes)
- `create_belief(pomdp::POMDP)` creates an empty problem-native belief object (for preallocation purposes)
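
Continuing the hypothetical light-switch sketch from the Problem Model section above, the creators can simply construct placeholder objects whose contents will be overwritten (`SwitchObservation` is another invented type):

```julia
import POMDPs: create_state, create_action, create_observation

type SwitchObservation <: Observation   # hypothetical observation type
    on::Bool
end

# The contents are placeholders; solvers and simulators overwrite them
create_state(p::LightSwitchPOMDP) = SwitchState(false)
create_action(p::LightSwitchPOMDP) = SwitchAction(false)
create_observation(p::LightSwitchPOMDP) = SwitchObservation(false)
# create_transition_distribution for this problem was defined in the sketch above
```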


## Reference Simulation Implementation

This reference simulation implementation shows how the various functions will be used. Please note that this example is written for clarity and not efficiency (see [TODO: link to main doc] for efficiency tips).

```julia
type ReferenceSimulator
    rng::AbstractRNG
    max_steps::Int
end

function simulate(sim::ReferenceSimulator,
                  pomdp::POMDP,
                  policy::Policy,
                  updater::BeliefUpdater,
                  initial_belief::Belief)

    # preallocate state and observation objects and draw the initial state
    s = create_state(pomdp)
    o = create_observation(pomdp)
    rand(sim.rng, initial_belief, s)

    # convert the initial belief to the representation used by the updater
    b = convert_belief(updater, initial_belief)

    step = 1
    disc = 1.0
    r = 0.0

    while step <= sim.max_steps && !isterminal(pomdp, s)
        a = action(policy, b)

        # sample the next state
        sp = create_state(pomdp)
        trans_dist = transition(pomdp, s, a)
        rand(sim.rng, trans_dist, sp)

        r += disc*reward(pomdp, s, a, sp)

        # sample the observation and update the belief
        obs_dist = observation(pomdp, s, a, sp)
        rand(sim.rng, obs_dist, o)
        b = update(updater, b, a, o)

        s = sp
        disc *= discount(pomdp)
        step += 1
    end

    return r    # accumulated discounted reward
end
```
Detailed documentation can be found [here](http://juliapomdp.github.io/POMDPs.jl/latest/).
12 changes: 12 additions & 0 deletions docs/make.jl
@@ -0,0 +1,12 @@
using Documenter, POMDPs

makedocs(
# options
modules = [POMDPs]
)

deploydocs(
repo = "github.com/JuliaPOMDP/POMDPs.jl.git",
julia = "release",
osname = "linux"
)
32 changes: 32 additions & 0 deletions docs/mkdocs.yml
@@ -0,0 +1,32 @@
site_name: POMDPs.jl
repo_url: https://github.com/JuliaPOMDP/POMDPs.jl
site_description: API for solving partially observable Markov decision processes in Julia.
site_author: Maxim Egorov

theme: readthedocs

extra:
palette:
primary: 'indigo'
accent: 'blue'

extra_css:
- assets/Documenter.css

markdown_extensions:
- codehilite
- extra
- tables
- fenced_code

extra_javascript:
- https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_HTML
- assets/mathjaxhelper.js

docs_dir: 'build'

pages:
- Home: index.md
- Manual: guide.md
- API: api.md

112 changes: 112 additions & 0 deletions docs/src/api.md
@@ -0,0 +1,112 @@
# Solver Documentation

Documentation for the `POMDPs.jl` user interface. You can get help for any type or
function in the module by typing `?` in the Julia REPL followed by the name of the
type or function. For example:

```julia
julia> using POMDPs
julia> ?
help?> reward
search: reward

reward{S,A,O}(pomdp::POMDP{S,A,O}, state::S, action::A, statep::S)

Returns the immediate reward for the s-a-s' triple

reward{S,A,O}(pomdp::POMDP{S,A,O}, state::S, action::A)

Returns the immediate reward for the s-a pair

```

{meta}
CurrentModule = POMDPs

## Contents

{contents}
Pages = ["api.md"]

## Index

{index}
Pages = ["api.md"]


## Types

{docs}
POMDP
MDP
AbstractSpace
AbstractDistribution
Solver
Policy
Belief
BeliefUpdater

## Model Functions

{docs}
states
actions
observations
reward
transition
observation
isterminal
isterminal_obs
n_states
n_actions
n_observations
state_index
action_index
obs_index
discount

## Distribution/Space Functions

{docs}
rand
pdf
dimensions
iterator
create_transition_distribution
create_observation_distribution

## Belief Functions

{docs}
initial_belief
create_belief
update
convert_belief

## Policy and Solver Functions

{docs}
create_policy
solve
updater
action
value

## Simulator

{docs}
Simulator
simulate

## Utility Tools

{docs}
add
@pomdp_func
strip_arg

## Constants

{docs}
REMOTE_URL
SUPPORTED_SOLVERS