diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 709253b..085efd8 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-07-18T02:06:17","documenter_version":"1.3.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.4","generation_timestamp":"2024-07-24T21:57:11","documenter_version":"1.3.0"}} \ No newline at end of file diff --git a/dev/api/index.html b/dev/api/index.html index 95f1f57..c8b1541 100644 --- a/dev/api/index.html +++ b/dev/api/index.html @@ -1,12 +1,12 @@ -API Documentation · CompressedBeliefMDPs

API Documentation

Contents

Index

Types/Functors

CompressedBeliefMDPs.CompressedBeliefMDPType
CompressedBeliefMDP{B, A}

The CompressedBeliefMDP struct is a generalization of the compressed belief-state MDP presented in Exponential Family PCA for Belief Compression in POMDPs.

Type Parameters

  • B: The type of compressed belief states.
  • A: The type of actions.

Fields

  • bmdp::GenerativeBeliefMDP: The generative belief-state MDP.
  • compressor::Compressor: The compressor used to compress belief states.
  • ϕ::Bijection: A bijection representing the mapping from uncompressed belief states to compressed belief states. See notes.

Constructors

CompressedBeliefMDP(pomdp::POMDP, updater::Updater, compressor::Compressor)
+API Documentation · CompressedBeliefMDPs

API Documentation

Contents

Index

Types/Functors

CompressedBeliefMDPs.CompressedBeliefMDPType
CompressedBeliefMDP{B, A}

The CompressedBeliefMDP struct is a generalization of the compressed belief-state MDP presented in Exponential Family PCA for Belief Compression in POMDPs.

Type Parameters

  • B: The type of compressed belief states.
  • A: The type of actions.

Fields

  • bmdp::GenerativeBeliefMDP: The generative belief-state MDP.
  • compressor::Compressor: The compressor used to compress belief states.
  • ϕ::Bijection: A bijection representing the mapping from uncompressed belief states to compressed belief states. See notes.

Constructors

CompressedBeliefMDP(pomdp::POMDP, updater::Updater, compressor::Compressor)
 CompressedBeliefMDP(pomdp::POMDP, sampler::Sampler, updater::Updater, compressor::Compressor)

Constructs a CompressedBeliefMDP using the specified POMDP, updater, and compressor.

Warning

The 4-argument constructor is a quality-of-life constructor that calls fit! on the given compressor.

Example Usage

pomdp = TigerPOMDP()
 updater = DiscreteUpdater(pomdp)
 compressor = PCACompressor(1)
-mdp = CompressedBeliefMDP(pomdp, updater, compressor)

For continuous POMDPs, see ParticleFilters.jl.

Notes

  • While compressions aren't usually injective, we cache beliefs and their compressions on a first-come, first-served basis, so we can effectively use a bijection without loss of generality.
source
CompressedBeliefMDPs.CompressedBeliefPolicyType
CompressedBeliefPolicy

Maps a base policy for the compressed belief-state MDP to a policy for the true POMDP.

Fields

  • m::CompressedBeliefMDP: The compressed belief-state MDP.
  • base_policy::Policy: The base policy used for decision-making in the compressed belief-state MDP.

Constructors

CompressedBeliefPolicy(m::CompressedBeliefMDP, base_policy::Policy)

Constructs a CompressedBeliefPolicy using the specified compressed belief-state MDP and base policy.

Example Usage

policy = solve(solver, pomdp)
+mdp = CompressedBeliefMDP(pomdp, updater, compressor)

For continuous POMDPs, see ParticleFilters.jl.

Notes

  • While compressions aren't usually injective, we cache beliefs and their compressions on a first-come, first-served basis, so we can effectively use a bijection without loss of generality.
source
CompressedBeliefMDPs.CompressedBeliefPolicyType
CompressedBeliefPolicy

Maps a base policy for the compressed belief-state MDP to a policy for the true POMDP.

Fields

  • m::CompressedBeliefMDP: The compressed belief-state MDP.
  • base_policy::Policy: The base policy used for decision-making in the compressed belief-state MDP.

Constructors

CompressedBeliefPolicy(m::CompressedBeliefMDP, base_policy::Policy)

Constructs a CompressedBeliefPolicy using the specified compressed belief-state MDP and base policy.

Example Usage

policy = solve(solver, pomdp)
 s = initialstate(pomdp)
 a = action(policy, s) # returns the approximately optimal action for state s
-v = value(policy, s)  # returns the approximately optimal value for state s
source
CompressedBeliefMDPs.CompressedBeliefSolverType
CompressedBeliefSolver

The CompressedBeliefSolver struct represents a solver for compressed belief-state MDPs. It combines a compressed belief-state MDP with a base solver to approximate the value function.

Fields

  • m::CompressedBeliefMDP: The compressed belief-state MDP.
  • base_solver::Solver: The base solver used to solve the compressed belief-state MDP.

Constructors

CompressedBeliefSolver(pomdp::POMDP, base_solver::Solver; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1))
+v = value(policy, s)  # returns the approximately optimal value for state s
source
CompressedBeliefMDPs.CompressedBeliefSolverType
CompressedBeliefSolver

The CompressedBeliefSolver struct represents a solver for compressed belief-state MDPs. It combines a compressed belief-state MDP with a base solver to approximate the value function.

Fields

  • m::CompressedBeliefMDP: The compressed belief-state MDP.
  • base_solver::Solver: The base solver used to solve the compressed belief-state MDP.

Constructors

CompressedBeliefSolver(pomdp::POMDP, base_solver::Solver; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1))
 CompressedBeliefSolver(pomdp::POMDP; updater::Updater=DiscreteUpdater(pomdp), sampler::Sampler=BeliefExpansionSampler(pomdp), compressor::Compressor=PCACompressor(1), interp::Union{Nothing, LocalFunctionApproximator}=nothing, k::Int=1, verbose::Bool=false, max_iterations::Int=1000, n_generative_samples::Int=10, belres::Float64=1e-3)

Constructs a CompressedBeliefSolver using the specified POMDP, base solver, updater, sampler, and compressor. Alternatively, you can omit the base solver, in which case a LocalApproximationValueIterationSolver (https://github.com/JuliaPOMDP/LocalApproximationValueIteration.jl) is created instead. Supplying your own base solver is necessary in some cases, for example when the POMDP state and action spaces are continuous.

Example Usage

julia> pomdp = TigerPOMDP();
 julia> solver = CompressedBeliefSolver(pomdp; verbose=true, max_iterations=10);
 julia> solve(solver, pomdp);
@@ -19,7 +19,7 @@
 [Iteration 7   ] residual:       6.03 | iteration runtime:      0.495 ms, (     0.639 s total)
 [Iteration 8   ] residual:       5.73 | iteration runtime:      0.585 ms, (     0.639 s total)
 [Iteration 9   ] residual:       4.02 | iteration runtime:      0.463 ms, (      0.64 s total)
-[Iteration 10  ] residual:       7.28 | iteration runtime:      0.576 ms, (      0.64 s total)
source

Functions

CompressedBeliefMDPs.make_cacheFunction
make_cache(B, B̃)

Helper function that creates a cache that maps each unique belief from the set B to its corresponding compressed representation in B̃.

Arguments

  • B::Vector{<:Any}: A vector of beliefs.
  • B̃::Matrix{Float64}: A matrix where each row corresponds to the compressed representation of the beliefs in B.

Returns

  • Dict{<:Any, Vector{Float64}}: A dictionary mapping each unique belief in B to its corresponding compressed representation in B̃.

Example Usage

B = [belief1, belief2, belief3]
+[Iteration 10  ] residual:       7.28 | iteration runtime:      0.576 ms, (      0.64 s total)
source

Functions

CompressedBeliefMDPs.make_cacheFunction
make_cache(B, B̃)

Helper function that creates a cache that maps each unique belief from the set B to its corresponding compressed representation in B̃.

Arguments

  • B::Vector{<:Any}: A vector of beliefs.
  • B̃::Matrix{Float64}: A matrix where each row corresponds to the compressed representation of the beliefs in B.

Returns

  • Dict{<:Any, Vector{Float64}}: A dictionary mapping each unique belief in B to its corresponding compressed representation in B̃.

Example Usage

B = [belief1, belief2, belief3]
 B̃ = [compressed_belief1; compressed_belief2; compressed_belief3]
-ϕ = make_cache(B, B̃)
source
CompressedBeliefMDPs.make_numericalFunction
make_numerical(B, pomdp)

Helper function that converts a set of beliefs B into a numerical matrix representation suitable for processing by numerical algorithms/compressors.

Arguments

  • B::Vector{<:Any}: A vector of beliefs.
  • pomdp::POMDP: The POMDP model associated with the beliefs.

Returns

  • Matrix{Float64}: A matrix where each row corresponds to a numerical representation of a belief in B.

Example Usage

B = [belief1, belief2, belief3]
-B_numerical = make_numerical(B, pomdp)
source
CompressedBeliefMDPs.compress_POMDPFunction
compress_POMDP(pomdp, sampler, updater, compressor)

Creates a compressed belief-state MDP by sampling, compressing, and caching beliefs from the given POMDP.

Arguments

  • pomdp::POMDP: The POMDP model to be compressed.
  • sampler::Sampler: A sampler to generate a set of beliefs from the POMDP.
  • updater::Updater: An updater to initialize beliefs from states.
  • compressor::Compressor: A compressor to reduce the dimensionality of the beliefs.

Returns

  • CompressedBeliefMDP: The constructed compressed belief-state MDP.
  • Matrix{Float64}: A matrix where each row corresponds to the compressed representation of the sampled beliefs.

Example Usage

```julia
pomdp = TigerPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)
compressor = PCACompressor(2)
m, B̃ = compress_POMDP(pomdp, sampler, updater, compressor)
```

source
+ϕ = make_cache(B, B̃)
source
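The first-come, first-served caching that make_cache describes can be sketched in plain Julia. This is a minimal illustration, not the package's source: the name make_cache_sketch and the toy beliefs are assumptions, and beliefs are represented here as plain probability vectors.

```julia
# Hypothetical re-implementation for illustration only; not the package's code.
function make_cache_sketch(B::Vector, B̃::Matrix{Float64})
    size(B̃, 1) == length(B) ||
        throw(DimensionMismatch("B̃ needs one row per belief in B"))
    ϕ = Dict{eltype(B), Vector{Float64}}()
    for (i, b) in enumerate(B)
        # get! stores the row only if b has no entry yet,
        # so the first compression seen for a belief wins.
        get!(ϕ, b, B̃[i, :])
    end
    return ϕ
end

# Toy data: the third belief repeats the first but has a different row in B̃.
B = [[0.5, 0.5], [0.85, 0.15], [0.5, 0.5]]
B̃ = reshape([0.0, 0.35, 0.01], 3, 1)
ϕ = make_cache_sketch(B, B̃)
```

Because duplicate beliefs keep only their first compressed row, the resulting dictionary has one entry per unique belief.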
CompressedBeliefMDPs.make_numericalFunction
make_numerical(B, pomdp)

Helper function that converts a set of beliefs B into a numerical matrix representation suitable for processing by numerical algorithms/compressors.

Arguments

  • B::Vector{<:Any}: A vector of beliefs.
  • pomdp::POMDP: The POMDP model associated with the beliefs.

Returns

  • Matrix{Float64}: A matrix where each row corresponds to a numerical representation of a belief in B.

Example Usage

B = [belief1, belief2, belief3]
+B_numerical = make_numerical(B, pomdp)
source
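As a rough sketch of the conversion make_numerical performs, the snippet below turns a vector of beliefs into a matrix with one row per belief and one column per state. It assumes beliefs are stored as dictionaries from states to probabilities and that the ordered state list is passed explicitly; both are simplifications chosen for illustration, and make_numerical_sketch is a hypothetical name rather than the package's function.

```julia
# Hypothetical stand-in for illustration; real beliefs come from a POMDP updater.
function make_numerical_sketch(B::Vector{<:AbstractDict}, states::Vector)
    M = Matrix{Float64}(undef, length(B), length(states))
    for (i, b) in enumerate(B), (j, s) in enumerate(states)
        # States missing from a belief are treated as having probability zero.
        M[i, j] = get(b, s, 0.0)
    end
    return M
end

states = [:left, :right]
B = [
    Dict(:left => 0.5, :right => 0.5),
    Dict(:left => 0.85, :right => 0.15),
    Dict(:left => 1.0),
]
M = make_numerical_sketch(B, states)
```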
CompressedBeliefMDPs.compress_POMDPFunction
compress_POMDP(pomdp, sampler, updater, compressor)

Creates a compressed belief-state MDP by sampling, compressing, and caching beliefs from the given POMDP.

Arguments

  • pomdp::POMDP: The POMDP model to be compressed.
  • sampler::Sampler: A sampler to generate a set of beliefs from the POMDP.
  • updater::Updater: An updater to initialize beliefs from states.
  • compressor::Compressor: A compressor to reduce the dimensionality of the beliefs.

Returns

  • CompressedBeliefMDP: The constructed compressed belief-state MDP.
  • Matrix{Float64}: A matrix where each row corresponds to the compressed representation of the sampled beliefs.

Example Usage

```julia
pomdp = TigerPOMDP()
sampler = BeliefExpansionSampler(pomdp)
updater = DiscreteUpdater(pomdp)
compressor = PCACompressor(2)
m, B̃ = compress_POMDP(pomdp, sampler, updater, compressor)
```

source
diff --git a/dev/circular/index.html b/dev/circular/index.html index f415d93..83243ee 100644 --- a/dev/circular/index.html +++ b/dev/circular/index.html @@ -5,4 +5,4 @@ n_corridors = 8 corridor_length = 25 -maze = CircularMaze(n_corridors, corridor_length)source
CompressedBeliefMDPs.CircularMazeStateType
CircularMazeState(corridor::Integer, x::Integer)

The CircularMazeState struct represents the state of an agent in a circular maze.

Fields

  • corridor::Integer: The corridor number. The value ranges from 1 to n_corridors.
  • x::Integer: The position of the state within the corridor. The value ranges from 1 to the corridor_length.
source
+maze = CircularMaze(n_corridors, corridor_length)source
CompressedBeliefMDPs.CircularMazeStateType
CircularMazeState(corridor::Integer, x::Integer)

The CircularMazeState struct represents the state of an agent in a circular maze.

Fields

  • corridor::Integer: The corridor number. The value ranges from 1 to n_corridors.
  • x::Integer: The position of the state within the corridor. The value ranges from 1 to the corridor_length.
source
diff --git a/dev/compressors/index.html b/dev/compressors/index.html index 8e08a7b..1aec1b7 100644 --- a/dev/compressors/index.html +++ b/dev/compressors/index.html @@ -12,4 +12,4 @@ function fit!(c::MyCompressor, beliefs) # YOUR CODE HERE -end

Implementation Tips

Implemented Compressors

CompressedBeliefMDPs currently provides wrappers for the following compression types:

Principal Component Analysis (PCA)

CompressedBeliefMDPs.PCACompressorFunction

Wrapper for MultivariateStats.PCA.

source

Kernel PCA

CompressedBeliefMDPs.KernelPCACompressorFunction

Wrapper for MultivariateStats.KernelPCA.

source

Probabilistic PCA

CompressedBeliefMDPs.PPCACompressorFunction

Wrapper for MultivariateStats.PPCA.

source

Factor Analysis

CompressedBeliefMDPs.FactorAnalysisCompressorFunction

Wrapper for MultivariateStats.FactorAnalysis.

source

Isomap

CompressedBeliefMDPs.IsomapCompressorFunction

Wrapper for ManifoldLearning.Isomap.

source

Autoencoder

CompressedBeliefMDPs.AutoencoderCompressorType

Implements an autoencoder in Flux.

source

Variational Auto-Encoder (VAE)

CompressedBeliefMDPs.VAECompressorType

Implements a VAE in Flux.

source
Warning

Some compression algorithms aren't optimized for large belief spaces. While they pass our unit tests, they may fail on large POMDPs or without seeding. For large POMDPs, users may want a custom Compressor.

+end

Implementation Tips

Implemented Compressors

CompressedBeliefMDPs currently provides wrappers for the following compression types:

Principal Component Analysis (PCA)

CompressedBeliefMDPs.PCACompressorFunction

Wrapper for MultivariateStats.PCA.

source

Kernel PCA

CompressedBeliefMDPs.KernelPCACompressorFunction

Wrapper for MultivariateStats.KernelPCA.

source

Probabilistic PCA

CompressedBeliefMDPs.PPCACompressorFunction

Wrapper for MultivariateStats.PPCA.

source

Factor Analysis

CompressedBeliefMDPs.FactorAnalysisCompressorFunction

Wrapper for MultivariateStats.FactorAnalysis.

source

Isomap

CompressedBeliefMDPs.IsomapCompressorFunction

Wrapper for ManifoldLearning.Isomap.

source

Autoencoder

CompressedBeliefMDPs.AutoencoderCompressorType

Implements an autoencoder in Flux.

source

Variational Auto-Encoder (VAE)

CompressedBeliefMDPs.VAECompressorType

Implements a VAE in Flux.

source
Warning

Some compression algorithms aren't optimized for large belief spaces. While they pass our unit tests, they may fail on large POMDPs or without seeding. For large POMDPs, users may want a custom Compressor.

diff --git a/dev/index.html b/dev/index.html index d7c485d..14722e6 100644 --- a/dev/index.html +++ b/dev/index.html @@ -65,4 +65,4 @@ ) policy = solve(solver, pomdp) rs = RolloutSimulator(max_steps=50) -r = simulate(rs, pomdp, policy)

Concepts and Architecture

CompressedBeliefMDPs.jl aims to implement a generalization of the belief compression algorithm for solving large POMDPs. The algorithm has four steps:

  1. collect belief samples,
  2. compress the samples,
  3. create the compressed belief-state MDP,
  4. solve the MDP.

These steps are handled by Sampler, Compressor, CompressedBeliefMDP, and CompressedBeliefSolver, respectively.

For more details, please see the rest of the documentation or the associated paper.

+r = simulate(rs, pomdp, policy)

Concepts and Architecture

CompressedBeliefMDPs.jl aims to implement a generalization of the belief compression algorithm for solving large POMDPs. The algorithm has four steps:

  1. collect belief samples,
  2. compress the samples,
  3. create the compressed belief-state MDP,
  4. solve the MDP.

These steps are handled by Sampler, Compressor, CompressedBeliefMDP, and CompressedBeliefSolver, respectively.

For more details, please see the rest of the documentation or the associated paper.
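To make step 2 concrete, here is a minimal sketch of compressing a matrix of sampled beliefs with ordinary PCA computed from an SVD. It uses only Julia's standard library and is an illustration under stated assumptions, not how the package's PCACompressor is implemented; pca_compress_sketch and the toy belief matrix are hypothetical.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper: project beliefs (rows of B) onto the top-k principal axes.
function pca_compress_sketch(B::Matrix{Float64}, k::Int)
    μ = mean(B; dims=1)      # 1×d row of column means
    X = B .- μ               # centered beliefs
    F = svd(X)
    W = F.V[:, 1:k]          # d×k matrix of principal directions
    return X * W, W, μ       # n×k compressed beliefs, plus what is needed to decode
end

# Four beliefs over two states, one belief per row.
B = [0.5 0.5; 0.85 0.15; 0.15 0.85; 0.97 0.03]
B̃, W, μ = pca_compress_sketch(B, 1)

# Decode back to the original space to check how much was lost.
B_reconstructed = B̃ * W' .+ μ
```

Beliefs over two states lie on a line, so a single component reconstructs them up to floating-point error; larger state spaces generally lose some information.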

diff --git a/dev/objects.inv b/dev/objects.inv index f5ef04c..c049146 100644 Binary files a/dev/objects.inv and b/dev/objects.inv differ diff --git a/dev/samplers/index.html b/dev/samplers/index.html index a377c0a..0a2053e 100644 --- a/dev/samplers/index.html +++ b/dev/samplers/index.html @@ -16,13 +16,13 @@ DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85]) DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5]) DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002]) - DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])source

Policy Sampler

CompressedBeliefMDPs.PolicySamplerType
PolicySampler

Samples belief states by rolling out a Policy.

Fields

  • policy::Policy: The policy used for decision making.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.
  • verbose::Bool: Whether to use a progress bar while sampling.

Constructors

PolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp), 
+  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
source

Policy Sampler

CompressedBeliefMDPs.PolicySamplerType
PolicySampler

Samples belief states by rolling out a Policy.

Fields

  • policy::Policy: The policy used for decision making.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.
  • verbose::Bool: Whether to use a progress bar while sampling.

Constructors

PolicySampler(pomdp::POMDP; policy::Policy=RandomPolicy(pomdp), 
 updater::Updater=DiscreteUpdater(pomdp), n::Integer=10, 
 rng::AbstractRNG=Random.GLOBAL_RNG)

Methods

(s::PolicySampler)(pomdp::POMDP)

Returns a vector of unique belief states.

Example

julia> pomdp = TigerPOMDP();
 julia> sampler = PolicySampler(pomdp; n=3); 
 julia> sampler(pomdp)
 2-element Vector{Any}:
 DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
-DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
source

ExplorationPolicy Sampler

CompressedBeliefMDPs.ExplorationPolicySamplerType
ExplorationPolicySampler

Samples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler.

Fields

  • explorer::ExplorationPolicy: The ExplorationPolicy used for decision making.
  • on_policy::Policy: The fallback Policy used for decision making when not exploring.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.
  • verbose::Bool: Whether to use a progress bar while sampling.

Constructors

ExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,
+DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.15000000000000002, 0.85])
source

ExplorationPolicy Sampler

CompressedBeliefMDPs.ExplorationPolicySamplerType
ExplorationPolicySampler

Samples belief states by rolling out an ExplorationPolicy. Essentially identical to PolicySampler.

Fields

  • explorer::ExplorationPolicy: The ExplorationPolicy used for decision making.
  • on_policy::Policy: The fallback Policy used for decision making when not exploring.
  • updater::Updater: The updater used for updating beliefs.
  • n::Integer: The maximum number of simulated steps.
  • rng::AbstractRNG: The random number generator used for sampling.
  • verbose::Bool: Whether to use a progress bar while sampling.

Constructors

ExplorationPolicySampler(pomdp::POMDP; rng::AbstractRNG=Random.GLOBAL_RNG,
 explorer::ExplorationPolicy=EpsGreedyPolicy(pomdp, 0.1; rng=rng), on_policy=RandomPolicy(pomdp),
 updater::Updater=DiscreteUpdater(pomdp), n::Integer=10)

Methods

(s::ExplorationPolicySampler)(pomdp::POMDP)

Returns a vector of unique belief states.

Example Usage

julia> pomdp = TigerPOMDP()
 julia> sampler = ExplorationPolicySampler(pomdp; n=30)
@@ -30,4 +30,4 @@
 3-element Vector{Any}:
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.5, 0.5])
  DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.85, 0.15000000000000002])
- DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])
source
+ DiscreteBelief{TigerPOMDP, Bool}(TigerPOMDP(-1.0, -100.0, 10.0, 0.85, 0.95), Bool[0, 1], [0.9697986577181208, 0.030201342281879207])source