Refactor the prior implementation (#286)

* Rename `AbstractODEFilterPrior` to `AbstractGaussMarkovPrior` * Use accessors instead of the fields directly * Small intermediate clean-up * Rename AbstractGaussMarkovPrior to AbstractGaussMarkovProcess * Make the initial distribution part of the prior * Specialize the initial distribution for Matern * Implement a `remake` to change elType and dimension of priors * Add MatrixEquations.jl * Add Compat entry for MatrixEquations.jl * Fix a typo * Fix vale.sh issues * Fix another bug * Fix an IOUP bug I introduced * JuliaFormatter.jl * JuliaFormatter.jl update apparently changed a bit * Add the priors to the basic ODE correctness tests * Test the Matern initial distribution in the priors tests * Make IOUP and Matern use the default LTISDE discretize * JuliaFormatter.jl * Add a plotting convenience function * Better plotting of priors with dimension > 1 * Improve the plotting some more * Add the new prior functionality to the tests * JuliaFormatter.jl * Make the plotting much nicer * Add many docstrings * Start working more on the prior documentation * More sensible Documenter.jl settings * Make the priors one-dimensional by default * Give the probabilistic exponential integrator tutorial an id * Make the prior plot layout less opinionated * Fix some prior bugs I introduced now that they're 1-D * Refactor priors some more * Make the plot recipe yet again less opinionated * Towards a better prior documentation * Fix a bug * Better docs * Better docs * Fix some tests * Better docs * Better plots * JuliaFormatter.jl * Somewhat better docs * Change the field names and actually use accessors * Remove things from the plot recipe * Better errors * Fix the Matern constructor * Fix the IOUP constructor * Fix more bugs * Test some prior-related error messages * JuliaFormatter.jl * Rely more on keywords for the priors * Make tip into info * Fix another failing test * JuliaFormatter.jl * JuliaFormatter.jl * Improve some more docstrings * Add a newline in the docstring math
nathanaelbosch · Jan 17, 2024 · fd739dc · fd739dc
1 parent b5e36cf
commit fd739dc
Show file tree

Hide file tree

Showing 22 changed files with 566 additions and 182 deletions.
diff --git a/...ale-styles/Vocab/ProbNumDiffEq/accept.txt → ...fig/vocabularies/ProbNumDiffEq/accept.txt b/...ale-styles/Vocab/ProbNumDiffEq/accept.txt → ...fig/vocabularies/ProbNumDiffEq/accept.txt
diff --git a/...ale-styles/Vocab/ProbNumDiffEq/reject.txt → ...fig/vocabularies/ProbNumDiffEq/reject.txt b/...ale-styles/Vocab/ProbNumDiffEq/reject.txt → ...fig/vocabularies/ProbNumDiffEq/reject.txt
diff --git a/Project.toml b/Project.toml
@@ -17,6 +17,7 @@ FunctionWrappersWrappers = "77dc65aa-8811-40c2-897b-53d922fa7daf"
 GaussianDistributions = "43dcc890-d446-5863-8d1a-14597580bb8d"
 Kronecker = "2c470bb0-bcc8-11e8-3dad-c9649493f05e"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+MatrixEquations = "99c1a7ee-ab34-5fd5-8076-27c950a045f4"
 Octavian = "6fd5a793-0b7e-452c-907f-f8bfe9c57db4"
 OrdinaryDiffEq = "1dea7af3-3e70-54e6-95c3-0bf5283fa5ed"
 PSDMatrices = "fe68d972-6fd8-4755-bdf0-97d4c54cefdc"
@@ -56,6 +57,7 @@ FunctionWrappersWrappers = "0.1.3"
 GaussianDistributions = "0.5"
 Kronecker = "0.5.4"
 LinearAlgebra = "1"
+MatrixEquations = "2"
 Octavian = "0.3.17"
 OrdinaryDiffEq = "6.52"
 PSDMatrices = "0.4.7"

diff --git a/docs/make.jl b/docs/make.jl
@@ -69,7 +69,8 @@ makedocs(
         ],
         "References" => "references.md",
     ],
-    warnonly=:missing_docs,
+    warnonly=Documenter.except(:missing_docs),
+    checkdocs=:none,
 )
 
 # Documenter can also automatically deploy documentation to gh-pages.

diff --git a/docs/src/priors.md b/docs/src/priors.md
@@ -2,7 +2,7 @@
 
 **TL;DR: If you're unsure which prior to use, just stick to the default integrated Wiener process prior [`IWP`](@ref)!**
 
-# Background
+## Background
 
 We model the ODE solution ``y(t)`` with a Gauss--Markov prior.
 More precisely, let
@@ -24,14 +24,103 @@ Then ``Y^{(i)}(t)`` models the ``i``-th derivative of ``y(t)``.
 If you're more interested in the _diffusion_ ``\textcolor{#4063D8}{\Gamma}`` check out the [Diffusion models and calibration](@ref) section,
 and for info on the initial distribution ``\textcolor{purple}{ \mathcal{N} \left( \mu_0, \Sigma_0 \right) }`` check out the [Initialization](@ref) section.
 
-**If you're unsure which prior to use, just stick to the integrated Wiener process prior [`IWP`](@ref)!**
-This is also the default choice for all solvers.
-The other priors are rather experimental / niche at the time of writing.
+!!! info
+    **If you're unsure which prior to use, just stick to the integrated Wiener process prior [`IWP`](@ref)!**
+    This is also the default choice for all solvers.
+    The other priors are rather experimental / niche at the time of writing.
 
-## API
+## Prior choices
+
+ProbNumDiffEq.jl currently supports three classes of priors for the solvers:
+- [`IWP`](@ref): ``q``-times integrated Wiener processes
+- [`IOUP`](@ref): ``q``-times integrated Ornstein--Uhlenbeck processes
+- [`Matern`](@ref): Matérn processes
+Let's look at each of them in turn and visualize some examples.
 
+
+### Integrated Wiener process ([`IWP`](@ref))
 ```@docs
 IWP
+```
+Here is how the [`IWP`](@ref) looks for varying smoothness parameters ``q``:
+```@example priors
+using ProbNumDiffEq, Plots
+plotrange = range(0, 10, length=250)
+plot(
+    plot(IWP(1), plotrange; title="q=1"),
+    plot(IWP(2), plotrange; title="q=2"),
+    plot(IWP(3), plotrange; title="q=3"),
+    plot(IWP(4), plotrange; title="q=4");
+    ylims=(-20,20),
+)
+```
+In the context of ODE solvers, the smoothness parameter ``q`` influences the convergence rate of the solver,
+and so it is typically chose similarly to the order of a Runge--Kutta method: lower order for low accuracy, higher order for high accuracy.
+
+
+### Integrated Ornstein--Uhlenbeck process ([`IOUP`](@ref))
+The ``q``-times integrated Ornstein--Uhlenbeck process prior [`IOUP`](@ref) is a generalization of the IWP prior, where the drift matrix ``\textcolor{#389826}{A}`` is not zero:
+```@docs
 IOUP
+```
+
+Here is how the [`IOUP`](@ref) looks for varying rate parameters:
+```@example priors
+using ProbNumDiffEq, Plots
+plotrange = range(0, 10, length=250)
+plot(
+    plot(IOUP(1, -1), plotrange; title="q=1,L=-1", ylims=(-20,20)),
+    plot(IOUP(1, 1), plotrange; title="q=1,L=1", ylims=(-20,20)),
+    plot(IOUP(4, -1), plotrange; title="q=4,L=-1", ylims=(-50,50)),
+    plot(IOUP(4, 1), plotrange; title="q=4,L=1", ylims=(-50,50));
+)
+```
+
+In the context of [Probabilistic Exponential Integrators](@ref probexpinttutorial), the rate parameter is often chosen according to the given ODE.
+Here is an example for a damped oscillator:
+```@example priors
+plot(IOUP(2, 1, [-0.2 -2π; 2π -0.2]), plotrange; plot_title="damped oscillator prior");
+```
+
+### Matérn process ([`Matern`](@ref))
+```@docs
 Matern
 ```
+
+Here is how the [`Matern`](@ref) looks for varying smoothness parameters ``q``:
+```@example priors
+using ProbNumDiffEq, Plots
+plotrange = range(0, 10, length=250)
+plot(
+    plot(Matern(1, 1), plotrange; title="q=1"),
+    plot(Matern(2, 1), plotrange; title="q=2"),
+    plot(Matern(3, 1), plotrange; title="q=3"),
+    plot(Matern(4, 1), plotrange; title="q=4");
+)
+```
+and for varying length scales ``\ell``:
+```@example priors
+plot(
+    plot(Matern(2, 5), plotrange; title="l=5"),
+    plot(Matern(2, 2), plotrange; title="l=2"),
+    plot(Matern(2, 1), plotrange; title="l=1"),
+    plot(Matern(2, 0.5), plotrange; title="l=0.5"),
+)
+```
+
+## API
+```@docs
+ProbNumDiffEq.AbstractGaussMarkovProcess
+ProbNumDiffEq.LTISDE
+ProbNumDiffEq.dim
+ProbNumDiffEq.num_derivatives
+ProbNumDiffEq.to_sde
+ProbNumDiffEq.discretize
+ProbNumDiffEq.initial_distribution
+```
+
+### Convenience functions to analyze and visualize priors
+```@docs
+ProbNumDiffEq.marginalize
+ProbNumDiffEq.sample
+```
diff --git a/docs/src/refs.bib b/docs/src/refs.bib
@@ -170,3 +170,14 @@ @book{hennig22probnum
   author =       {Hennig, Philipp and Osborne, Michael A. and Kersting, Hans P.},
   year =         2022,
 }
+
+@book{sarkka19appliedsde,
+  place =        {Cambridge},
+  series =       {Institute of Mathematical Statistics Textbooks},
+  title =        {Applied Stochastic Differential Equations},
+  DOI =          {10.1017/9781108186735},
+  publisher =    {Cambridge University Press},
+  author =       {Särkkä, Simo and Solin, Arno},
+  year =         2019,
+  collection =   {Institute of Mathematical Statistics Textbooks}
+}
diff --git a/docs/src/tutorials/exponential_integrators.md b/docs/src/tutorials/exponential_integrators.md
@@ -1,4 +1,4 @@
-# Probabilistic Exponential Integrators
+# [Probabilistic Exponential Integrators](@id probexpinttutorial)
 
 Exponential integrators are a class of numerical methods for solving semi-linear ordinary differential equations (ODEs) of the form
 ```math

diff --git a/ext/RecipesBaseExt.jl b/ext/RecipesBaseExt.jl
@@ -72,4 +72,67 @@ import SciMLBase: interpret_vars, getsyms
     end
 end
 
+@recipe function f(
+    process::ProbNumDiffEq.AbstractGaussMarkovProcess,
+    plotrange;
+    N_samples=10,
+    plot_derivatives=false,
+)
+    marginals = ProbNumDiffEq.marginalize(process, plotrange)
+    d = ProbNumDiffEq.dim(process)
+    q = ProbNumDiffEq.num_derivatives(process)
+    means = [mean(m) for m in marginals] |> stack |> permutedims
+    stddevs = [std(m) for m in marginals] |> stack |> permutedims
+
+    perm = permutedims(reshape(collect(1:d*(q+1)), q + 1, d))[:]
+    reorder(X) = X[:, perm]
+
+    E0 = ProbNumDiffEq.projection(d, q)(0)
+    if !plot_derivatives
+        stddevs = stddevs * E0'
+        means = means * E0'
+        q = 0
+    end
+
+    dui(i) = "u" * "'"^i
+    if plot_derivatives
+        title --> [i == 1 ? "$(dui(j))" : "" for i in 1:d for j in 0:q] |> permutedims
+    end
+    ylabel --> [q == 0 ? "u$i" : "" for i in 1:d for q in 0:q] |> permutedims
+    xlabel --> if plot_derivatives
+        [i == d ? "t" : "" for i in 1:d for q in 0:q] |> permutedims
+    else
+        "t"
+    end
+
+    @series begin
+        ribbon --> 3stddevs
+        label --> ""
+        fillalpha --> 0.1
+        layout --> if plot_derivatives
+            (d, q + 1)
+        else
+            d
+        end
+        plotrange, means
+    end
+
+    if N_samples > 0
+        samples = ProbNumDiffEq.sample(process, plotrange, N_samples) |> stack
+        samples = permutedims(samples, (3, 1, 2))
+        for i in 1:N_samples
+            @series begin
+                s = samples[:, :, i]
+                if !plot_derivatives
+                    s = s * E0'
+                end
+                primary --> false
+                linealpha --> 0.3
+                label := ""
+                plotrange, s
+            end
+        end
+    end
+end
+
 end
diff --git a/src/ProbNumDiffEq.jl b/src/ProbNumDiffEq.jl
@@ -6,12 +6,12 @@ import Base: copy, copy!, show, size, ndims, similar, isapprox, isequal, iterate
 
 using LinearAlgebra
 import LinearAlgebra: mul!
-import Statistics: mean, var, std
+import Statistics: mean, var, std, cov
 
 using Reexport
 @reexport using DiffEqBase
 import SciMLBase
-import SciMLBase: interpret_vars, getsyms
+import SciMLBase: interpret_vars, getsyms, remake
 using OrdinaryDiffEq
 using SpecialMatrices, ToeplitzMatrices
 using FastBroadcast
@@ -29,6 +29,7 @@ import Kronecker
 using ArrayAllocators
 using FiniteHorizonGramians
 using FillArrays
+using MatrixEquations
 
 @reexport using GaussianDistributions
 

diff --git a/src/alg_utils.jl b/src/alg_utils.jl
@@ -21,7 +21,7 @@ OrdinaryDiffEq.isimplicit(::EK1) = true
 ############################################
 # Step size control
 OrdinaryDiffEq.isadaptive(::AbstractEK) = true
-OrdinaryDiffEq.alg_order(alg::AbstractEK) = alg.prior.num_derivatives
+OrdinaryDiffEq.alg_order(alg::AbstractEK) = num_derivatives(alg.prior)
 # OrdinaryDiffEq.alg_adaptive_order(alg::AbstractEK) =
 
 # PI control is the default!

diff --git a/src/algorithms.jl b/src/algorithms.jl
@@ -8,7 +8,7 @@ abstract type AbstractEK <: OrdinaryDiffEq.OrdinaryDiffEqAdaptiveAlgorithm end
           smooth=true,
           prior=IWP(order),
           diffusionmodel=DynamicDiffusion(),
-          initialization=TaylorModeInit(prior.num_derivatives))
+          initialization=TaylorModeInit(num_derivatives(prior)))
 
 **Gaussian ODE filter with zeroth-order vector field linearization.**
 
@@ -25,7 +25,7 @@ which scales cubically with the problem size._
 # Arguments
 - `order::Integer`: Order of the integrated Wiener process (IWP) prior.
 - `smooth::Bool`: Turn smoothing on/off; smoothing is required for dense output.
-- `prior::AbstractODEFilterPrior`: Prior to be used by the ODE filter.
+- `prior::AbstractGaussMarkovProcess`: Prior to be used by the ODE filter.
    By default, uses a 3-times integrated Wiener process prior `IWP(3)`.
    See also: [Priors](@ref).
 - `diffusionmodel::ProbNumDiffEq.AbstractDiffusion`: See [Diffusion models and calibration](@ref).
@@ -49,7 +49,7 @@ EK0(;
     prior=IWP(order),
     diffusionmodel=DynamicDiffusion(),
     smooth=true,
-    initialization=TaylorModeInit(prior.num_derivatives),
+    initialization=TaylorModeInit(num_derivatives(prior)),
 ) = EK0(prior, diffusionmodel, smooth, initialization)
 
 _unwrap_val(::Val{B}) where {B} = B
@@ -60,7 +60,7 @@ _unwrap_val(B) = B
           smooth=true,
           prior=IWP(order),
           diffusionmodel=DynamicDiffusion(),
-          initialization=TaylorModeInit(prior.num_derivatives),
+          initialization=TaylorModeInit(num_derivatives(prior)),
           kwargs...)
 
 **Gaussian ODE filter with first-order vector field linearization.**
@@ -73,7 +73,7 @@ so if you're solving a high-dimensional non-stiff problem you might want to give
 # Arguments
 - `order::Integer`: Order of the integrated Wiener process (IWP) prior.
 - `smooth::Bool`: Turn smoothing on/off; smoothing is required for dense output.
-- `prior::AbstractODEFilterPrior`: Prior to be used by the ODE filter.
+- `prior::AbstractGaussMarkovProcess`: Prior to be used by the ODE filter.
    By default, uses a 3-times integrated Wiener process prior `IWP(3)`.
    See also: [Priors](@ref).
 - `diffusionmodel::ProbNumDiffEq.AbstractDiffusion`: See [Diffusion models and calibration](@ref).
@@ -103,7 +103,7 @@ EK1(;
     prior::PT=IWP(order),
     diffusionmodel::DT=DynamicDiffusion(),
     smooth=true,
-    initialization::IT=TaylorModeInit(prior.num_derivatives),
+    initialization::IT=TaylorModeInit(num_derivatives(prior)),
     chunk_size=Val{0}(),
     autodiff=Val{true}(),
     diff_type=Val{:forward},

diff --git a/src/caches.jl b/src/caches.jl
@@ -95,7 +95,7 @@ function OrdinaryDiffEq.alg_cache(
 
     is_secondorder_ode = f isa DynamicalODEFunction
 
-    q = alg.prior.num_derivatives
+    q = num_derivatives(alg.prior)
     d = is_secondorder_ode ? length(u[1, :]) : length(u)
     D = d * (q + 1)
 
@@ -105,8 +105,11 @@ function OrdinaryDiffEq.alg_cache(
 
     FAC = get_covariance_structure(alg; elType=uElType, d, q)
     if FAC isa IsometricKroneckerCovariance && !(f.mass_matrix isa UniformScaling)
-        error(
-            "The selected algorithm uses an efficient Kronecker-factorized implementation which is incompatible with the provided mass matrix. Try using the `EK1` instead.",
+        throw(
+            ArgumentError(
+                "The selected algorithm uses an efficient Kronecker-factorized " *
+                "implementation which is incompatible with the provided mass matrix. " *
+                "Try using the `EK1` instead."),
         )
     end
 
@@ -119,18 +122,28 @@ function OrdinaryDiffEq.alg_cache(
     SolProj = solution_space_projection(FAC, is_secondorder_ode)
 
     # Prior dynamics
-    prior = if alg.prior isa IWP
-        IWP{uElType}(d, alg.prior.num_derivatives)
-    elseif alg.prior isa IOUP && ismissing(alg.prior.rate_parameter)
-        r = Array{uElType}(calloc, d, d)
-        IOUP{uElType}(d, q, r, alg.prior.update_rate_parameter)
-    elseif alg.prior isa IOUP
-        IOUP{uElType}(d, q, alg.prior.rate_parameter, alg.prior.update_rate_parameter)
-    elseif alg.prior isa Matern
-        Matern{uElType}(d, q, alg.prior.lengthscale)
-    else
-        error("Invalid prior $(alg.prior)")
+    if !(dim(alg.prior) == 1 || dim(alg.prior) == d)
+        throw(
+            DimensionMismatch(
+                "The dimension of the prior is not compatible with the dimension " *
+                "of the problem! The given ODE is $(d)-dimensional, but the prior is " *
+                "$(dim(alg.prior))-dimensional. Please make sure that the dimension of " *
+                "the prior is either 1 or $(d)."),
+        )
+    end
+    prior = remake(alg.prior; elType=uElType, dim=d)
+    if (prior isa IOUP) && prior.update_rate_parameter
+        if !(prior.rate_parameter isa Missing)
+            throw(
+                ArgumentError(
+                    "Do not manually set the `rate_parameter` of the IOUP prior when " *
+                    "using the `update_rate_parameter=true` option." *
+                    "Reset the prior and try again."),
+            )
+        end
+        prior = remake(prior; rate_parameter=Array{uElType}(calloc, d, d))
     end
+
     A, Q, Ah, Qh, P, PI = initialize_transition_matrices(FAC, prior, dt)
     F, L = to_sde(prior)
     F, L = to_factorized_matrix(FAC, F), to_factorized_matrix(FAC, L)
@@ -146,15 +159,8 @@ function OrdinaryDiffEq.alg_cache(
     measurement_model = make_measurement_model(f)
 
     # Initial State
-    initial_variance = ones(uElType, q + 1)
-    μ0 = uElType <: LinearAlgebra.BlasFloat ? Array{uElType}(calloc, D) : zeros(uElType, D)
-    Σ0 = PSDMatrix(
-        to_factorized_matrix(
-            FAC,
-            IsometricKroneckerProduct(d, diagm(sqrt.(initial_variance))),
-        ),
-    )
-    x0 = Gaussian(μ0, Σ0)
+    x0 = initial_distribution(prior)
+    x0 = Gaussian(x0.μ, to_factorized_matrix(FAC, x0.Σ))
 
     # Diffusion Model
     diffmodel = alg.diffusionmodel