Skip to content

Commit

Permalink
update
Browse files Browse the repository at this point in the history
  • Loading branch information
TorkelE committed Nov 2, 2023
1 parent 2c3f5b1 commit aa01fa6
Show file tree
Hide file tree
Showing 2 changed files with 71 additions and 17 deletions.
2 changes: 1 addition & 1 deletion docs/src/catalyst_applications/advanced_simulations.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Catalyst uses for all simulations.



## Monte Carlo simulations using `EnsembleProblem`s
## [Monte Carlo simulations using `EnsembleProblem`s](@id advanced_simulations_ensemble_problems)
In many contexts one needs to run multiple simulations of a model, for example
to collect statistics of SDE or jump process solutions, or to systematically
vary parameter values within a model. While it is always possible to manually
Expand Down
86 changes: 70 additions & 16 deletions docs/src/catalyst_applications/optimization_ode_param_fitting.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
# [Parameter Fitting for ODEs using SciML/Optimization.jl and DiffEqParamEstim.jl](@id optimization_parameter_fitting)
Fitting parameters to data involves solving a optimisation problem (that is, finding the parameter set that optimally fits you model to your data, typically by minimising some cost function). The SciML ecosystem's primary package for solving optimisation problems is [Optimization.jl](https://github.com/SciML/Optimization.jl). While it does not implement any optimisation methods itself, it wraps the large number of optimisation methods that have been implemented in Julia in a single common interface.
Fitting parameters to data involves solving an optimisation problem (that is, finding the parameter set that optimally fits you model to your data, typically by minimising a cost function). The SciML ecosystem's primary package for solving optimisation problems is [Optimization.jl](https://github.com/SciML/Optimization.jl). While it does not implement any optimisation methods itself, it wraps the large number of optimisation methods that have been implemented in Julia into a single common interface.

This tutorial both demonstrate how to use first create a parameter fitting cost function using the [DiffEqParamEstim.jl](https://github.com/SciML/DiffEqParamEstim.jl) package, and how to use Optimization.jl to optimise this (and potentially other) functions. More details on how to use these packages can be found in their [respective](https://docs.sciml.ai/Optimization/stable/) [documentations](https://docs.sciml.ai/DiffEqParamEstim/stable/).
This tutorial both demonstrate how to create parameter fitting cost functions using the [DiffEqParamEstim.jl](https://github.com/SciML/DiffEqParamEstim.jl) package, and how to use Optimization.jl to minimise these. Optimization.jl can also be used in other contexts, such as finding parameter sets that maximises the magnitude of some system behaviour. More details on how to use these packages can be found in their [respective](https://docs.sciml.ai/Optimization/stable/) [documentations](https://docs.sciml.ai/DiffEqParamEstim/stable/).

## Basic example

Let us consider a simple catalysis network, where an enzyme (*E*) turns a substrate (*S*) into a product (*P*):
```@example diffeq_param_estim_1
using Catalyst, PEtab
using Catalyst
rn = @reaction_network begin
kB, S + E --> SE
kD, SE --> S + E
Expand All @@ -34,7 +34,7 @@ plot(true_sol; idxs=:P, label="True solution", lw=8)
plot!(data_ts, data_vals; label="Measurements", seriestype=:scatter, ms=6)
```

Next, we will use DiffEqParamEstim to build a loss function, describing how well simulations of our model fit the data.
Next, we will use DiffEqParamEstim.jl to build a loss function that describing how well our model's simulations fit the data.
```@example diffeq_param_estim_1
using DiffEqParamEstim, Optimization
p_dummy = [:kB => 0.0, :kD => 0.0, :kP => 0.0]
Expand All @@ -45,45 +45,99 @@ nothing #
To `build_loss_objective` we provide the following arguments:
- `oprob`: The `ODEProblem` with which we simulate our model (using some dummy parameter values, since we do not know these).
- `Tsit5()`: The numeric integrator we wish to simulate our model with.
- `L2Loss(data_ts, data_vals)`: Defines the loss function. While [other alternatives](https://docs.sciml.ai/DiffEqParamEstim/stable/getting_started/#Alternative-Cost-Functions-for-Increased-Robustness) are available, `L2Loss` is the simplest one. Its first argument is the time points at which the data is collected, and the second is the data's values.
- `L2Loss(data_ts, data_vals)`: Defines the loss function. While [other alternatives](https://docs.sciml.ai/DiffEqParamEstim/stable/getting_started/#Alternative-Cost-Functions-for-Increased-Robustness) are available, `L2Loss` is the simplest one (measuring the sum of squared distances between model simulations and data measurements). Its first argument is the time points at which the data is collected, and the second is the data's values.
- `Optimization.AutoForwardDiff()`: Our choice of [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation) framework.

Furthermore, we can pass any number of additional optional arguments, these are then passed to the internal `solve()` function (that are used to simulate our ODE). Here we provide the following additional arguments:
- `maxiters=10000`: If the ODE integrator takes a very large number of steps, that s typically a sign of a very poor fit. Reducing the `maxiters` threshold reduces the time we waste on evaluating such models.
- `verbose=false`: The simulation of models with very unsuitable parameter sets typically generate warnings. To avoid an overflow of such (here unnecessary) warnings as we evaluate a large number of parameter sets, we turn warning off.
- `save_idxs=4`: The produce (*P*) is the 4th species in our species vector. To ensure that the concentration of the right species is evaluated against the data, we set the numeric integrator to only save teh value of this species.
- `maxiters=10000`: If the ODE integrator takes a very large number of steps, that is typically the sign of a very poor fit. Reducing the `maxiters` threshold reduces the time we waste on evaluating such models.
- `verbose=false`: The simulation of models with highly unsuitable parameter sets typically generate various warnings (such as simulation terminations due to reaching the `maxiters` value). To avoid an overflow of such (here unnecessary) warnings as we evaluate a large number of parameter sets, we turn warnings off.
- `save_idxs=4`: The measured species (*P*) is the 4th species in our species vector (`species(rn)`). To ensure that the concentration of the right species is evaluated against the data, we set the numeric integrator to only save the value of this species.

Now we can create an `OptimizationProblem` using our `loss_function` and some initial guess of parameter values from which the optimizer will start:
Now we can create an `OptimizationProblem` using our `loss_function` and some initial guess of parameter values from which the optimiser will start:
```@example diffeq_param_estim_1
optprob = OptimizationProblem(loss_function, [1.0, 1.0, 1.0])
nothing #
```

!!! note
`OptimizationProblem` cannot currently accept parameter values in the form of a map. These must be provided as individual values (using the same order as the parameters occur in in the `parameters(rs)` vector).
`OptimizationProblem` cannot currently accept parameter values in the form of a map (e.g. `[:kB => 1.0, :kD => 1.0, :kP => 1.0]`). These must be provided as individual values (using the same order as the parameters occur in in the `parameters(rs)` vector).

Finally, we can use optimise `optprob` to find the parameter set that best fits our data. Optimization.jl does not provide any optimisation methods by default. Instead, for each supported optimisation package, it provides a corresponding wrapper-package to import that optimisation package for using with Optimization. E.g., if we wish to [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl)'s [Nelder-Mead](https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method) method, we most install and import the OptimizationOptimJL package. A summary of all, by Optimization.jl, supported optimisation packages can be found [here](https://docs.sciml.ai/Optimization/stable/#Overview-of-the-Optimizers). Here, we import the Optim.jl package and uses it to minimise our cost function (finding a parameter set that fits the data):
Finally, we can optimise `optprob` to find the parameter set that best fits our data. Optimization.jl does not provide any optimisation methods by default. Instead, for each supported optimisation package, it provides a corresponding wrapper-package to import that optimisation package for using with Optimization. E.g., if we wish to [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl)'s [Nelder-Mead](https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method) method, we must install and import the OptimizationOptimJL package. A summary of all, by Optimization.jl, supported optimisation packages can be found [here](https://docs.sciml.ai/Optimization/stable/#Overview-of-the-Optimizers). Here, we import the Optim.jl package and uses it to minimise our cost function (thus finding a parameter set that fits the data):
```@example diffeq_param_estim_1
using OptimizationOptimJL
sol = solve(optprob, NelderMead())
optsol = solve(optprob, NelderMead())
```

We can now simulate our model for the corresponding parameter set, checking that it fits our data.
```@example diffeq_param_estim_1
oprob_fitted = remake(oprob; p=sol.u)
fitted_sol = solve(oprob, Tsit5())
oprob_fitted = remake(oprob; p=optsol.u)
fitted_sol = solve(oprob_fitted, Tsit5())
plot!(fitted_sol; idxs=:P, label="Fitted solution", linestyle=:dash, lw=6, color=:purple)
```

!!! note
Here, a good exercise is to check the resulting parameter set and note that, while it creates a good fit to the data, it does not actually correspond to the original parameter set. [Identifiability](https://www.sciencedirect.com/science/article/pii/S1364815218307278) is a concept that studies how to deal with this problem.

Say that we instead would like to use the [CMA Evolution Strategy](https://en.wikipedia.org/wiki/CMA-ES) method, as implemented by the [CMAEvolutionStrategy.jl](https://github.com/jbrea/CMAEvolutionStrategy.jl) package, we would instead run
Say that we instead would like to use the [CMA Evolution Strategy](https://en.wikipedia.org/wiki/CMA-ES) method, as implemented by the [CMAEvolutionStrategy.jl](https://github.com/jbrea/CMAEvolutionStrategy.jl) package. In this case we would run:
```@example diffeq_param_estim_1
using OptimizationCMAEvolutionStrategy
sol = solve(optprob, CMAEvolutionStrategyOpt())
```

## Optimisation problems with data for multiple species
Imagine that, in our previous example, we had measurements the concentration of both *S* and *P*:
```@example diffeq_param_estim_1
data_vals_S = (0.8 .+ 0.4*rand(10)) .* data_sol[:S][2:end]
data_vals_P = (0.8 .+ 0.4*rand(10)) .* data_sol[:P][2:end]
plot(true_sol; idxs=[:S, :P], label=["True S" "True P"], lw=8)
plot!(data_ts, data_vals_S; label="Measured S", seriestype=:scatter, ms=6, color=:blue)
plot!(data_ts, data_vals_P; label="Measured P", seriestype=:scatter, ms=6, color =:red)
```

In this case we would have to use the `L2Loss(data_ts, hcat(data_vals_S, data_vals_P))` and `save_idxs=[1,4]` arguments in `loss_function`:
```@example diffeq_param_estim_1
loss_function_S_P = build_loss_objective(oprob, Tsit5(), L2Loss(data_ts, hcat(data_vals_S, data_vals_P)), Optimization.AutoForwardDiff(); maxiters=10000, verbose=false, save_idxs=4)
nothing # hide
```

We can now fit our model to data and plot the results:
```@example diffeq_param_estim_1
optprob_S_P = OptimizationProblem(loss_function_S_P, [1.0, 0.1, 0.5])
oprob_fitted_S_P = remake(oprob; p=optprob_S_P.u)
fitted_sol_S_P = solve(oprob_fitted_S_P, Tsit5())
plot!(fitted_sol_S_P; idxs=[:S, :P], label="Fitted solution", linestyle=:dash, lw=6, color=[:lightblue :pink])
```

## Setting parameter constraints and boundaries
Sometimes, it is desired to set boundaries on parameter values. Indeed, this can speed up the optimisation process (by preventing searching through unfeasible parts of parameter space) or be a prerequisite of some optimisation methods. This can be done by passing the `lb` (lower bounds) and `up` (upper bounds) arguments to `OptimizationProblem`. These are vectors (of the same length as the number of parameters), with each argument corresponding to the boundary value of the parameter with the same index (as used in the `parameters(rn)` vector). If we wish to constrain each parameter to the interval *(0.1, 10.0)* this can be done through:
```@example diffeq_param_estim_1
optprob = OptimizationProblem(loss_function, [1.0, 1.0, 1.0]; lb = [0.1, 0.1, 0.1], up = [10.0, 10.0, 10.0])
```

In addition to boundaries, Optimization.jl also supports setting [linear and non-linear constraints](https://docs.sciml.ai/Optimization/stable/tutorials/constraints/#constraints) on its output solution.

## Parameter fitting with known parameters


## Fitting parameters on the logarithmic scale
Often it can be advantageous to fit parameters on a [logarithmic, rather than linear, scale](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008646). The best way to proceed is to simply replace each parameters in the model definition by its logarithmic version:
```@example diffeq_param_estim_2
using Catalyst, PEtab
rn = @reaction_network begin
10^kB, S + E --> SE
10^kD, SE --> S + E
10^kP, SE --> P + E
end
```
And then proceeding, by keeping in mind that parameter values are logarithmic. Here, setting
```@example diffeq_param_estim_2
p_true = [:kB => 0.0, :kD => -1.0, :kP => 10^(0.5)]
```
corresponds to the same true parameter values as used previously (`[:kB => 1.0, :kD => 0.1, :kP => 0.5]`).


## Parameter fitting to multiple experiments
Say that we had measured our model for several different initial conditions, and would like to fit our model to all these measurements simultaneously. This can be done by first creating a [corresponding `EnsembleProblem`](@ref advanced_simulations_ensemble_problems). How to then create loss functions for these are described in more detail [here](https://docs.sciml.ai/DiffEqParamEstim/stable/tutorials/ensemble/).


## Setting parameter constraints and boundaries

0 comments on commit aa01fa6

Please sign in to comment.