
Add Bayesian differential equations tutorial #68

Merged: 10 commits, May 28, 2020

Conversation

@ChrisRackauckas (Contributor):

And just to show off, SDEs and DDEs are in there too, so yay Turing is the first!

@Vaibhavdixit02 is going to take it from here.


@cpfiffer (Member):

100% rad, thanks @ChrisRackauckas and @Vaibhavdixit02! Looking forward to this.

@devmotion (Member):

You should make sure to use truncated instead of Truncated (as in SciML/SciMLBenchmarks.jl#101).
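A minimal illustration of the difference (the prior bounds here are illustrative tutorial-style values, not taken from the notebook):

```julia
using Distributions

# `truncated(d, lower, upper)` is the supported Distributions.jl
# constructor; instantiating the `Truncated` type directly is
# discouraged (it was the source of the issue referenced above).
α_prior = truncated(Normal(1.5, 0.5), 0.5, 2.5)

@assert pdf(α_prior, 3.0) == 0.0   # outside the truncation bounds
@assert pdf(α_prior, 1.5) > 0.0    # inside the bounds
```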

review-notebook-app bot commented May 22, 2020

devmotion commented on 2020-05-22T16:56:04Z
----------------------------------------------------------------

I'd suggest replacing MvNormal(predicted[i], σ[1]*ones(length(predicted[i]))) (and similar lines below) with just MvNormal(predicted[i], σ[1]). Maybe also use p = (α,β,γ,δ) instead of p = [α,β,γ,δ]?


ChrisRackauckas commented on 2020-05-22T17:00:41Z
----------------------------------------------------------------

Yup good ideas.

Vaibhavdixit02 commented on 2020-05-23T08:39:13Z
----------------------------------------------------------------

It fails here with "p = (α,β,γ,δ) instead of p = [α,β,γ,δ] " change

Vaibhavdixit02 commented on 2020-05-23T08:47:32Z
----------------------------------------------------------------

solve is not compatible with the tuple; concrete_solve is needed for it to work, so I think to maintain consistency it's better to stick to arrays throughout?

ChrisRackauckas commented on 2020-05-23T08:49:37Z
----------------------------------------------------------------

yeah, adjoints need arrays, so it's best to just use arrays for now.

devmotion commented on 2020-05-23T09:54:06Z
----------------------------------------------------------------

Yeah, I was wondering if that would be a problem. Would be nice to support tuples but I guess then it's best to use arrays for now.
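Where this thread landed, as a hedged sketch (stand-in values, not the notebook's code; at the time of this PR a scalar second argument to MvNormal meant a common standard deviation, while current Distributions.jl spells the same likelihood with a covariance, as below):

```julia
using Distributions, LinearAlgebra

μ = [1.0, 2.0]   # stand-in for predicted[i]
σ = 0.5          # stand-in for σ[1]

# Isotropic observation noise: one scale shared by all components.
# Equivalent to the notebook's MvNormal(μ, σ * ones(length(μ))).
d = MvNormal(μ, σ^2 * I)
@assert cov(d) ≈ σ^2 * Matrix(I, 2, 2)

# Parameters stay an array rather than a tuple, since `solve`'s
# adjoint machinery required arrays at the time:
p = [1.5, 1.0, 3.0, 1.0]   # [α, β, γ, δ]
```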

review-notebook-app bot commented May 22, 2020

devmotion commented on 2020-05-22T16:56:05Z
----------------------------------------------------------------

Same here.


review-notebook-app bot commented May 22, 2020

devmotion commented on 2020-05-22T16:56:05Z
----------------------------------------------------------------

Same here.


review-notebook-app bot commented May 22, 2020

devmotion commented on 2020-05-22T16:56:06Z
----------------------------------------------------------------

Same here.


review-notebook-app bot commented May 22, 2020

devmotion commented on 2020-05-22T16:56:06Z
----------------------------------------------------------------

Remove the multiplication signs and unneeded brackets and write x(t) and y(t) explicitly? Also maybe introduce history function h in a separate step and first write it just as a function of x and y ?


ChrisRackauckas commented on 2020-05-22T17:00:12Z
----------------------------------------------------------------

Agreed

Vaibhavdixit02 commented on 2020-05-23T08:51:05Z
----------------------------------------------------------------

introduce history function h in a separate step and first write it just as a function of x and y

I didn't get this completely, can you write out what you mean

The current version is

$\frac{dx(t)}{dt} = \alpha*h(\alpha,\beta,\gamma,\delta,t-1) - \beta*x*y$

$\frac{dy(t)}{dt} = \delta*x*y - \gamma*y$

$h(\alpha,\beta,\gamma,\delta,t) = [1,1] $

ChrisRackauckas commented on 2020-05-23T08:53:35Z
----------------------------------------------------------------

"h" isn't a thing. "h" is actually just the history here. In pseudo-TeX, it's

dxdt = \alpha x(t-1) - \beta x y

There's no reason to ever write "h" in the LaTeX
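Written out that way, with the constant history stated separately instead of through h, the delayed Lotka–Volterra system reads (a reconstruction of the intended cleanup, not the notebook's final text):

```latex
\begin{aligned}
\frac{dx}{dt} &= \alpha x(t-1) - \beta x(t)\,y(t) \\
\frac{dy}{dt} &= \delta x(t)\,y(t) - \gamma y(t)
\end{aligned}
\qquad \text{with history } x(t) = y(t) = 1 \text{ for } t \le 0.
```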

review-notebook-app bot commented May 22, 2020

devmotion commented on 2020-05-22T16:56:07Z
----------------------------------------------------------------

Same here. Also are you sure that everything works as expected, since (some time ago) optimization of DDEs with ForwardDiff was broken? https://github.com/SciML/DelayDiffEq.jl/issues/133


ChrisRackauckas commented on 2020-05-22T17:00:02Z
----------------------------------------------------------------

https://github.com/SciML/DelayDiffEq.jl/blob/master/test/interface/ad.jl passes. I think #133 is only for the case of estimating the delay length itself, but if that's fixed we're good. If it's not fixed, it seems like it works but we're missing a test for it, and IIRC that's why we're keeping it open. I've also fixed a lot of things on estimating delays for Pumas, so I wouldn't be surprised if even what we think might not be working is now working.

devmotion commented on 2020-05-22T17:51:47Z
----------------------------------------------------------------

The tests pass since they are marked as broken 😄 But they test estimation of the delay length as well, so maybe we're good without it. Should add some tests without delay length estimation to make sure that's the case.

ChrisRackauckas commented on 2020-05-23T10:03:18Z
----------------------------------------------------------------

FWIW, training in DiffEqFlux seems to work, so I don't know what's going on in those AD tests but we should definitely get it figured out.


review-notebook-app bot commented May 23, 2020

ChrisRackauckas commented on 2020-05-23T13:58:16Z
----------------------------------------------------------------

no *


review-notebook-app bot commented May 23, 2020

ChrisRackauckas commented on 2020-05-23T14:06:13Z
----------------------------------------------------------------

The sensitivity equation didn't show.


review-notebook-app bot commented May 23, 2020

ChrisRackauckas commented on 2020-05-23T14:06:13Z
----------------------------------------------------------------

This fit seems far off: lower the tolerances?


Vaibhavdixit02 commented on 2020-05-23T14:29:06Z
----------------------------------------------------------------

This is pretty slow: 14 minutes for 100 iterations. It needs to run for many more, but at this speed that's impractical.

devmotion commented on 2020-05-23T14:33:12Z
----------------------------------------------------------------

Shouldn't it be just σ instead of σ[1] ?

devmotion commented on 2020-05-23T14:37:54Z
----------------------------------------------------------------

You should be able to gain a speed up by using multiple threads and writing Threads.@threads for i in 1:length(predictions) but it shouldn't be that slow, so it's not really a proper fix.

ChrisRackauckas commented on 2020-05-23T14:38:09Z
----------------------------------------------------------------

yes

ChrisRackauckas commented on 2020-05-23T14:39:46Z
----------------------------------------------------------------

it should be σ. ReverseDiff should be much slower than ForwardDiff through the solver, but not that much slower? Are you on master? Add a print statement to make sure it's using the InterpolatingAdjoint

ChrisRackauckas commented on 2020-05-23T14:40:02Z
----------------------------------------------------------------

Recent DiffEqBase release added this (mostly for this tutorial)

devmotion commented on 2020-05-23T14:40:14Z
----------------------------------------------------------------

BTW you don't use T anywhere, so you can just get rid of the input argument.
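devmotion's Threads.@threads suggestion above, as a self-contained toy sketch (the predictions and data here are random stand-ins rather than the notebook's solver output, and Julia must be started with multiple threads for any speed-up):

```julia
using Distributions, LinearAlgebra

predictions = [rand(2) for _ in 1:100]  # stand-in solver states
data = rand(2, 100)                     # stand-in observations
σ = 0.8

# Per-observation log-likelihood terms are independent, so the loop
# can be split across threads and the results summed afterwards.
logliks = zeros(length(predictions))
Threads.@threads for i in 1:length(predictions)
    logliks[i] = logpdf(MvNormal(predictions[i], σ^2 * I), data[:, i])
end

total = sum(logliks)
@assert isfinite(total)
```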

review-notebook-app bot commented May 23, 2020

ChrisRackauckas commented on 2020-05-23T14:06:14Z
----------------------------------------------------------------

This fit is off too


Vaibhavdixit02 commented on 2020-05-23T14:29:26Z
----------------------------------------------------------------

18 minutes for 100 iterations.

review-notebook-app bot commented May 23, 2020

ChrisRackauckas commented on 2020-05-23T14:06:15Z
----------------------------------------------------------------

The * here again. Also, I don't know where the d(x) convention is from, but people usually just do dx and dt here.



review-notebook-app bot commented May 24, 2020

ChrisRackauckas commented on 2020-05-24T19:28:15Z
----------------------------------------------------------------

dW_1 and dW_2


review-notebook-app bot commented May 24, 2020

ChrisRackauckas commented on 2020-05-24T19:28:15Z
----------------------------------------------------------------

Can this be run longer? It'll take a bit, but it's right.


Vaibhavdixit02 commented on 2020-05-24T19:31:14Z
----------------------------------------------------------------

1000? Yeah, I'll leave it overnight now :)

devmotion commented on 2020-05-25T22:15:23Z
----------------------------------------------------------------

A quick inspection reveals many type inference problems:

varinfo = Turing.VarInfo(model)
spl = Turing.SampleFromPrior()

@code_warntype model.f(Random.GLOBAL_RNG, model, varinfo, spl, Turing.DefaultContext())

In particular, the type of prob can't be inferred (also not for the other models above), and the solution types are unknown as well, leading to the problem that not even length(predictions) is inferred as Int. A possible fix (maybe there's a better one) is to pass prob as input to the model and use remake. Then the type of the SDE problem is inferred, but the types of predictions etc. are still unknown (it seems there are some inference problems in the ODE and SDE solvers?). The problem with length(predictions) can be avoided by obtaining the length from data instead. (BTW, intuitively it doesn't seem helpful to extract the columns, and hence allocate, in every run (also in the other models), but I couldn't measure a significant speed-up from using the DiffEqArray directly.) Similar to the implementation in DiffEqBayes, one should also make sure that the SDE solver actually returns a correct, i.e., complete, solution (maybe it could be useful to specify maxiters as well, to avoid excessively long simulations).

But yeah, also here the problem seems to be that even if you manage to sample the chains are completely stuck. Did it improve at all with more samples?
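A hedged sketch of the remake fix for the ODE case (function name, priors, and saveat are reconstructed to resemble the tutorial, using modern `@model function` syntax; this is not the notebook's exact code):

```julia
using Turing, DifferentialEquations, LinearAlgebra

function lotka_volterra!(du, u, p, t)
    α, β, γ, δ = p
    du[1] = α * u[1] - β * u[1] * u[2]
    du[2] = δ * u[1] * u[2] - γ * u[2]
end

# Build the problem once, outside the model, so its concrete type is
# fixed; `remake` inside the model then preserves inferability.
prob0 = ODEProblem(lotka_volterra!, [1.0, 1.0], (0.0, 10.0),
                   [1.5, 1.0, 3.0, 1.0])

@model function fitlv(data, prob)
    σ ~ InverseGamma(2, 3)
    α ~ truncated(Normal(1.5, 0.5), 0.5, 2.5)
    β ~ truncated(Normal(1.2, 0.5), 0.0, 2.0)
    γ ~ truncated(Normal(3.0, 0.5), 1.0, 4.0)
    δ ~ truncated(Normal(1.0, 0.5), 0.0, 2.0)

    p = [α, β, γ, δ]
    predicted = solve(remake(prob; p = p), Tsit5(); saveat = 0.1)

    # Take the loop length from `data`, not from the solution, so it
    # is inferred as Int even when the solution type is not.
    for i in 1:size(data, 2)
        data[:, i] ~ MvNormal(predicted[i], σ^2 * I)
    end
end
```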

devmotion commented on 2020-05-25T22:58:08Z
----------------------------------------------------------------

BTW I tried Gibbs sampling with MH

chain = sample(model, Gibbs(MH(:σ), MH(:α), MH(:β), MH(:γ), MH(:δ), MH(:ϕ1), MH(:ϕ2)), 1000)

and got

Sampling: 100%|█████████████████████████████████████████| Time: 0:05:24
Object of type Chains, with data of type 1000×8×1 Array{Float64,3}

Iterations = 1:1000
Thinning interval = 1
Chains = 1
Samples per chain = 1000
internals = lp
parameters = α, β, γ, δ, σ, ϕ1, ϕ2

2-element Array{ChainDataFrame,1}

Summary Statistics
parameters mean std naive_se mcse ess r_hat
────────── ────── ────── ──────── ────── ─────── ──────
α 1.7659 0.0530 0.0017 0.0134 9.6869 1.1118
β 1.1488 0.0316 0.0010 0.0086 10.9606 1.0014
γ 2.9270 0.1047 0.0033 0.0144 70.0644 1.0178
δ 0.8957 0.0265 0.0008 0.0073 7.2672 1.2330
σ 0.3335 0.1291 0.0041 0.0283 5.5755 1.0888
ϕ1 0.3963 0.0859 0.0027 0.0255 4.0161 2.3181
ϕ2 0.2298 0.0771 0.0024 0.0223 4.0161 1.8136

Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
────────── ────── ────── ────── ────── ──────
α 1.6538 1.7449 1.7597 1.8053 1.8669
β 1.0849 1.1393 1.1393 1.1735 1.1953
γ 2.7010 2.8738 2.9242 2.9965 3.1636
δ 0.8485 0.8691 0.9006 0.9138 0.9393
σ 0.2870 0.2964 0.3053 0.3053 0.6094
ϕ1 0.2659 0.3201 0.4216 0.4611 0.5142
ϕ2 0.1128 0.1720 0.1955 0.2876 0.3672

which is not completely great but somewhat close to the true parameter values at least.

review-notebook-app bot commented May 24, 2020

ChrisRackauckas commented on 2020-05-24T19:28:16Z
----------------------------------------------------------------

let's just remove DDEs until the AD bug is fixed.


Vaibhavdixit02 commented on 2020-05-24T19:33:21Z
----------------------------------------------------------------

I hadn't paid attention to the estimate lol


review-notebook-app bot commented May 25, 2020

ChrisRackauckas commented on 2020-05-25T19:15:07Z
----------------------------------------------------------------

what happened here?


devmotion commented on 2020-05-25T21:53:06Z
----------------------------------------------------------------

I was wondering the same... It seems the chain gets stuck since all proposals are rejected (basically all samples after iteration 8000 it seems).

Vaibhavdixit02 commented on 2020-05-26T05:42:00Z
----------------------------------------------------------------

I don't know at this point. It happened with the Turing update to v0.13.0.

ChrisRackauckas commented on 2020-05-26T13:39:46Z
----------------------------------------------------------------

This is worth having someone dig into then... that doesn't look good at all.

@ChrisRackauckas (Contributor Author):

Are there plots of the coefficients, chains, and the fits to go along with the results?



review-notebook-app bot commented May 26, 2020

Vaibhavdixit02 commented on 2020-05-26T05:55:11Z
----------------------------------------------------------------

>Did it improve at all with more samples?

Yeah, with 500 it is better.


ChrisRackauckas commented on 2020-05-26T13:38:59Z
----------------------------------------------------------------

ehh good enough to ship IMO. Just mention that SDEs are a lot more computationally intensive and so for very good estimates one should run it with more trajectories and more samples

ChrisRackauckas commented on 2020-05-26T13:40:22Z
----------------------------------------------------------------

Or, how about doing an MAP method and then starting the chains at the MAP?

Vaibhavdixit02 commented on 2020-05-26T17:21:58Z
----------------------------------------------------------------

MAP doesn't give me an answer, errors out after a lot of instability warnings.
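For reference, the MAP-warm-start idea (which, per this thread, errored on the actual SDE model) would look roughly like this on a toy model. optimize(model, MAP()) is Turing's mode-estimation entry point via Optim; the initialization keyword name has varied across Turing versions:

```julia
using Turing, Optim

# Toy model standing in for the SDE model (which failed here).
@model function toy()
    m ~ Normal(0, 5)
    1.5 ~ Normal(m, 1)   # a single fixed observation
end

map_est = optimize(toy(), MAP())           # maximum a posteriori point
chain = sample(toy(), NUTS(), 500;
               initial_params = map_est.values.array)
```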


@devmotion (Member):

Unrelated to the issues discussed above, when playing around with the draft yesterday I noticed that Manifest.toml should be updated and completed.

review-notebook-app bot commented May 27, 2020

Vaibhavdixit02 commented on 2020-05-27T19:07:58Z
----------------------------------------------------------------

There is something going on with the step size: for step-size values other than 0.025 (I tried 0.0125, 0.05, and 0.02) the estimates are far off and there are a lot of

Warning: The current proposal will be rejected due to numerical error(s).

With 0.025 it works well and the estimates are accurate.
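For context, the sampler call in question fixes HMC's leapfrog step size explicitly. On a toy model the shape of the call is as below; 0.025 is the value that worked for the tutorial's model, and the second argument is the number of leapfrog steps:

```julia
using Turing

@model function toy()
    x ~ Normal(0, 1)
end

# HMC(step_size, n_leapfrog): the tutorial hard-codes 0.025, since
# nearby values (0.0125, 0.02, 0.05) reportedly triggered many
# "proposal will be rejected due to numerical error(s)" warnings
# on the actual model.
chain = sample(toy(), HMC(0.025, 10), 1000)
```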

@ChrisRackauckas changed the title from "[WIP] Add Bayesian differential equations tutorial" to "Add Bayesian differential equations tutorial" on May 27, 2020
@ChrisRackauckas (Contributor Author):

Thanks @Vaibhavdixit02 for taking it over the finish line. I think it can get continually improved, and the Turing developers might want to look at that issue with the stepsizes, but IMO it's good enough to release.

@cpfiffer (Member):

I'm happy to merge -- anyone have further comments? If not, I'll plan on merging tomorrow morning (PST) and getting everything ready to go to get this on turing.ml.

Thanks, you guys! Spectacular work.

@cpfiffer merged commit 3e20742 into TuringLang:master on May 28, 2020
@ChrisRackauckas deleted the bayesiandiffeq branch on May 30, 2020