-
-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Forward sensitivity and adjoints for DDEs #281
Comments
http://www.naturalspublishing.com/files/published/4av943i134gz84.pdf is a good resource. |
I would have a first implementation of an interpolating adjoint method for DDEs with constant delays. The code can be found here: However, I am sure the structure and efficiency could be improved a lot. Also, I am new to julia and DiffEqSensitivity and I am not sure how this would fit best into the existing sensitivity framework. As described on the github repo, there is also quite some difference between my sensitivity method and the AD method as soon as there are any delays. So I am not sure, whether this is still a bug in my code or because of the issues mentioned above. |
@ChrisRackauckas, @devmotion Apparently, performance and memory usage of my approach become terrible for increasing number of sample times. I assume this is because of the adjoint state history which I constructed as a dictionary of dense solutions of previous steps. And as I saw the solution itself contains the initial history. So I am essentially storing all solutions. Any suggestions how to do this on a lower level and more efficiently? The second point which I am not sure about are the discontinuities. Currently I account for discontinuities in the adjoint state through passing them to the solver by constant_lags. I assume it would be better to do this directly, but I am struggling a bit to find the related code in the library.. Would it be an option to use DiscreteCallbacks for C⁰ and C¹ discontinuities and to use a single solve(...) command? This would require, that the method of steps can handle jumps in the history function. |
Yes, as mentioned in the initial discussion in DelayDiffEq, IMO we should just use callbacks. The solver handles discontinuities fine, a common example in the tests is actually a discontinuity at the initial time point which is then propagated to discontinuities in higher order derivatives (or derivatives of the same order in the case of neutral DDEs) which have to be considered by the solver as well. |
Ok, great. This should simplify things a lot and then we can also add discontinuities coming from non-smoothness at the initial point. |
Adding the callbacks improved things a lot :D The code looks much simpler and runtime and memory usage are better than for ReverseDiffAdjoint() in my example. I pushed the code onto my repo. Currently I am ignoring the discontinuities due to the initial condition, but it shouldn't be hard to add them. Also I feel like my adjoint function has already a similar form as adjoint_sensitivities(sol,alg,dg,ts;sensealg=InterpolatingAdjoint()), but I would need some guidance to properly include it into the DiffEqSensitivity framework :) |
So what needs to be done to get this into the DiffEqSensitivity framework? The dispatches currently live here: |
Thanks for the reference to the code and sorry for my late reply. I am currently adding some things such as a continuous loss with delays. I will give it a try to include it as soon as I am done and find some time. Would you prefer a similar struct and function as for ODEInterpolatingAdjointSensitivityFunction just for DDEs (I did not look at SDDEs)? |
I think that might be what's needed. Don't worry about SDDEs: I think from the SDE adjoint work it's clear that "SDDEs should just work", but it'll probably take like 5 years before someone has a clear proof that it actually does work, so you might as well just make sure the current SDE handling code works and let us address it in the future. |
Ok, yeah makes sense. However, I am still not sure whether the DDE adjoint is actually correct. In my tests, there is an average mismatch compared to the ReverseDiffAdjoint of 1-2%. In some gradient components it is sometimes much larger. And my test case does not include any parameter dependent discontinuities. Without delays the mismatch is much smaller (approx. 0.1%). Also for the discontinuities I am currently using the following two callbacks:
Where cb1 is for the adjoint jumps in the presence of a discrete loss and cb2 is supposed to handle the other discontinuities. Is it enough to do |
I'm sorry, I can't focus on this right now and I would have to look up the adjoint equations first. However, in my opinion, the major question when implementing the forward sensitivities equations is what would be a good interface and API for all the derivatives that are required here. As soon as one has decided on an API for all the delayed Jacobians, derivatives, Jacobians, and parameter derivatives of the delayed arguments that appear in the extended system as shown in eq 5 in the SIAM paper linked above (they are not part of the standard DiffEqFunctions API), the implementation of the extended DDE system should hopefully be quite straightforward. |
Thanks for your answer and sorry for my late one. Well, for the gradients I just defined the functions,
and then calculated the VJP in the adjoint DDE function. The same could also be used for forward sensitivities. |
My point was that it is not possible to specify the Jacobians manually 🤷 Similar to the ODE case, there should be an interface that allows users to specify all these required derivatives and Jacobians, and would use AD in case they are not provided. We don't want to hardcode AD, and in particular not of a particular AD system (your code always uses Zygote it seems). |
Ok, I see. Thanks. Yes, as I am working with neural networks I did not really need that, but it shouldn't be a problem to add it. I am currently busy, but I'll try to make it ready for a PR once I submitted my master thesis. Another issue I have been thinking about is concerning efficiency: A general form of the DDE adjoint method looks like this: |
It's not a good idea to do that, even on ODEs. It's numerically unstable to not have the error control on the integral term. That doesn't mean there aren't random equations where it will work, but there are counter examples where it completely fails (purely time-dependent equations) and there are examples where you get poor optimization behavior due to the gradient inaccuracies (pretty much any non-neural network example, this is actually one of the well-described issues in continuous sensitivity analysis so there's about a hundred papers to this effect). Given the known issues with this, it should not ever be made a default, and instead a tutorial should mention how it could be done and caution the user as to the possible effects one might see (just like BacksolveAdjoint).
|
It is trivial if there exists a reasonable design - it is something that has to be addressed in DiffEqBase in the
Are you looking for |
I think I already made some optimizations here. At least it was discussed in some PR and might be part of some issue. |
Thanks a lot for pointing this out! Then I'll better forget about this^^
I used
OK I see. So this means we should be able to add the jacobians directly to the DDEFunction, right? |
Yes, exactly, the Jacobians and all other terms required in the sensitivity equations. Edit: See also SciML/DelayDiffEq.jl#138. |
Indeed there is already an issue: SciML/OrdinaryDiffEq.jl#335 |
OK, great. Thanks for the link to the issue. But in this case it might be better to go with the QuadratureAdjoint until this is solved. i.e. calculating a dense adjoint and then integrating it using QuadGK.jl |
I just had a closer look at your code (for the first time, I have to admit), and unfortunately I think the implementation only works in your special case (at most). It seems it mainly exploits the fact that you define your DDE explicitly as In general, I am not sure yet what the best solution is to this problem. Somehow to me it feels like it would be helpful for the sensitivity analysis to enforce more structure in, e.g., the arguments of If users specify the delayed Jacobians manually, it could be sufficient to encode more information in how delays are specified, e.g., by explicitly specifying the order of derivatives. Then the delayed Jacobians could be specified as a tuple (or array) of functions in some fixed order (e.g., first element the regular Jacobian wrt |
Maybe the cleanest approach would be to add some special types of In general, it might also be useful to perform symbolic calculations with ModelingToolkit (or to obtain more information about the mathematical structure), I guess. |
Thanks a lot for looking into it. The current version of the code on github is a bit outdated and I added some minor changes like quadrature adjoint and a continuous loss in the meantime. But yeah I see the problems. I explicitly use that definition of f in order to calculate the jacobians and the adjoint code that I implemented so far will only work for constant delays and no dependency on derivatives of u. And actually for this case we could also use ReverseDiffAdjoint which was faster in my experiments^^. Also I have been assuming that the history doesn't depend on p, but that should be trivial to add. I think I came across the formulas for state dependent delays and neutral type equations somewhere but would have to look it up.
Yes that makes sense. But then to begin it might be the best to enforce a manual definition of the jacobians? Although this is a bit annoying in the case I used it for.
That would be great. But I am not sure how we would have to do this such that it is still compatible with the DDE solver. Maybe one could say that the solver if provided with a DDEFunction of this special type, then internally defines
What do you exactly mean by this? |
For simple cases we can just differentiate through the DDE solver but if the DDE system contains parameter-dependent C1-discontinuities the forward sensitivities have jump discontinuities which, e.g., ForwardDiff can't deal with (see SciML/DelayDiffEq.jl#183 and https://epubs.siam.org/doi/abs/10.1137/100814949).
Hence it would be great if we could add forward sensitivity equations and adjoint methods for DDEs. I would like to help with getting them in here, I've just never worked on DiffEqSensitivity and hence might need some guidance and/or time to get started. I would assume that (hopefully) one can exploit the existing implementations for ODEs.
The text was updated successfully, but these errors were encountered: