AD: change the backend transparently #25

Open · 2 tasks done
jbcaillau opened this issue Feb 9, 2023 · 19 comments
Labels: enhancement (New feature or request)

Comments

@jbcaillau (Member) commented Feb 9, 2023

  • We currently use ForwardDiff.jl but should be able to move transparently (that is, without any user code change) to another backend.
  • The backend choice should be a default / global behaviour, not hard-coded (see the sketch below).
  • As soon as something more efficient than dual numbers (e.g. Enzyme) is available, switch.
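
A minimal sketch of what such a global default could look like, using ADTypes.jl backend objects (the names default_backend and set_backend! are hypothetical, not existing CTBase API):

using ADTypes: AbstractADType, AutoForwardDiff, AutoEnzyme

# Hypothetical global default: user code never has to mention the backend,
# but it can be swapped in one place, globally and dynamically.
const _DEFAULT_BACKEND = Ref{AbstractADType}(AutoForwardDiff())

default_backend() = _DEFAULT_BACKEND[]                      # current default backend
set_backend!(b::AbstractADType) = (_DEFAULT_BACKEND[] = b)  # e.g. set_backend!(AutoEnzyme())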
@ocots (Member) commented Mar 28, 2023

This is more a discussion than an issue, no? Should it be transferred elsewhere?

jbcaillau transferred this issue from control-toolbox/OptimalControl.jl on Mar 28, 2023
@jbcaillau (Member, Author)

It is an issue. Now in CTBase.jl, FWIW

@ocots (Member) commented May 9, 2023

I am not convinced that this is an issue. It is more a wish :-)

@jbcaillau (Member, Author)

OK, move it!

@jbcaillau (Member, Author)

We should move from ForwardDiff to AbstractDifferentiation

@jbcaillau (Member, Author)

See also: FastDifferentiation.jl

@ocots (Member) commented May 7, 2024

Check also DifferentiationInterface.jl.

@gdalle (Contributor) commented May 28, 2024

Friendly ping from the creator of DifferentiationInterface: I'm available to help you make the transition if you want me to :)

@ocots (Member) commented May 29, 2024

Hi @gdalle! This would be great, thanks. I propose to first post here how we use AD. @jbcaillau and @PierreMartinon, please complete the list.

  • We have defined some auxiliary functions that use the ForwardDiff.jl package (a possible DifferentiationInterface.jl version is sketched after this list):

function ctgradient(f::Function, x::ctNumber)
    return ForwardDiff.derivative(x -> f(x), x)
end

function ctjacobian(f::Function, x::ctNumber)
    return ForwardDiff.jacobian(x -> f(x[1]), [x])
end
  • We use these auxiliary functions for differential geometry in the CTBase.jl package:

function (X::VectorField{Autonomous, <: VariableDependence}, f::Function)::Function

  • We also use these auxiliary functions in the CTFlows.jl package:

https://github.com/control-toolbox/CTFlows.jl/blob/fff879627ccec8d3252694ae2ad27252522d676f/src/hamiltonian.jl#L61

function rhs(h::AbstractHamiltonian)
    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n      = size(z, 1) ÷ 2                         # state / costate dimension
        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)     # H as a function of z = (x, p)
        dh     = ctgradient(foo, z)                     # ∇H(x, p)
        dz[1:n]    =  dh[n+1:2n]                        # ẋ =  ∂H/∂p
        dz[n+1:2n] = -dh[1:n]                           # ṗ = -∂H/∂x
    end
    return rhs!
end
  • I think we also use AD directly from third-party packages like ADNLPModels.jl:

https://github.com/control-toolbox/CTDirect.jl/blob/60edc0c8be071bba860db12c768f46f29e482592/src/solve.jl#L40

    # call NLP problem constructor
    docp.nlp = ADNLPModel!(x -> DOCP_objective(x, docp), 
                    x0,
                    docp.var_l, docp.var_u, 
                    (c, x) -> DOCP_constraints!(c, x, docp), 
                    docp.con_l, docp.con_u, 
                    backend = :optimized)
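
For reference, a minimal sketch of what the ctgradient / ctjacobian helpers above could look like on top of DifferentiationInterface.jl, with the backend passed explicitly; the backend keyword and its AutoForwardDiff default are assumptions, not CTBase's actual code:

import DifferentiationInterface as DI
using ADTypes: AbstractADType, AutoForwardDiff
import ForwardDiff  # the chosen backend package still has to be loaded

const ctNumber = Real  # as stated later in the thread, ctNumber is just Real

# Scalar input: a derivative rather than a gradient.
function ctgradient(f::Function, x::ctNumber; backend::AbstractADType = AutoForwardDiff())
    return DI.derivative(f, backend, x)
end

# Scalar input treated as a 1-vector, mirroring the ForwardDiff version above.
function ctjacobian(f::Function, x::ctNumber; backend::AbstractADType = AutoForwardDiff())
    return DI.jacobian(x -> f(x[1]), backend, [x])
end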

@gdalle (Contributor) commented May 29, 2024

Thanks for the links, I'll take a look but I already have a few questions.

Why do you call the derivative the gradient? What is this ctNumber that you use?

Are the derivative and Jacobian the only operators you need? What are the typical input and output dimensionalities for the Jacobian? Depending on the answer, you may want to parametrize with different AD backends for the derivative (forward mode always) and the Jacobian (forward mode for large input and small output, reverse mode for small input and large output, otherwise unclear).

Do you take derivatives or Jacobians of the same function several times, but with different input vectors? If so, you will hugely benefit from a preparation mechanism like the one that is implemented in DifferentiationInterface.
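
For context, a minimal sketch of the preparation mechanism mentioned here, assuming a recent DifferentiationInterface.jl release (the exact preparation signature has changed across versions); the function and dimensions are placeholders:

import DifferentiationInterface as DI
using ADTypes: AutoForwardDiff
import ForwardDiff

f(z) = sum(abs2, z)        # placeholder function, differentiated many times
backend = AutoForwardDiff()
z0 = rand(10)              # representative input, used only to build the preparation

prep = DI.prepare_gradient(f, backend, z0)   # pay the setup cost (caches, configs) once

for _ in 1:1000
    z = rand(10)
    g = DI.gradient(f, prep, backend, z)     # reuse the preparation for each new input
end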

@gdalle (Contributor) commented May 29, 2024

As for ADNLPModels, they are also considering a switch to DifferentiationInterface, but it might be slightly slower.

@jbcaillau (Member, Author) commented May 29, 2024

@gdalle Thanks for the PR and comments.

Why do you call the derivative the gradient? What is this ctNumber that you use?

ctNumber = Real. We want to deal more or less uniformly with reals and one-dimensional vectors, which is why the case where the variable is a single real is handled explicitly.

Are the derivative and Jacobian the only operators you need? What are the typical input and output dimensionalities for the Jacobian? Depending on the answer, you may want to parametrize with different AD backends for the derivative (forward mode always) and the Jacobian (forward mode for large input and small output, reverse mode for small input and large output, otherwise unclear).

Dimensions are < 1e2, e.g. to build the right-hand side of a Hamiltonian system.

Do you take derivatives or Jacobians of the same function several times, but with different input vectors? If so, you will hugely benefit from a preparation mechanism like the one that is implemented in DifferentiationInterface.

✅ to be tested elsewhere (see also this comment)

@ocots (Member) commented Jun 15, 2024

I think a step has been made for CTBase.jl. Do we close this issue? We will see next how to handle this in CTFlows.jl.

ocots mentioned this issue on Jun 15, 2024
@gdalle (Contributor) commented Jun 17, 2024

To me this is not yet done, because #141 added a backend kwarg to ctgradient and the like, but this kwarg is not passed down from further up the chain. As a result, users cannot change the AD backend, even though package developers can through the __auto() function.

@jbcaillau (Member, Author)

@gdalle check this PR

@gdalle (Contributor) commented Jun 21, 2024

To clarify, even with this PR, you're currently doing something like

function solve_control_problem(f)
    # ...
    for i in 1:n
        x -= gradient(f, x)
    end
    # ...
end

function gradient(f, x, backend=default_backend())
    # ...
end

And for users who only care about high-level interfaces, and who never call gradient directly, the following seems better to me:

function solve_control_problem(f, backend)
    # ...
    for i in 1:n
        x -= gradient(f, x, backend)
    end
    # ...
end

function gradient(f, x, backend=default_backend())
    # ...
end

But you know best if that's relevant in your case or not.

@ocots (Member) commented Jun 21, 2024

We totally agree with you. The second choice is better.

But actually, the function that solves optimal control problems is not in the CTBase.jl package.

Besides, our function

function gradient(f, x, backend=default_backend())
    # ...
end

is not used in the function that solves optimal control problems. It is used, for instance, in the CTFlows.jl package here. I agree that there I will have to add a kwarg for the AD backend.

As for solving optimal control problems, we go through ADNLPModels.jl, and again we want the user to be able to choose the AD backend.
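
For illustration, a hedged sketch of how the rhs constructor quoted earlier could thread an AD backend down to ctgradient; it reuses the CTFlows.jl types from that snippet and assumes a backend keyword on ctgradient, so it is not the package's actual code:

using ADTypes: AbstractADType, AutoForwardDiff

function rhs(h::AbstractHamiltonian; backend::AbstractADType = AutoForwardDiff())
    function rhs!(dz::DCoTangent, z::CoTangent, v::Variable, t::Time)
        n      = size(z, 1) ÷ 2
        foo(z) = h(t, z[rg(1,n)], z[rg(n+1,2n)], v)
        dh     = ctgradient(foo, z; backend = backend)  # hypothetical backend kwarg
        dz[1:n]    =  dh[n+1:2n]
        dz[n+1:2n] = -dh[1:n]
    end
    return rhs!
end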

@jbcaillau (Member, Author)

@gdalle Agreed, thanks for the feedback. Actually, there is now a setter that allows users / devs to change the backend (globally and dynamically); it is also easy to add an optional kwarg to allow this anywhere it makes sense (solvers, etc.). We leave this issue open for further testing, e.g. for cases requiring a change of backend between first-order and second-order derivative computations.

On a side note: check this upcoming talk at JuliaCon 2024 (we'll also be around)

@gdalle (Contributor) commented Jun 21, 2024

Thanks for pointing out ADOLC.jl, we're already on the ball ;) See TimSiebert1/ADOLC.jl#7 to track progress.

ocots added the enhancement (New feature or request) label on Jul 26, 2024