perf: avoid double function call in ReverseDiff value_and_gradient #729

Merged: 5 commits merged into main from gd/rev on Feb 16, 2025

Conversation

gdalle (Member) commented Feb 14, 2025

Personal musings: The behavior of ReverseDiff is very confusing, even on ImmutableDiffResult. I encountered a Heisenbug which seems to depend on the compilation path (disappears when I add a print statement), where value_and_gradient! suddenly becomes incorrect when called inside value_gradient_and_hessian!. But only for a compiled tape. And the :gradient tests for value_and_gradient! still pass. What a mess.
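
For context on the title, here is a minimal sketch (not the actual diff of this PR) contrasting calling f separately with collecting value and gradient from a single ReverseDiff tape sweep via a DiffResult; the function f and the numbers are illustrative only:

using ReverseDiff, DiffResults

f(x) = sum(abs2, x)
x = ones(2)

# Double call: f is evaluated once for the value and again during the gradient pass.
y_twice = f(x)
grad_twice = ReverseDiff.gradient(f, x)

# Single pass: record a tape and let a DiffResult collect value and gradient together.
tape = ReverseDiff.GradientTape(f, x)
result = DiffResults.GradientResult(x)   # MutableDiffResult for a vector input
ReverseDiff.gradient!(result, tape, x)   # also stores the primal value in `result`
DiffResults.value(result)                # 2.0
DiffResults.gradient(result)             # [2.0, 2.0]

The same DiffResult pattern also works with a compiled tape (ReverseDiff.compile), which is where the surprises below show up.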


codecov bot commented Feb 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.92%. Comparing base (98e6e5f) to head (e59205d).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #729      +/-   ##
==========================================
- Coverage   97.93%   97.92%   -0.01%     
==========================================
  Files         122      122              
  Lines        6386     6372      -14     
==========================================
- Hits         6254     6240      -14     
  Misses        132      132              
Flag   Coverage            Δ
DI     98.96% <100.00%>    (-0.01%) ⬇️
DIT    95.71% <ø>          (ø)

Flags with carried forward coverage won't be shown.


gdalle marked this pull request as draft on February 14, 2025, 21:16
gdalle (Member, Author) commented Feb 14, 2025

So, in the current state of the PR, value_and_gradient! works on its own, but not when a Hessian computation calls it... unless we print something? I'm going crazy. For reference, f(x) = sum(abs2, x) here, so the correct value at ones(2) is 2.0.

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
(0.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
(0.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

julia> using DifferentiationInterface, FiniteDiff, ReverseDiff

julia> DifferentiationInterface.value_and_gradient!(
           f,
           zeros(2),
           AutoReverseDiff(; compile=true),
           ones(2),
       )
(2.0, [2.0, 2.0])

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
(0.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

julia> # now we just add a log

julia> DifferentiationInterface.value_and_gradient!(
           f,
           zeros(2),
           AutoReverseDiff(; compile=true),
           ones(2),
       )
[ Info: I'm here
(2.0, [2.0, 2.0])

julia> DifferentiationInterface.value_gradient_and_hessian!(
           f,
           zeros(2),
           zeros(2, 2),
           SecondOrder(AutoFiniteDiff(), AutoReverseDiff(; compile=true)),
           ones(2),
       )
[ Info: I'm here
(2.0, [2.0, 2.0], [2.0 0.0; 0.0 2.0])

gdalle (Member, Author) commented Feb 14, 2025

I further boiled it down to whether I write y, grad or y, _ in the assignment (both variants are sketched below the reproducer). I still have no clue what's going on.

EDIT: pure-ReverseDiff example available at JuliaDiff/ReverseDiff.jl#269

using DifferentiationInterface  # version from this branch
import DifferentiationInterface as DI
using ReverseDiff: ReverseDiff

backend = AutoReverseDiff(; compile=true)
f(x) = sum(abs2, x)
x = ones(2)

prep = prepare_gradient(f, backend, zero(x))

function value_and_gradient_nested!(f, grad, prep, backend, x)
    y, _ = value_and_gradient!(f, grad, prep, backend, x)
    return y, grad
end

julia> value_and_gradient!(f, zeros(2), prep, backend, x)
(2.0, [2.0, 2.0])

julia> value_and_gradient_nested!(f, zeros(2), prep, backend, x)  # wrong
(0.0, [2.0, 2.0])

This behavior only happens with ReverseDiff's compiled-tape mode. I think it might be because the Julia compiler struggles to figure out that ReverseDiff mutates the value stored in the MutableDiffResult?
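
For reference, a sketch of the two destructurings being compared, continuing from the reproducer above; value_and_gradient_nested2! is a hypothetical name, and per the comment above it is the variant that returns the correct value:

# Variant from the reproducer: the returned gradient is discarded, and with a
# compiled tape the returned value comes back as 0.0.
function value_and_gradient_nested!(f, grad, prep, backend, x)
    y, _ = value_and_gradient!(f, grad, prep, backend, x)
    return y, grad
end

# Hypothetical variant: binding the returned gradient is apparently enough for the
# value to come back correctly (2.0).
function value_and_gradient_nested2!(f, grad, prep, backend, x)
    y, grad = value_and_gradient!(f, grad, prep, backend, x)
    return y, grad
end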

gdalle marked this pull request as ready for review on February 15, 2025, 11:57
gdalle merged commit 3417ff5 into main on Feb 16, 2025
72 checks passed
gdalle deleted the gd/rev branch on February 16, 2025, 16:38