`NaN`s in jacobian of a function that uses a `StaticArray` and `norm` #547

I have read the docs section Fixing NaN/Inf Issues under User Documentation > Advanced Usage Guide. Indeed, if I set the NaN-safe option the `NaN`s go away. In one case I get:

```julia
julia> ForwardDiff.jacobian(foo1!, out, x)
4×12 Matrix{Float64}:
  1.0   0.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0   1.0   0.0  0.565685
  0.0   1.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0   0.0   1.0  0.565685
 -1.0   0.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0  -1.0   0.0  0.565685
  0.0  -1.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0   0.0  -1.0  0.565685
```

In the other case:

```julia
julia> ForwardDiff.jacobian(foo1!, out, x)
4×12 Matrix{Float64}:
  1.0   0.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  0.0   1.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 -1.0   0.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
  0.0  -1.0  0.565685  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
```

I would appreciate some guidance on what I should do, i.e. should I define …

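For reference, a minimal sketch of switching on NaN-safe mode, assuming the Preferences-based toggle and the `nansafe_mode` key described in the docs section mentioned above:

```julia
# Sketch only: the preference key "nansafe_mode" is assumed from the
# "Fixing NaN/Inf Issues" docs; a Julia restart is needed afterwards so the
# preference can take effect.
using Preferences, ForwardDiff

set_preferences!(ForwardDiff, "nansafe_mode" => true)
```
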
Regardless of …

Hi, @KristofferC. Thank you for your reply. You mean in ForwardDiff.jl? Where should I start? 😅

With StaticArrays 0.12.4 I get the same result between static matrix and matrix running the example in the first post. With StaticArrays 1.2.13 I get your result. So something happened there. Bisecting points to JuliaArrays/StaticArrays.jl#908 as the offending PR.

A more minimal example is:

```julia
julia> using StaticArrays, ForwardDiff

julia> ForwardDiff.gradient(norm, [0,0])
2-element Vector{Float64}:
 0.0
 1.0

julia> ForwardDiff.gradient(norm, SA[0,0])
2-element SVector{2, Float64} with indices SOneTo(2):
 NaN
 NaN
```

With a hand-written `norm` the plain-`Vector` case gives `NaN` too, and just off the origin the answer depends on the direction of approach:

```julia
julia> ForwardDiff.gradient(x -> sqrt(sum(abs2, x)), [0,0])
2-element Vector{Float64}:
 NaN
 NaN

julia> ForwardDiff.gradient(x -> sqrt(sum(x .^ 2)), [0, eps()])
2-element Vector{Float64}:
 0.0
 1.0

julia> ForwardDiff.gradient(x -> sqrt(sum(x .^ 2)), [eps(), 0])
2-element Vector{Float64}:
 1.0
 0.0
```

Note also that because it's caused by an explicit branch (at least according to JuliaArrays/StaticArrays.jl#964; I didn't check LinearAlgebra's source) it will be changed by #481, to never take the measure-zero branch, and hence always give `NaN`. Zygote's behaviour (using a rule for `norm`) is:

```julia
julia> Zygote.gradient(x -> sqrt(sum(abs2, x)), [0,0])
([NaN, NaN],)

julia> Zygote.gradient(norm, [0,0])
([0.0, 0.0],)
```

That could choose to pick a particular "sub-gradient". But I'm not sure that doing that with DiffRules would work, since the function takes a vector, not a scalar.

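As a hypothetical illustration of what "picking a sub-gradient" could look like on the user side (this is not a proposed ForwardDiff or DiffRules change, and `safenorm` is an invented name): branch on the squared norm so that the origin returns an exact zero, which matches the zero sub-gradient that Zygote's `norm` rule returns above.

```julia
using ForwardDiff

# Invented user-side workaround, not library code: pick the zero sub-gradient
# at the origin by branching before sqrt is ever called on a zero value.
function safenorm(x)
    s = sum(abs2, x)
    iszero(s) && return zero(s)  # at the origin the partials of s are already zero
    return sqrt(s)
end

ForwardDiff.gradient(safenorm, [0.0, 0.0])  # [0.0, 0.0] instead of NaNs
ForwardDiff.gradient(safenorm, [3.0, 4.0])  # [0.6, 0.8], the usual x / norm(x)
```

Whether the zero sub-gradient is the right choice is exactly the question debated below.
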
It seems this particular issue is due to the derivative rule for `sqrt`:

```julia
julia> ForwardDiff.gradient(x -> sqrt(sum(abs2, x)), [0,0]) # bad
2-element Vector{Float64}:
 NaN
 NaN

julia> ForwardDiff.gradient(x -> sum(abs2, x), [0,0]) # removing the sqrt, fine
2-element Vector{Int64}:
 0
 0

julia> sqrt(ForwardDiff.Dual(0.0, 0.0)) # Uh-oh! should be zero!
Dual{Nothing}(0.0,NaN)

julia> ForwardDiff.Dual(0.0, 0.0)^0.5 # yes, pow has the same issue
Dual{Nothing}(0.0,NaN)
```

I think the right fix here is to bypass DiffRules' rules for `sqrt` (and `^`) in this case. See also JuliaDiff/ChainRules.jl#576

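To spell out where the `NaN` comes from in dual-number terms: DiffRules' derivative for `sqrt` is `inv(2 * sqrt(x))`, which is `Inf` at zero, and in the default (non-NaN-safe) mode that gets multiplied into the zero partial:

```julia
using ForwardDiff: Dual, value, partials

d = Dual(0.0, 0.0)              # value 0 with a zero partial, like sqrt(Dual(0.0, 0.0)) above
dfdx = inv(2 * sqrt(value(d)))  # the sqrt derivative rule: Inf at x == 0
dfdx * partials(d, 1)           # Inf * 0.0 == NaN, which is what leaks into the gradient
```
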
Oh right, well spotted. The dual part is indeed zero:

```julia
julia> sqrt_show(x) = sqrt(@show x);

julia> ForwardDiff.gradient(x -> sqrt_show(sum(abs2, x)), [0,0])
x = Dual{ForwardDiff.Tag{var"#5#6", Int64}}(0,0,0)
2-element Vector{Float64}:
 NaN
 NaN
```

Catching that and replacing it by zero can avoid the `NaN`, but it seems slightly weird in that it doesn't pick a subgradient (if that's the term). We can picture this (for a 2-vector) as being the tip of a cone, a rolled-up piece-of-paper cone. The current behaviour is to say that the tip is a singularity, hence has no gradient. The `sqrt` proposal would smooth the tip off, so that microscopic angels can treat it as a dance floor. If you could always pick a particular subgradient, maybe that's not so strange then. It's a bit like …

FWIW, finite differences finds that the gradient of `norm` at the origin is (numerically) zero:

```julia
julia> using FiniteDifferences, LinearAlgebra

julia> FiniteDifferences.grad(central_fdm(5, 1), norm, [0.0,0.0])
([-3.8050255348364236e-17, -3.8050255348364236e-17],)
```

So I would be inclined to side with CR here. But I think either way, the …

Finite differencing just averages the two sides of the kink, so it is bound to report (roughly) zero there; for example:

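The snippet below is illustrative only: the same stencil as above applied to `abs` at its kink.

```julia
using FiniteDifferences

# The symmetric stencil samples both sides of the kink, so the left and right
# slopes (-1 and +1) average out to roughly zero.
central_fdm(5, 1)(abs, 0.0)
```
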
As a side remark: as noted above, everything works if one uses the NaN-safe setting (https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Fixing-NaN/Inf-Issues). I think one should benchmark whether, and to what extent, performance is affected by making NaN-safe the default. I think it's weird to report incorrect results with the default setting.

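A rough sketch of how such a benchmark could be set up (the target function here is made up; the default and NaN-safe builds would have to be timed in separate Julia sessions, since the setting is fixed at load time):

```julia
# Illustrative only: run this once with the default build and once with the
# NaN-safe preference enabled, then compare the timings.
using BenchmarkTools, ForwardDiff, LinearAlgebra

f(x) = norm(x) + sum(abs2, x)   # made-up stand-in for a typical target
x = rand(100)

@btime ForwardDiff.gradient($f, $x);
```
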
Great. I see that the implementation is here: ForwardDiff.jl/src/partials.jl, lines 93 to 121 (at 43ef860).
Is it obvious why this test …

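For context, a toy sketch of the idea those lines implement (this is the principle only, not the actual source): when partials are scaled by a derivative that may be `Inf` or `NaN`, an exactly-zero partial is kept at zero instead of being multiplied through.

```julia
# Toy illustration of NaN-safe partial scaling, not ForwardDiff's actual code:
nan_safe_scale(dfdx, p) = iszero(p) ? p : dfdx * p

nan_safe_scale(Inf, 0.0)  # 0.0: a zero partial stays zero
nan_safe_scale(Inf, 1.0)  # Inf: nonzero partials behave as usual
0.0 * Inf                 # NaN: what the default mode produces instead
```
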
Hi! I am having trouble differentiating a function with ForwardDiff.jl. I don't know why, but the resulting jacobian contains `NaN`s. Below is a MWE to kickstart the discussion. Consider the following functions, and let's try out `foo1!`, which is type-stable and does not perform dynamic allocations.

We can use ForwardDiff.jl to compute the jacobian of `foo1!`. However, if the inputs are all zeros, the jacobian will contain `NaN`s. But if we change the type of the matrix `A_λ` (in `foo1!`) from an `SMatrix` to a normal `Matrix`, the jacobian will be evaluated properly. Moreover, I have also observed that the jacobian will not contain `NaN`s if we use the 1-norm or the Inf-norm, even if we keep `A_λ` as an `SMatrix`.

I am not sure if this is a bug or if I am doing something wrong... Can someone help me figure it out? Thank you in advance!

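The original `foo1!` and its call are not reproduced above, so purely as a hypothetical stand-in (the name `foo_like!`, the sizes, and the body are invented for illustration), something in this spirit shows the reported combination of an `SMatrix`, the 2-norm, and an all-zero input:

```julia
using StaticArrays, LinearAlgebra, ForwardDiff

# Invented stand-in, not the original MWE: build a static matrix from the
# input and push it through the (Frobenius/2-)norm.
function foo_like!(out, x)
    A = SMatrix{2,2}(x[1], x[2], x[3], x[4])
    out[1] = norm(A)   # norm of a StaticArray, the ingredient discussed above
    out[2] = sum(x)
    return out
end

out = zeros(2)
ForwardDiff.jacobian(foo_like!, out, zeros(4))  # first row is NaN with recent StaticArrays
ForwardDiff.jacobian(foo_like!, out, ones(4))   # away from zero everything is finite
```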