norm at zero #538

Closed
mcabbott opened this issue Oct 8, 2021 · 3 comments

Comments

@mcabbott
Member

mcabbott commented Oct 8, 2021

From JuliaDiff/ForwardDiff.jl#547, note that the rule for norm gives a zero gradient at x == 0. It might be preferable to pick something like a sub-gradient instead?

julia> using Zygote, ForwardDiff, LinearAlgebra

julia> for g in [Zygote.gradient, ForwardDiff.gradient]
       @show g
       for f in [norm, x -> sqrt(sum(abs2, x))]
         @show f
         @show g(f, [eps(),0])
         @show g(f, [0,eps()])
         @show g(f, [0,0])
       end
       end
g = Zygote.gradient
f = LinearAlgebra.norm
g(f, [eps(), 0]) = ([1.0, 0.0],)
g(f, [0, eps()]) = ([0.0, 1.0],)
g(f, [0, 0]) = ([0.0, 0.0],)   # rule from ChainRules
f = var"#17#18"()
g(f, [eps(), 0]) = ([1.0, 0.0],)
g(f, [0, eps()]) = ([0.0, 1.0],)
g(f, [0, 0]) = ([NaN, NaN],)   # with hand-written norm, 0/0
g = ForwardDiff.gradient
f = LinearAlgebra.norm
g(f, [eps(), 0]) = [1.0, 0.0]
g(f, [0, eps()]) = [0.0, 1.0]
g(f, [0, 0]) = [0.0, 1.0]      # this picks a sub-gradient?
f = var"#17#18"()
g(f, [eps(), 0]) = [1.0, 0.0]
g(f, [0, eps()]) = [0.0, 1.0]
g(f, [0, 0]) = [NaN, NaN]
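
For illustration only (a sketch of the convention under discussion, not the actual ChainRules rule): one way to avoid the 0/0 with a hand-written norm is to special-case y == 0 in an rrule and return whichever sub-gradient we settle on, here the zero vector. mynorm is a hypothetical helper.

using ChainRulesCore, Zygote

# Hypothetical hand-written norm plus an rrule that guards the x == 0 case.
mynorm(x) = sqrt(sum(abs2, x))

function ChainRulesCore.rrule(::typeof(mynorm), x::AbstractVector)
    y = mynorm(x)
    function mynorm_pullback(ȳ)
        # Away from zero the gradient is x ./ y; at zero the naive formula
        # gives 0/0 = NaN, so we return the zero sub-gradient instead.
        x̄ = iszero(y) ? zero(x) : x .* (ȳ / y)
        return NoTangent(), x̄
    end
    return y, mynorm_pullback
end

Zygote.gradient(mynorm, [0.0, 0.0])  # ([0.0, 0.0],) rather than ([NaN, NaN],)

Any other convention, say a fixed unit vector g returning ȳ .* g, would just replace the zero(x) branch above.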
mcabbott transferred this issue from JuliaDiff/ChainRulesCore.jl Oct 8, 2021
@oxinabox
Member

[0.0, 0.0] seems right to me; but maybe I am missing something important.
Breaking symmetry and choosing either [1.0, 0.0] or [0.0, 1.0] seems icky.
I guess we could do fill(inv(sqrt(length(x))), length(x)), though that is also an arbitrary choice, perturbing off along a "positive diagonal".
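
For what it's worth (a quick check added for illustration): the sub-differential of the Euclidean norm at zero is the closed unit ball, so any vector of norm at most 1 is a valid sub-gradient, and this candidate sits exactly on the boundary:

using LinearAlgebra

n = 2
v = fill(inv(sqrt(n)), n)   # [0.7071..., 0.7071...]
norm(v) ≈ 1                 # true: v lies on the boundary of the unit ball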

@sethaxen
Member

It seems norm would often be used in an optimization problem where the optimum is achieved when norm(...) == 0, so the [0,0] gradient makes sense to me. The only other way I can think of to get exactly a 0-norm is to initialize points such that exactly a 0-norm is formed, which doesn't seem like our problem.

@mcabbott
Member Author

The concern would be that if x == [0,0] isn't the optimum, you could get stuck there. And you needn't initialise there; you could, for instance, be adding some noise and restricting, like x_next = clamp.(x .+ randn.()./100, 0, 1).

Mathematically the answer will depend on what direction you approach this point from, which could lead you to argue that no limit exists and the right answer is then NaN. But for optimisation, it's probably better to pick one?
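
To make that concrete (a finite-difference check added for illustration; dirderiv is a hypothetical helper): the one-sided directional derivatives from opposite directions don't cancel, so no true gradient exists at zero.

using LinearAlgebra

# One-sided directional derivative of f at x along v, by finite differences.
dirderiv(f, x, v; t = 1e-8) = (f(x .+ t .* v) - f(x)) / t

dirderiv(norm, [0.0, 0.0], [ 1.0, 0.0])   # ≈ +1
dirderiv(norm, [0.0, 0.0], [-1.0, 0.0])   # also ≈ +1, not -1, so norm is not differentiable at 0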

That said, this hasn't bitten me, but it came up in the linked ForwardDiff issue.
