Improvements to rules for `norm` #337

mcabbott · 2020-12-26T17:47:23Z

This fixes FluxML/Zygote.jl#860 by relaxing the signature on the rule for norm. While there is no longer an explicit mention of Diagonal in the rule, the tests still check that this is correctly reproduced.

It should also fix gradient(norm, [1, 2]), which gave an InexactError.

And adds a few more InplaceableThunks while I was at it.

Edit -- fixes FluxML/Zygote.jl#960, too.

codecov-io · 2020-12-26T18:13:32Z

Codecov Report

Merging #337 (df02563) into master (ebc99f7) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #337      +/-   ##
==========================================
+ Coverage   97.64%   97.67%   +0.03%     
==========================================
  Files          18       18              
  Lines        1018     1034      +16     
==========================================
+ Hits          994     1010      +16     
  Misses         24       24

Impacted Files	Coverage Δ
src/rulesets/LinearAlgebra/norm.jl	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ebc99f7...df02563. Read the comment docs.

simeonschaub

This is great work, thanks! I have just some very minor suggestions, but overall LGTM. I think this might even fix some problems I am currently having with Zygote, but need to look into that some more.

src/rulesets/LinearAlgebra/norm.jl

test/rulesets/LinearAlgebra/norm.jl

sethaxen · 2020-12-31T23:58:40Z

We made the restriction of the rrule to StridedArrays and a few others because it (or, really, any rrule that broadcasts over a similar type to the primal) does the wrong thing for large classes of arrays. For example, it will not work for Hermitian matrices. Using this PR:

julia> using LinearAlgebra, ChainRules

julia> A, ȳ = randn(ComplexF64, 5, 5), randn();

julia> hermA, Hermitian_back = rrule(Hermitian, A, :U);

julia> y, norm_back = rrule(norm, hermA);

julia> unthunk(norm_back(ȳ)[2])
ERROR: ArgumentError: Cannot set a non-diagonal index in a Hermitian matrix

This is due to the broadcasting to a similar type to A. Roughly, I would not expect the rule to work correctly for any matrix type that has non-sparsity constraints, and potentially to fail for some that only have sparsity constraints. So dense arrays, Diagonal, and AbstractTriangular are fine, but many other types may not be, and it's better to not have a rule than have one that silently does the wrong thing.

mcabbott · 2021-01-01T22:04:17Z

I would not describe an ArgumentError as being silent, but I agree it's a problem, thanks for the example.

This is precisely because of the work-around for NaN .* Diagonal(rand(3)) (in 1b7307f), so perhaps whatever solves that needs to be applied more narrowly, leaving straight broadcasting as the most generic fallback. I have an idea...

I guess I'm not completely sold that "all is forbidden except what is permitted" is the right policy, compared to Julia's usual AbstractArray flexibility. The motivating issue above involves StaticArrays, which used to work, and will behave well under the generic brodcast etc. of a fairly simple rule like this. (And less well under the similar(x) variant.) It seems very natural and very Julian that they would share code. How would these be made to work under a more restrictive policy?

oxinabox · 2021-01-02T15:14:06Z

The special problem is that there is a class of types which had an overly broad rule not been defined then the AD would have done the right thing, but with it defined the wrong thing happens.

mcabbott · 2021-01-04T21:30:48Z

had an overly broad rule not been defined then the AD would have done the right thing, but with it defined the wrong thing happens.

I'm not sure I follow. Which rule? In FluxML/Zygote.jl#860, there is no rule for norm being used. Are you saying this is caused by some other rule?

julia> Zygote.ChainRules.rrule(norm, [1,2,3.0])
(3.7416573867739413, ChainRules.var"#norm_pullback#1776"{Vector{Float64}, Float64}([1.0, 2.0, 3.0], 3.7416573867739413))

julia> Zygote.ChainRules.rrule(norm, SA[1,2,3.0]) === nothing
true

julia> Zygote.gradient(norm, [1,2,3.0])
([0.2672612419124244, 0.5345224838248488, 0.8017837257372732],)

julia> Zygote.gradient(norm, SA[1,2,3.0])
Internal error: encountered unexpected error in runtime:
BoundsError(a=Array{Any, (2,)}[
  Core.Compiler.VarState(typ=Zygote.Pullback{Tuple{typeof(StaticArrays._norm), StaticArrays.Size{(3,)}, StaticArrays.SArray{Tuple{3}, Float64, 1, 3}}, Any}, undef=false),
...

julia> Zygote.gradient(norm, [1,2,3]) # this one is easy
ERROR: InexactError: Int64(0.2672612419124244)

(Without this PR!)

oxinabox · 2021-01-04T21:39:44Z

 I'm not sure I follow. Which rule? In FluxML/Zygote.jl#860, there is no rule for norm being used. Are you saying this is caused by some other rule

It is a general statement, not specifically related to this PR.

mcabbott · 2021-01-04T21:39:58Z

Maybe this should be a separate message. Another problem with the dx = similar(x) type-stability fix is that it breaks second derivatives, even when everything is an Array. This turned up at FluxML/Zygote.jl#865 (comment), but a more minimal example is:

julia> Zygote.gradient(x -> sum(Zygote.gradient(norm, x)[1]), rand(3))
ERROR: Mutating arrays is not supported
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] (::Zygote.var"#375#376")(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/lib/array.jl:61
  [3] (::Zygote.var"#2258#back#377"{Zygote.var"#375#376"})(Δ::Nothing)
    @ Zygote ~/.julia/packages/ZygoteRules/OjfTt/src/adjoint.jl:59
...
  [7] Pullback
    @ ~/.julia/packages/ChainRules/sGOE6/src/rulesets/LinearAlgebra/norm.jl:216 [inlined]
  [8] (::typeof(∂(_norm2_back)))(Δ::FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}})
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/compiler/interface2.jl:0
  [9] Pullback
    @ ~/.julia/packages/ChainRules/sGOE6/src/rulesets/LinearAlgebra/norm.jl:57 [inlined]
 [10] (::typeof(∂(λ)))(Δ::Tuple{Nothing, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}})
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/compiler/interface2.jl:0

Does this package have opinions about second derivatives, and about tests for them? Seems tricky, and maybe not every rule can support them, but where easy it would be nice to have.

sethaxen · 2021-01-04T21:53:12Z

Does this package have opinions about second derivatives, and about tests for them? Seems tricky, and maybe not every rule can support them, but where easy it would be nice to have.

It really seems that we rarely think about being second-differentiable, which seems like an oversight.

mcabbott · 2021-01-04T22:07:13Z

Latest commit 8ab56db is an idea for fixing NaN .* Diagonal(rand(3)) problem, still using broadcasting not writing explicitly into similar (and hence allowing for 2nd derivatives). It's not the prettiest thing ever... but see what you think?

julia> using ChainRules, LinearAlgebra

julia> ChainRules._norm1_back(Diagonal(rand(3)), 1.2, 3.4)
3×3 Diagonal{Float64, Vector{Float64}}:
 3.4   ⋅    ⋅ 
  ⋅   3.4   ⋅ 
  ⋅    ⋅   3.4

julia> ChainRules._norm1_back(Diagonal(rand(3)), 1.2, NaN)
3×3 Diagonal{Float64, Vector{Float64}}:
 NaN       ⋅      ⋅ 
    ⋅   NaN       ⋅ 
    ⋅      ⋅   NaN

julia> @code_warntype ChainRules._norm1_back(Diagonal(rand(3)), 1.2, NaN)
Variables
  #self#::Core.Const(ChainRules._norm1_back)
  x::Diagonal{Float64, Vector{Float64}}
  y::Float64
  Δy::Float64
  ∂x_data::Vector{Float64}

Body::Diagonal{Float64, Vector{Float64}}
1 ─ %1 = ChainRules.parent(x)::Vector{Float64}
│   %2 = Base.broadcasted(ChainRules.sign, %1)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(sign), Tuple{Vector{Float64}}}
│   %3 = ChainRules.real(Δy)::Float64
│   %4 = Base.broadcasted(ChainRules.:*, %2, %3)::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(*), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(sign), Tuple{Vector{Float64}}}, Float64}}
│        (∂x_data = Base.materialize(%4))
│   %6 = ChainRules.withsomezeros_rewrap(x, ∂x_data)::Diagonal{Float64, Vector{Float64}}
└──      return %6

Hermitian is handled by just broadcasting. And second derivatives may work:

julia> Zygote.gradient(norm, Hermitian(rand(3,3)))[1]
3×3 Matrix{Float64}:
 0.172849  0.319948   0.181923
 0.319948  0.566805   0.0347145
 0.181923  0.0347145  0.612797

julia> Zygote.gradient(x -> sum(Zygote.gradient(norm, x)[1]), rand(3))
([-0.30907017350951627, -0.27754572206132044, 1.6974578767958433],)

julia> Zygote.gradient(x -> sum(Zygote.gradient(norm, x)[1]), Diagonal(rand(3)))  # too much to ask?
ERROR: MethodError: no method matching +(::NamedTuple{(:diag,), Tuple{Vector{Float64}}}, ::Diagonal{Float64, Vector{Float64}})
  ...
Stacktrace:
  [1] accum(x::NamedTuple{(:diag,), Tuple{Vector{Float64}}}, y::Diagonal{Float64, Vector{Float64}})
    @ Zygote ~/.julia/packages/Zygote/ywhiG/src/lib/lib.jl:8
  [2] ZBack
    @ ~/.julia/packages/Zygote/ywhiG/src/compiler/chainrules.jl:77 [inlined]
  [3] Pullback
    @ ~/.julia/dev/ChainRules/src/rulesets/LinearAlgebra/norm.jl:54 [inlined]
  [4] (::typeof(∂(rrule)))(Δ::Tuple{Nothing, NamedTuple{(:x, :y), Tuple{NamedTuple{(:diag,), Tuple{V
...

Edit -- Tests pass locally, but fail on CI, in tests of Adjoint etc. Something is failing to be conjugated? Haven't made this happen locally yet.

oxinabox · 2021-01-04T22:15:55Z

Supporting higher order rules is on the roadmap for ChainRules v2.0
It's not something we are worrying about right now. Burn that bridge when we get to it.

I have head that it is a very common problem for Zygote nested AD not to work to to mutation.
Because people deal with lack of mutation support by mutating and hinding the result behind the rule.
The real solution is long term Zygote must support mutation, there really isn't any other reasonable option.
Practically, the second derviatives that are actually wanted are almost always best calculated via forward over forward or forward over reverse, and no forward AD has mutation trouble.
We can fix things when and if they come up, but I wouldn't spend too much time worrying about it til it does.
Not yet. Other fish to cross first

MethodError: no method matching +(::NamedTuple{(:diag,), Tuple{Vector{Float64}}}, ::Diagonal{Float64, Vector{Float64}}

That one will be fixed one Zygote is switched over to using Composite internally, rather than NamedTuple.

src/rulesets/LinearAlgebra/norm.jl

oxinabox · 2021-01-11T14:26:23Z

src/rulesets/LinearAlgebra/norm.jl

        ∂x = Thunk() do
            return if isempty(x) || p == 0
-                zero.(x) .* (zero(y) * zero(real(Δy)))
+                InplaceableThunk(
+                    @thunk(zero.(x) .* (zero(y) * zero(real(Δy)))),


You know what is fun? this is broken for some arrays.
We might want to constrain the eltype to be Number

Consider:

julia> norm([[[1]]], 2) 1.0 julia> zero.([[[1]]]) ERROR: MethodError: no method matching zero(::Type{Vector{Int64}})

because of JuliaLang/julia#38064

Not blocking for this PR

This entire file should probably be restricted to arrays of <:Number?

Done in be61a4e, but not 100% sure that's a good idea -- haven't tried to audit which rules would or would not work for arrays of arrays.

oxinabox · 2021-04-29T15:22:22Z

src/rulesets/LinearAlgebra/utils.jl

+`λ .* Diagonal(rand(3))` gives a dense matrix when `x==Inf`.
+But `withsomezeros_rewrap(x, λ .* parent(x))` is type-stable.
+"""
+WithSomeZeros{T} = Union{


I would call these StructuredSparseArray

You approve of the mechanism, #337 (comment)?

I am willing to give it a shot.
We can always change it later.
It's not going to lead to wrong behavour AFAICT.

It seems unfortunate not to take advantage of the fact that we know where the zeros are,
and we know that the pullback is going to map zeros to zeros, since linear.
So we should be able to skip some.
But idk that that is a generic API for our structurally sparse matrixes to know if an index will be zero.

Maybe I misunderstand you, but both λ .* Diagonal(rand(3)) and this function do know where the zeros are, and do O(N) work. That's the only really sparse one.

For UpperTriangular, I haven't tried to time this against broadcasting... there could be trade-offs, maybe broadcasting skips half, but if so it needs lots of if statements. Frankly I doubt that anyone has ever called norm(::UpperTriangular) outside a test, though. So perhaps thinking about that can wait until this finds wider use where someone does need to care.

It would also be good to fix this instability upstream. Can't we argue that the off-diagonal elements are a strong zero like false, and make NaN .* Diagonal(rand(3)) just work? Is there an issue?

I feel like all structural zeros should be strong yes.
I was sure I had seem julia displaying that behavour on SparseCSC matrixes, but I can't reproduce it right now.

oxinabox · 2021-04-29T15:49:53Z

I wonder if we should just add StaticArrays as a dependency (it is a super popular package),
and then have a const PrimativeArray{T,N} = Union{StridedArray{T, N}, StaticArray{T,N}} or even const PrimativeArray{T,N} = Union{Array{T, N}, StaticArray{T,N}}
and that will break the current in-pass on.

StaticArray (like Array) is a fundermental array type that doesn't wrap some other type, and isn't really better representated as a struct or tuple either.

GPUArrays are also primative arrays in this sense I think?
But maybe can worry about that later

@willtebbutt

test/rulesets/LinearAlgebra/norm.jl

src/rulesets/LinearAlgebra/norm.jl

codecov-commenter · 2021-04-29T22:24:18Z

Codecov Report

Merging #337 (87e4313) into master (987ee45) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #337      +/-   ##
==========================================
+ Coverage   98.46%   98.49%   +0.02%     
==========================================
  Files          23       23              
  Lines        1893     1929      +36     
==========================================
+ Hits         1864     1900      +36     
  Misses         29       29

Impacted Files	Coverage Δ
src/rulesets/LinearAlgebra/norm.jl	`100.00% <100.00%> (ø)`
src/rulesets/LinearAlgebra/utils.jl	`88.88% <100.00%> (+0.65%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 987ee45...87e4313. Read the comment docs.

mcabbott · 2021-05-04T16:35:23Z

Maybe isn't a bad place to discuss this business of whether to write rules for AbstractArrays, or just Array. I think we may be talking past each other by generalising from experiences with different examples, without laying out what the assumptions are. I'm not sure the case for such restrictions has been made in any detail anywhere, but perhaps someone would care to make it here. The case against:

Consider the following arrays, all AbstractMatrix:

begin
    n = 100
    M = rand(n,n)
    U = UpperTriangular(M)
    D = Diagonal(rand(n))
    P = PermutedDimsArray(M, (2,1))
end;

The function norm treats all of these wrappers as unknown generic arrays: it calls generic_norm2, which is pure Julia and calls getindex, which then acts on the underlying Array (which I think written in C). So in principle a rule for getindex(::Array, ::Int...) alone would allow reverse-mode AD to produce correct answers.

This is not true for norm(M), which calls BLAS.nrm2(M). So you are always going to need a rule for that. One extreme position would be to write only such rules, only for ccalls.

But another position is that providing high-level rules allows more efficient code. Here's the present behaviour -- I guess you can blame getindex for this but anyway, being 3, or maybe 5, orders of magnitude slower if you miss the rule seems hard to ignore:

julia> @btime gradient(LinearAlgebra.generic_norm2, $M);
  149.084 ms (736258 allocations: 1.51 GiB)   # Zygote
  163.940 ms (796266 allocations: 788.73 MiB) # ... with one-hot getindex
  76.235 ms (270010 allocations: 772.17 MiB)  # Tracker

julia> @btime gradient(norm, $M);
  13.666 μs (2 allocations: 78.20 KiB)    # Zygote -- 1000x faster
  138.709 μs (91 allocations: 627.80 KiB) # Tracker

julia> @btime norm($M);  # subtract the forward pass and it's 10^5 times 
  12.416 μs (0 allocations: 0 bytes)

julia> @btime LinearAlgebra.generic_norm2($M);
  27.958 μs (0 allocations: 0 bytes)

julia> @btime norm($U);
  28.833 μs (0 allocations: 0 bytes)

Note that norm(U) does not unwrap and call norm(U.data), because that would be wrong. Even norm(M') does not unwrap, that's an optimisation nobody got around to adding -- this is the vector norm of a matrix, perhaps rarely used. There are many fallback functions like this.

In fact I would argue that such fallbacks are the reason UpperTriangular <: AbstractMatrix, and so on. The wrapper's main function is to be a flag by which LinearAlgebra routines can signal to each other, tidier than 'U'::Char, but this does not need an AbstractMatrix. The supertype is an opt-in to all sorts of other useful behaviour -- it favours making stuff work, over only getting optimal routines. This is even stronger for Diagonal <: AbstractMatrix, which explicitly opts in to lots of O(n^2) behavoiur, while still being mathematically correct. Using Diagonal not diagm means you will hit some faster routines, but Julia provides no guarantee. If you want to be completely sure, you should handle the Vector yourself (or define a StrictDiagonal type without getindex, perhaps).

Diagonal is pretty unusual in having different complexity. It is special-cased here (along with some friends) only to avoid a strange type-instability. Note again that the forward pass norm(D) is not specialised, and is quite a bit slower than the dense matrix:

julia> @btime gradient(norm, $D);
  32.041 μs (1 allocation: 896 bytes)       # Zygote, with this PR (or master)
  32.083 μs (4 allocations: 944 bytes)      # Zygote, without the special case
  124.333 μs (176 allocations: 487.84 KiB)  # Tracker, generic rule?

julia> @btime norm($D);  # forward pass dominates!
  32.000 μs (0 allocations: 0 bytes)

Ignoring this type instability issue, the generic rule already produces a Diagonal gradient, because the broadcasting it uses specialises. By being generic we are, almost accidentally, already doing better than the original function. If this failed, and the generic routine produced an O(n^2) rule where O(n) would have been possible, the asymptotic regime is still far away: n >> 10^5 is an 80GB matrix. And, adding a special case for Diagonal is also entirely OK -- especially if the original forward function has such a special case.

(I think we should also add an overall projection step which ensures the preservation of structured types such as Diagonal. Doing this would mean that, even if you hit a generic function at one step, it does not propagate backwards through other steps. I think there's broad agreement on this, and it's nearly orthogonal to the present discussion.)

There isn't a special case for SMatrix, nor CuArray, but since the gradient is just made of other array operations, and those all specialise, these work well. This seems to be precisely the kind of "unrelated packages working well together by dispatch" which is the advert for Julia.

julia> @btime gradient(norm, $(@SMatrix(rand(10,10)))); 
  126.949 ns (0 allocations: 0 bytes)  # Zygote -- 60x faster than Array

julia> @btime gradient(LinearAlgebra.generic_norm2, $(@SMatrix(rand(10,10))));
  601.334 μs (8955 allocations: 707.45 KiB) # Zygote -- 4700x slower

julia> @btime norm($(@SMatrix(rand(10,10)))); # uses its own routine
  38.642 ns (0 allocations: 0 bytes)

julia> @btime LinearAlgebra.generic_norm2($(@SMatrix(rand(10,10))));
  203.062 ns (0 allocations: 0 bytes)  # naiive code is 5x slower, same on Array

At present Zygote can't differentiate StaticArrays.norm, so perhaps timing generic_norm2 here is unfair? But still, the 5x difference between the naiive code, and whatever special routine StaticArrays is using, is precisely the sort of specialisation which (in other operations) the generic gradient can exploit. Without resorting to ccalls, there are lots of ways to write Julia array code which is much faster than the most naiive loops. And for GPUArrays, the most naiive loops won't work at all. If the rule encodes what we know to be true, mathematically, and acts at a high level, then it leaves room to re-use all these specialisations downstream.

The proposal to consider Union{Array,StaticArray} seems to me a strange one. There are no ccalls in there, it's just getindex on a Tuple ultimately. So the "extreme position" above would ignore it. What would the argument be to treat that at the array level, but leave U (and the adjoint of an SMatrix, etc.) to use scalar code? If this is considered a "fundermental array type", what's the definition, and does a FillArray, or a MappedArray, or a JuliennedArrays.Align, count too? I suppose the position I am arguing is that (say) a ReadonlyMappedArray{Float64,1,<:UnitRange} has opted in to being treated like a vector, at least as the fallback.

Some of my examples here could be evaded by making LinearAlgebra.norm un-wrap the wrappers defined in the same module. But there will never be a shortage of others, for instance P is a lazy wrapper defined outside LinearAlgebra, and there are many more defined in packages (such as OffsetArrays). Making AD work well with as many of them as possible seems like a worthwhile goal.

Being generic does introduce more ways to mess up -- it's obviously easier to test code which only accepts Matrix{Float64} than it is to test code which allows for (say) a structured matrix of unitful dual numbers. This is the cost of working in Julia, and the benefits are why we're all here. (We discussed elsewhere ideas for tooling to make such tests easier. Note that the present rule is so narrowly written that it fails for Matrix{Int}, missed by tests.)

For those proposing much more restricted rules, would you care to argue where you disagree with the above? For norm acting on U, D, P, or just on some other specific wrapper? Or for some other particular function -- perhaps there are classes of functions which ought to be treated differently, maybe lu is one? Can we find the dividing line?

(I think getindex is one, at least getindex(A, i::Int...), as that really does pass through wrappers in a simple way. But perhaps that's its own topic, as it's not a whole-array function like the others here.)

See also:

getindex(::Fill, idx) FluxML/Zygote.jl#863 about getinidex(::Fill)
Type Constraints #232 about projecting onto structured arrays
Support matrix multiplication (Continue #93) JuliaArrays/OffsetArrays.jl#146 discusses unwrapping, or not, for mul!(::OffsetArray)
Test Suite: AbstractArrays and Numbers ChainRulesTestUtils.jl#98 about better testing of rules
Gradient for dot(x,A,y) #261 (comment) for an earlier 52404x speedup example, dot is as generic as norm.

willtebbutt · 2021-05-07T15:31:42Z

You are picturing f(::Diagonal) being an AD-friendly structure-exploiting method, but what also happens all the time is that norm(::Matrix{<:HwFloat}) is a faster but AD-unfriendly implementation, and norm(::CuArray) is something else entirely. Is there an automatic way you could distinguish these?

I would be surprised if there were an automatic way to distinguish these situations.

I think we've narrowed the range of functions we're discussing down a bit now, which is good -- as I understand it we're now just considering whether, for functions for which specialised methods exist for a particular type, it's a better idea to default to utilising generic rrules, or to let AD have a go at differentiating the specialisation directly. I get the feeling that for the former is what you want for the kinds of situations you're considering, and the latter is preferable in the situations that I care about.

So I think we're in the realm of trying to pick a tradeoff. Is this also your understanding?

In terms of giving us nicer options, I wonder whether it's worth enabling two kinds of rrules

our current very aggressive ones that say "I'm sure that I'm the thing you always want to call, never do AD, regardless any specialisations which may exist", and
ones that are less aggressive, as per the proposal above, which express "use me only if there's no more specific code for AD to have a crack at differentiating".

This would give us the option to make quite strong statements where appropriate, and weaker ones where we just want to ensure that the fallback behaviour is reasonable.

mcabbott · 2021-05-07T16:38:30Z

Yes, I think that's a fair statement of what's going on.

If there's no automatic way, then some manual way to mark these distinctions is required.

However, I'm not sure that tagging the original abstract rule is enough, as you suggest with these two classes of rrules. We'd like the generic rule to be employed for norm(::CuArray) and norm(::StaticArray), but not for say norm(::FillArray), assuming that's sufficiently AD friendly to prefer no rule. So the same function has to be in both classes.

(And to make automation harder, the overload is one method down, norm2(a::FillArrays.AbstractFill). It's fast but whether it's AD-able I haven't actually checked. This structure is also very common, of course.)

The other manual way is to opt out. Writing rrule(::typeof(norm), ::AbstractFill) = nothing doesn't in fact work, but my misunderstanding of this perhaps led me to see that it'll be easy to make it work (see edited code 4 messages up). This still seems attractive to me, in that (1) if anyone is writing a specialisation for f and Fill then they certainly have both packages loaded, and (2) if they are doing so while thinking even a little bit about AD, then asking that further add a line like ChainRulesCore.@no_rrule f(::AbstractFill) doesn't seem like an unreasonable burden.

By contrast, making people who don't care about AD have their specialisation norm(::StrideArray{Float32}) opt in (by adding a dependency) seems much harder to sell.

test/rulesets/LinearAlgebra/norm.jl

oxinabox

While we are having useful and converstations in this thread (that i have not yet caught up on)
this code benchmarks well.
and passes tests.
Can we address my last few comments and i will merge it.
If we want to change things later we can.
(nice thing about working in code rather than wood, mistakes can be removed invisably)

Co-authored-by: Simeon Schaub <[email protected]>

Co-authored-by: Lyndon White <[email protected]>

simeonschaub approved these changes Dec 27, 2020

View reviewed changes

simeonschaub requested a review from willtebbutt December 28, 2020 19:52

nickrobinson251 mentioned this pull request Jan 1, 2021

Test Suite: AbstractArrays and Numbers JuliaDiff/ChainRulesTestUtils.jl#98

Open

nickrobinson251 added the type constraints Potentially raises a question about how tightly to constrain argument types for a rule. See #232 label Jan 1, 2021

oxinabox reviewed Jan 11, 2021

View reviewed changes

src/rulesets/LinearAlgebra/norm.jl Outdated Show resolved Hide resolved

oxinabox reviewed Jan 11, 2021

View reviewed changes

src/rulesets/LinearAlgebra/norm.jl Outdated Show resolved Hide resolved

oxinabox reviewed Jan 11, 2021

View reviewed changes

src/rulesets/LinearAlgebra/norm.jl Outdated Show resolved Hide resolved

oxinabox reviewed Jan 11, 2021

View reviewed changes

mcabbott force-pushed the norm branch from 8ab56db to be61a4e Compare April 29, 2021 14:53

oxinabox reviewed Apr 29, 2021

View reviewed changes

mcabbott force-pushed the norm branch 2 times, most recently from cae1531 to 15e55ee Compare April 29, 2021 15:32

oxinabox reviewed Apr 29, 2021

View reviewed changes

test/rulesets/LinearAlgebra/norm.jl Outdated Show resolved Hide resolved

src/rulesets/LinearAlgebra/norm.jl Show resolved Hide resolved

mcabbott force-pushed the norm branch 3 times, most recently from ae62bc0 to 741b141 Compare April 29, 2021 21:33

mcabbott mentioned this pull request May 4, 2021

Explain the Abstract Primals Problem JuliaDiff/ChainRulesCore.jl#343

Open

oxinabox reviewed May 7, 2021

View reviewed changes

test/rulesets/LinearAlgebra/norm.jl Outdated Show resolved Hide resolved

oxinabox approved these changes May 7, 2021

View reviewed changes

Michael Abbott and others added 22 commits May 7, 2021 15:07

widen type of norm pullback, add inplace + tests

19578d6

tidy

9993979

allow, and test, integer x

6b6487d

don't let broadcast allocate ∂x because it's weird

b03b83d

one more

b666c37

Apply suggestions from code review

972b6dd

Co-authored-by: Simeon Schaub <[email protected]>

+ tests

c12b829

a type-stability fix via parent & re-wrap

e4d60f2

comments from review

96eafd6

restrict all arrays to eltype Number

07eb77b

skip UpperHessenberg

d3833fc

version

79ecf73

move branches into InplaceableThunk

f1b5ba4

give up on inplacethunk

3f893e4

try to find out where we are

46bf4c5

don't count the thunks

71d4ac9

rm nested unthunking

e400bd5

make tests pass

81f85b7

comments

43a2756

update notation

0331eda

tidy

69a92dc

integer comment

87e4313

Co-authored-by: Lyndon White <[email protected]>

mcabbott force-pushed the norm branch from 664234e to 87e4313 Compare May 7, 2021 19:08

oxinabox merged commit db07b7a into JuliaDiff:master May 10, 2021

mcabbott deleted the norm branch May 10, 2021 22:49

mzgubic mentioned this pull request Jun 21, 2021

Opting out of rules JuliaDiff/ChainRulesCore.jl#377

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements to rules for `norm` #337

Improvements to rules for `norm` #337

mcabbott commented Dec 26, 2020 •

edited

Loading

codecov-io commented Dec 26, 2020 •

edited

Loading

simeonschaub left a comment

sethaxen commented Dec 31, 2020

mcabbott commented Jan 1, 2021

oxinabox commented Jan 2, 2021

mcabbott commented Jan 4, 2021 •

edited

Loading

oxinabox commented Jan 4, 2021

mcabbott commented Jan 4, 2021 •

edited

Loading

sethaxen commented Jan 4, 2021

mcabbott commented Jan 4, 2021 •

edited

Loading

oxinabox commented Jan 4, 2021

oxinabox Jan 11, 2021 •

edited

Loading

oxinabox Jan 11, 2021

mcabbott Apr 29, 2021 •

edited

Loading

oxinabox Apr 29, 2021

mcabbott Apr 29, 2021

oxinabox May 4, 2021

mcabbott May 4, 2021 •

edited

Loading

mcabbott May 4, 2021 •

edited

Loading

oxinabox May 6, 2021

oxinabox commented Apr 29, 2021 •

edited

Loading

codecov-commenter commented Apr 29, 2021 •

edited

Loading

mcabbott commented May 4, 2021

willtebbutt commented May 7, 2021

mcabbott commented May 7, 2021

oxinabox left a comment

Improvements to rules for norm #337

Improvements to rules for norm #337

Conversation

mcabbott commented Dec 26, 2020 • edited Loading

codecov-io commented Dec 26, 2020 • edited Loading

Codecov Report

simeonschaub left a comment

Choose a reason for hiding this comment

sethaxen commented Dec 31, 2020

mcabbott commented Jan 1, 2021

oxinabox commented Jan 2, 2021

mcabbott commented Jan 4, 2021 • edited Loading

oxinabox commented Jan 4, 2021

mcabbott commented Jan 4, 2021 • edited Loading

sethaxen commented Jan 4, 2021

mcabbott commented Jan 4, 2021 • edited Loading

oxinabox commented Jan 4, 2021

oxinabox Jan 11, 2021 • edited Loading

Choose a reason for hiding this comment

oxinabox Jan 11, 2021

Choose a reason for hiding this comment

mcabbott Apr 29, 2021 • edited Loading

Choose a reason for hiding this comment

oxinabox Apr 29, 2021

Choose a reason for hiding this comment

mcabbott Apr 29, 2021

Choose a reason for hiding this comment

oxinabox May 4, 2021

Choose a reason for hiding this comment

mcabbott May 4, 2021 • edited Loading

Choose a reason for hiding this comment

mcabbott May 4, 2021 • edited Loading

Choose a reason for hiding this comment

oxinabox May 6, 2021

Choose a reason for hiding this comment

oxinabox commented Apr 29, 2021 • edited Loading

codecov-commenter commented Apr 29, 2021 • edited Loading

Codecov Report

mcabbott commented May 4, 2021

willtebbutt commented May 7, 2021

mcabbott commented May 7, 2021

oxinabox left a comment

Choose a reason for hiding this comment

Improvements to rules for `norm` #337

Improvements to rules for `norm` #337

mcabbott commented Dec 26, 2020 •

edited

Loading

codecov-io commented Dec 26, 2020 •

edited

Loading

mcabbott commented Jan 4, 2021 •

edited

Loading

mcabbott commented Jan 4, 2021 •

edited

Loading

mcabbott commented Jan 4, 2021 •

edited

Loading

oxinabox Jan 11, 2021 •

edited

Loading

mcabbott Apr 29, 2021 •

edited

Loading

mcabbott May 4, 2021 •

edited

Loading

mcabbott May 4, 2021 •

edited

Loading

oxinabox commented Apr 29, 2021 •

edited

Loading

codecov-commenter commented Apr 29, 2021 •

edited

Loading