Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractArray Dilemma docs #347

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

willtebbutt
Copy link
Member

@willtebbutt willtebbutt commented May 6, 2021

This is probably going to need to go through a few rounds of redrafting before it's good to go.

I've tried to produce a statement of the facts, without imposing my opinion on the right solution. Inevitably, I won't have achieved this, and I'm very happy to rephrase things that aren't sufficiently unbiased.

@mcabbott @mzgubic @oxinabox

edit:

Things to include in the next pass:

  1. Related issues, including Inequality, ε-balls, and accidental structural zeros ForwardDiff.jl#480, Type Constraints ChainRules.jl#232, Improvements to rules for norm ChainRules.jl#337, Explain the Abstract Primals Problem #343 -- there are more
  2. Various typo / grammar fixes
  3. More discussion of the relationship with the forwards pass, and the relationship with generic fallbacks.
  4. Make it clearer that this is a discussion about semantics -- we need to agree on what it is that we want to implement before we can implement it.
    1. Related PRs: Take nothing seriously FluxML/Zygote.jl#967

@codecov-commenter
Copy link

codecov-commenter commented May 6, 2021

Codecov Report

Merging #347 (a6465bb) into master (5f1613f) will decrease coverage by 1.15%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #347      +/-   ##
==========================================
- Coverage   90.01%   88.86%   -1.16%     
==========================================
  Files          14       13       -1     
  Lines         541      530      -11     
==========================================
- Hits          487      471      -16     
- Misses         54       59       +5     
Impacted Files Coverage Δ
src/compat.jl 0.00% <0.00%> (-55.56%) ⬇️
src/ruleset_loading.jl 93.75% <0.00%> (-2.09%) ⬇️
src/differentials/composite.jl 82.75% <0.00%> (-0.30%) ⬇️
src/ChainRulesCore.jl
src/rule_definition_tools.jl 97.52% <0.00%> (+0.62%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5f1613f...a6465bb. Read the comment docs.

@willtebbutt
Copy link
Member Author

Random though: I'm currently chopping and changing between "structured array type" and "sparse array type" quite a bit.
I'll probably settle on "structured array type" after a round of review, since things like low-rank matrices are dense, but are also the kind of thing that we're interested in here.

@mcabbott
Copy link
Member

mcabbott commented May 6, 2021

Thanks for making a start.

My essay on this is JuliaDiff/ChainRules.jl#337 (comment), and I'd appreciate your comments on what, if any, of that you disagree with.

I wonder, on first skim, whether the parallel to the forward evaluation should be stressed more. On a generic AbstractMatrix, * calls generic_matmatmul!, which is awful in (I think) precisely the ways you complain about. The reason it works as your example is that these structured/sparse types have defined their own specialisation for the forward *, one by one. They are not automatically efficient with all functions (in fact, I argue that they opt-in to having inefficient fallbacks outside their domain of interest) but are specifically designed to be efficient with this one.

Re names, Symmetric is structured but not sparse. Diagonal counts as sparse (and e.g. similar sometimes treats it as such) but IMO UpperTriangular probably doesn't. Perhaps for these arguments, it's worth retaining a distinction -- the asymptotic arguments don't apply at all to Symmetric.

@willtebbutt
Copy link
Member Author

My essay on this is JuliaDiff/ChainRules.jl#337 (comment), and I'd appreciate your comments on what, if any, of that you disagree with.

It's the next thing on my to-read list :) Wanted to get my thoughts down before working through it.

Re names, Symmetric is structured but not sparse. Diagonal counts as sparse (and e.g. similar sometimes treats it as such) but IMO UpperTriangular probably doesn't.

Agreed -- UpperTriangular{<:Real, Matrix{<:Real}} doesn't feel like something we're talking about here from a performance perspective, but something like UpperTriangular{<:Real, <:SparseMatrixCSC{<:Real}} does.
Either way, need to be careful about definitions.

I wonder, on first skim, whether the parallel to the forward should be stressed more. On a generic AbstractMatrix, * calls generic_matmatmul!, which is awful in (I think) precisely the ways you complain about. The reason it works as your example is that these structured/sparse types have defined their own specialisation for the forward *, one by one. They are not automatically efficient with all functions (in fact, I argue that they opt-in to having inefficient fallbacks outside their domain of interest) but are specifically designed to be efficient with this one.

I'm not quite sure what you're saying here. When you say "the parallel to the forward" are you referring to the usual function evaluation, or the forwards pass of reverse-mode? I guess I'm not quite sure what you have in mind here.

@mcabbott
Copy link
Member

mcabbott commented May 6, 2021

I mean "usual function evaluation", which is also usually the forward pass. The sparse matrix types you describe all know about *, they are specifically designed to work with that (and some other things). They probably don't know about cumsum or mapslices or writedlm or prod or (apparently) norm, but like good citizens, they have voluntarily signed up to receive the slow but workable AbstractArray implementations outside their speciality.

@willtebbutt
Copy link
Member Author

Ahh I see. Yes, we should talk about that more.

My position here is that all I want from AD is for it to be efficient if I've bothered to write the code to specialise the function, and I've written it in such a way that is "AD friendly". I'm perfectly happy for AD to produce slow rules if the function itself hits a slow fallback.

I'm also happy to write rules which deal with AD-unfriendly, but specialised code. We have to do this quite a bit with Diagonal matrices, because the implementations in base often use for-loops and mutation, which reverse-mode AD doesn't typically love.

I'm happy to opt-in to generic rrules by declaring "this array is the kind of array for which generic rules are likely to be a good idea", but I really don't want to have to opt out of them, and I really don't want someone to write a type-pirate-y rule that gets in the way of AD doing it's job when I load a particular package.

@nickrobinson251 nickrobinson251 added the documentation Improvements or additions to documentation label May 7, 2021
Copy link
Member

@mzgubic mzgubic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few typos

If the semantics of AD are such that it is necessary to assume that the tangent of `D`
might be non-diagonal, it is necessary for AD to treat `D` like a `Matrix` which happens to
be diagonal, as discussed above.
This code will clearly not run if that is the case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This code will clearly not run if that is the case.
This code will clearly not run if that is the case, because N is so large.

this only became clear to me after reading the next paragraph, I didn't really internalise the size of N

This property seems to arise from AD considering only considering a function at a single
point, rather than the neighbourhood of a point.

We've not used any contraversial `frule`s here, so this property must be something to do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
We've not used any contraversial `frule`s here, so this property must be something to do
We've not used any controversial `frule`s here, so this property must be something to do

end
```
When the input `A` is a `Matrix{Float64}` and the cotangent `ȳ` a `Vector{Float64}`, it
seems uncontraversial.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
seems uncontraversial.
seems uncontroversial.


### Takeaway

There are a at least two possible things you could think to do as a consequence of this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There are a at least two possible things you could think to do as a consequence of this.
There are at least two possible things you could think to do as a consequence of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants