Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: project implementation #380

Closed
wants to merge 6 commits into from
Closed

WIP: project implementation #380

wants to merge 6 commits into from

Conversation

mzgubic
Copy link
Member

@mzgubic mzgubic commented Jun 22, 2021

Builds on top of #306, adds tests and covers some more edge cases.

This is still very preliminary, still need to test how it plays with ChainRules.

To Do:

  • plug in to ChainRules
  • add more structured array types (other than Diagonal) and tests

adding guidance to docs on how to write rules will be done as a part of #376

@CarloLucibello
Copy link

It may be annoying to be forced to use x in a pullback without being able to release it. What if project returns a closure instead, capturing only the relevant information for the projection (e.g. types and shapes):

projx = project(x) # outside the pullback
dx = projx(dx) # inside the pullback

@willtebbutt
Copy link
Member

@CarloLucibello I think @mcabbott had a similar comment on a related PR -- I agree with you both that we should do something like this.

project(::Type{T}, x::T, dx::Complex) where {T<:Real} = T(real(dx))


# Arrays
Copy link
Member

@oxinabox oxinabox Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling we might need to support the same kwargs that similar does.
which iirc are size and eltype.
So you can say this AbstractArray type, except the the size should be this and the eltype that.

E.g.
for primal being WrapperArray{Tuple{Float64,Float64}}

project(typeof(primal), dx; eltype=Tangent{Tuple{Float64,Float64}}

to get back a WrapperArray{Tangent{Tuple{Float64,Float64}}}

And size would let us just store the size, rather than the whole primal type for zeros.

We might need a structual_zeros_from set_structural_zeros_by function or strutural_zero_inds list of indices, for if structural zero locations are only known by value (like for SparseCSC)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see where you're coming from, but is it not both cleaner and more general to do the kind of thing that @CarloLucibello suggests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem orthogonal to me.
Or at least that the closure @CarloLucibello suggest would be implemented on top of this.
So that it extracts these bits of information from the primal and passes them into the kwargs.
Rather than closing over everything

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem orthogonal to me.

Hmmm maybe I've misunderstood your reasoning for needing eltype and size kwargs then.

Are you saying we might need it because we would want to apply project for some primal other than the primal that the tangent is for?

i.e. let x and y be primals of differing sizes. Say that I've got a tangent dx for x, but for some reason I've only got access to y. So I say project(y, dx; size=size_of_x).

Is this the kind of situation you're imagining?

@mcabbott
Copy link
Member

mcabbott commented Jun 23, 2021

What if project returns a closure

That's an idea I didn't think of, and in some way the natural way to store what information you need until you need it.

What I think needs some thought is how these compose. My idea in https://github.com/FluxML/Zygote.jl/pull/965/files#diff-9bc4a61f220da7bc58a4009fe88887b5b584b3d6139c68b0e13cbdbcd21f7289 was to make them recursive, so that input x::Diagonal{Float64} results in first a projection of dx::Matrix{Complex{Float64}} first to be Diagonal, and then to be real. If someone later wants to ensure that (say) Quaternions get projected to real, their rule ought also to be applied after (say) the projection onto Symmetric.

Maybe you could do something like this, but this isn't quite right if projection stores the size... neither before not after then seems like it will work:

projector(x::Diagonal) =  (dx -> project(Diagonal, dx)) ∘ projector(parent(x))

(where projector is the function which returns the closure, and project performs projection.)

Comment on lines +4 to +7
project([T::Type], x, dx)

"project" `dx` onto type `T` such that it is the same size as `x`. If `T` is not provided,
it is assumed to be the type of `x`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of T in addition to x? It sounds like you have something in mind but I don't see what. In what directions may they differ? May T be abstract?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh T could be Tangent as in line 60

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why? And when, I mean for which x would this be chosen somehow? Sounds a bit like an orthogonal feature.

Copy link
Member Author

@mzgubic mzgubic Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was to ditch T altogether, and privilege some tangent (probably Tangent) representation as canonical.

However, somewhere in the many threads about this, @oxinabox argued that we need to go both ways (to tangent and to natural differential say). e.g. if a rule author writes a rule only for Diagonal and wants to make sure that the Tangent is being transformed right

Edit: sorry, the other way, writes a rule for Tangent and wants to make sure all the tangents it gets are Tangents

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but this still seems a bit like it's rolling two things together. One is a projection to preserve the input's x's real-ness / Diagon-ality etc. This would be applied automatically to every single rule.

The other is a way to massage certain dxs to a form digestible by certain rules. It's possible that there is some efficiency gained by fusing these processes into one function, but... for a first stab, it seems that combing them is adding complication?

Copy link
Member

@oxinabox oxinabox Jun 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be applied automatically to every single rule.

No, I don't think so.
As i see it:

In that use, project is one way that lets us write things that can run on anything, while getting out the type (and associated basis) that we want.
But it's not the only way, and its not always the best way.
In fact i think it rarely is, since it often results in allocating and computing things that we are going going to throw away.
Its just that it is the most generic way.
Alternatives for example, are things like structure preserving maps and broadcasts.
Which avoid allocations and computing things they are going to throw away.
Though i guess in those cases you can still project after, but it will be a identity operation.
Like convert(Array, [1,2,3,]) is an identity operation.

Which brings me around to the general thing:
This is like convert except that it is for vector spaces and is aware of vector space concerns, where are your zeros (which is same as basis)
Which is the same convert-like operation one want to massage on the way in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in those cases you can still project after, but it will be a identity operation

Yes, that's my thinking. In my list of examples somewhere, any operation mixing real & complex arrays is something which ought to project. But making every rule do the ideal thing would be a lot of work for little benefit. Whereas projecting back to a real array for hcat([1,2], [3+im, 4+im])'s gradient at least means it propagates no further.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, anyway, point is it is our vector space aware convert so it is the right name for both the thing you (often) want to do at the end.
But it is also the one you might want to do at the start

@mzgubic
Copy link
Member Author

mzgubic commented Jul 6, 2021

closed in favour of #385

@mzgubic mzgubic closed this Jul 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants