WIP: project implementation #380
Conversation
It may be annoying to be forced to use

    projx = project(x)  # outside the pullback
    dx = projx(dx)      # inside the pullback
@CarloLucibello I think @mcabbott had a similar comment on a related PR -- I agree with you both that we should do something like this.
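The two-stage closure API being discussed might look something like this minimal sketch (the method bodies here are illustrative assumptions, not this PR's implementation):

```julia
using LinearAlgebra

# Hypothetical sketch: `project(x)` captures what it needs from the primal
# and returns a closure that maps tangents back onto x's tangent space.
project(x::Real) = dx -> real(dx)                 # real primal: drop imaginary part
project(x::Diagonal) = dx -> Diagonal(diag(dx))   # Diagonal primal: keep the diagonal

projx = project(1.0)        # outside the pullback
dx = projx(3.0 + 4.0im)     # inside the pullback; returns 3.0
```

The point of the split is that the closure is built once from the primal, while the pullback only ever touches the tangent.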
    project(::Type{T}, x::T, dx::Complex) where {T<:Real} = T(real(dx))

    # Arrays
I have a feeling we might need to support the same kwargs that `similar` does, which IIRC are `size` and `eltype`. So you can say: this AbstractArray type, except the size should be this and the eltype that.
E.g. for primal being `WrapperArray{Tuple{Float64,Float64}}`, `project(typeof(primal), dx; eltype=Tangent{Tuple{Float64,Float64}})` would get back a `WrapperArray{Tangent{Tuple{Float64,Float64}}}`. And `size` would let us just store the size, rather than the whole primal type, for zeros.
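A sketch of the `similar`-style kwargs idea for the plain-`Array` case (the function name `project_array` and the kwarg defaults are assumptions for illustration):

```julia
# Sketch: project back to an Array, optionally overriding eltype and size,
# mirroring the kwargs that `similar` accepts.
function project_array(dx::AbstractArray; eltype=Base.eltype(dx), size=Base.size(dx))
    out = Array{eltype}(undef, size...)
    copyto!(out, dx)
    return out
end

project_array([1, 2, 3]; eltype=Float64)   # Vector{Float64}: [1.0, 2.0, 3.0]
```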
We might need a `structural_zeros_from`/`set_structural_zeros_by` function, or a `structural_zero_inds` list of indices, for when structural zero locations are only known by value (like for `SparseCSC`).
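For the sparse case the zero locations live in the primal's value, so a projector has to record them; one possible sketch (the helper name `sparse_projector` is hypothetical):

```julia
using SparseArrays

# Record where the primal stores entries; keep only those entries of the
# tangent, so projection restores the primal's structural zeros.
function sparse_projector(x::SparseMatrixCSC)
    I, J, _ = findnz(x)
    return dx -> sparse(I, J, [dx[i, j] for (i, j) in zip(I, J)], size(x)...)
end

x = sparse([1.0 0.0; 0.0 2.0])
proj = sparse_projector(x)
proj([1.0 5.0; 5.0 1.0])   # off-diagonal entries become structural zeros again
```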
I can see where you're coming from, but is it not both cleaner and more general to do the kind of thing that @CarloLucibello suggests?
These seem orthogonal to me. Or at least, the closure @CarloLucibello suggests would be implemented on top of this, so that it extracts these bits of information from the primal and passes them into the kwargs, rather than closing over everything.
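The layering being described might look roughly like this (all names here are hypothetical stand-ins, not the PR's API):

```julia
# `project_to` stands in for the kwarg-based form; the closure extracts
# just eltype and size from the primal instead of closing over all of it.
project_to(::Type{<:Array}, dx; eltype, size) =
    Array{eltype}(reshape(collect(dx), size))

function make_projector(x::Array)
    T, sz = Base.eltype(x), Base.size(x)   # capture only these bits
    return dx -> project_to(typeof(x), dx; eltype=T, size=sz)
end

p = make_projector([1.0, 2.0])
p([3, 4])   # [3.0, 4.0], a Vector{Float64}
```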
> these seem orthogonal to me.

Hmmm, maybe I've misunderstood your reasoning for needing `eltype` and `size` kwargs then. Are you saying we might need them because we would want to apply `project` for some primal other than the primal that the tangent is for? I.e. let `x` and `y` be primals of differing sizes. Say that I've got a tangent `dx` for `x`, but for some reason I've only got access to `y`. So I say `project(y, dx; size=size_of_x)`. Is this the kind of situation you're imagining?
That's an idea I didn't think of, and in some ways the natural way to store what information you need until you need it. What I think needs some thought is how these compose. My idea in https://github.com/FluxML/Zygote.jl/pull/965/files#diff-9bc4a61f220da7bc58a4009fe88887b5b584b3d6139c68b0e13cbdbcd21f7289 was to make them recursive. Maybe you could do something like this, but this isn't quite right if projection stores the size... neither before nor after then seems like it will work.
    project([T::Type], x, dx)

"project" `dx` onto type `T` such that it is the same size as `x`. If `T` is not provided,
it is assumed to be the type of `x`.
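Using the one-line method from the diff plus an assumed default method, the docstring's two call forms would behave like this (the second method, defaulting `T` to `typeof(x)`, is an assumption for illustration):

```julia
# First method is from the diff above; second method supplies the default T.
project(::Type{T}, x::T, dx::Complex) where {T<:Real} = T(real(dx))
project(x, dx) = project(typeof(x), x, dx)

project(Float64, 1.0, 2.0 + 3.0im)   # explicit T: returns 2.0
project(1.0, 2.0 + 3.0im)            # T taken from x: returns 2.0
```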
What's the point of `T` in addition to `x`? It sounds like you have something in mind but I don't see what. In what directions may they differ? May `T` be abstract?
Oh, `T` could be `Tangent`, as in line 60.
But why? And when, I mean: for which `x` would this be chosen somehow? Sounds a bit like an orthogonal feature.
My initial thought was to ditch `T` altogether, and privilege some tangent representation (probably `Tangent`) as canonical. However, somewhere in the many threads about this, @oxinabox argued that we need to go both ways (to tangent and to natural differential, say). E.g. if a rule author writes a rule only for `Diagonal` and wants to make sure that the `Tangent` is being transformed right.

Edit: sorry, the other way: writes a rule for `Tangent` and wants to make sure all the tangents it gets are `Tangent`s.
OK, but this still seems a bit like it's rolling two things together. One is a projection to preserve the input `x`'s real-ness / Diagonal-ness etc. This would be applied automatically to every single rule. The other is a way to massage certain `dx`s into a form digestible by certain rules. It's possible that there is some efficiency gained by fusing these processes into one function, but... for a first stab, it seems that combining them is adding complication?
> This would be applied automatically to every single rule.

No, I don't think so. As I see it: in that use, `project` is one way that lets us write things that can run on anything, while getting out the type (and associated basis) that we want. But it's not the only way, and it's not always the best way. In fact I think it rarely is, since it often results in allocating and computing things that we are just going to throw away. It's just that it is the most generic way. Alternatives, for example, are things like structure-preserving maps and broadcasts, which avoid allocations and computing things they are going to throw away. Though I guess in those cases you can still `project` after, but it will be an identity operation, like `convert(Array, [1, 2, 3])` is an identity operation.

Which brings me around to the general thing: this is like `convert`, except that it is for vector spaces and is aware of vector-space concerns, like where your zeros are (which is the same as the basis). Which is the same convert-like operation one wants for massaging on the way in.
> in those cases you can still project after, but it will be an identity operation

Yes, that's my thinking. In my list of examples somewhere, any operation mixing real & complex arrays is something which ought to project. But making every rule do the ideal thing would be a lot of work for little benefit. Whereas projecting back to a real array for `hcat([1,2], [3+im, 4+im])`'s gradient at least means it propagates no further.
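The `hcat` case in concrete terms (a sketch; the "projection" here is just `real.`, not this PR's code):

```julia
# hcat([1,2], [3+im,4+im]) mixes real and complex columns; projecting the
# first argument's cotangent back to a real array stops the imaginary part
# from propagating any further.
x1, x2 = [1, 2], [3 + im, 4 + im]
dy = ones(ComplexF64, 2, 2)     # some incoming cotangent for hcat(x1, x2)
dx1 = real.(dy[:, 1])           # projected onto x1's real tangent space
dx2 = dy[:, 2]                  # x2 is complex: projection is the identity
```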
Yeah, anyway, the point is: it is our vector-space-aware `convert`, so it is the right name both for the thing you (often) want to do at the end, and for the one you might want to do at the start.
Closed in favour of #385
Builds on top of #306, adds tests and covers some more edge cases.
This is still very preliminary; still need to test how it plays with ChainRules.
To do:
- adding guidance to docs on how to write rules will be done as a part of #376