WIP: project implementation #380
Conversation
It may be annoying to be forced to use

    projx = project(x)  # outside the pullback
    dx = projx(dx)      # inside the pullback
@CarloLucibello I think @mcabbott had a similar comment on a related PR -- I agree with you both that we should do something like this.
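The two-stage closure API being discussed might look something like this minimal sketch (the method bodies here are illustrative assumptions, not this PR's implementation):

```julia
using LinearAlgebra

# Hypothetical sketch: `project(x)` captures what it needs from the primal
# and returns a closure that maps tangents back onto x's tangent space.
project(x::Real) = dx -> real(dx)                 # real primal: drop imaginary part
project(x::Diagonal) = dx -> Diagonal(diag(dx))   # Diagonal primal: keep the diagonal

projx = project(1.0)        # outside the pullback
dx = projx(3.0 + 4.0im)     # inside the pullback; returns 3.0
```

The point of the split is that the closure is built once from the primal, while the pullback only ever touches the tangent.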
    project(::Type{T}, x::T, dx::Complex) where {T<:Real} = T(real(dx))

    # Arrays
I have a feeling we might need to support the same kwargs that `similar` does, which IIRC are `size` and `eltype`. So you can say: this AbstractArray type, except the size should be this and the eltype that.
E.g. for primal being `WrapperArray{Tuple{Float64,Float64}}`, `project(typeof(primal), dx; eltype=Tangent{Tuple{Float64,Float64}})` would get back a `WrapperArray{Tangent{Tuple{Float64,Float64}}}`. And `size` would let us just store the size, rather than the whole primal type, for zeros.
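A sketch of the `similar`-style kwargs idea for the plain-`Array` case (the function name `project_array` and the kwarg defaults are assumptions for illustration):

```julia
# Sketch: project back to an Array, optionally overriding eltype and size,
# mirroring the kwargs that `similar` accepts.
function project_array(dx::AbstractArray; eltype=Base.eltype(dx), size=Base.size(dx))
    out = Array{eltype}(undef, size...)
    copyto!(out, dx)
    return out
end

project_array([1, 2, 3]; eltype=Float64)   # Vector{Float64}: [1.0, 2.0, 3.0]
```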
We might need a `structural_zeros_from`/`set_structural_zeros_by` function, or a `structural_zero_inds` list of indices, for when structural zero locations are only known by value (like for `SparseCSC`).
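For the sparse case the zero locations live in the primal's value, so a projector has to record them; one possible sketch (the helper name `sparse_projector` is hypothetical):

```julia
using SparseArrays

# Record where the primal stores entries; keep only those entries of the
# tangent, so projection restores the primal's structural zeros.
function sparse_projector(x::SparseMatrixCSC)
    I, J, _ = findnz(x)
    return dx -> sparse(I, J, [dx[i, j] for (i, j) in zip(I, J)], size(x)...)
end

x = sparse([1.0 0.0; 0.0 2.0])
proj = sparse_projector(x)
proj([1.0 5.0; 5.0 1.0])   # off-diagonal entries become structural zeros again
```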
I can see where you're coming from, but is it not both cleaner and more general to do the kind of thing that @CarloLucibello suggests?
These seem orthogonal to me. Or at least, the closure @CarloLucibello suggests would be implemented on top of this, so that it extracts these bits of information from the primal and passes them into the kwargs, rather than closing over everything.
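The layering being described might look roughly like this (all names here are hypothetical stand-ins, not the PR's API):

```julia
# `project_to` stands in for the kwarg-based form; the closure extracts
# just eltype and size from the primal instead of closing over all of it.
project_to(::Type{<:Array}, dx; eltype, size) =
    Array{eltype}(reshape(collect(dx), size))

function make_projector(x::Array)
    T, sz = Base.eltype(x), Base.size(x)   # capture only these bits
    return dx -> project_to(typeof(x), dx; eltype=T, size=sz)
end

p = make_projector([1.0, 2.0])
p([3, 4])   # [3.0, 4.0], a Vector{Float64}
```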
> these seem orthogonal to me.

Hmmm, maybe I've misunderstood your reasoning for needing `eltype` and `size` kwargs then. Are you saying we might need them because we would want to apply `project` for some primal other than the primal that the tangent is for? I.e. let `x` and `y` be primals of differing sizes. Say that I've got a tangent `dx` for `x`, but for some reason I've only got access to `y`. So I say `project(y, dx; size=size_of_x)`. Is this the kind of situation you're imagining?
That's an idea I didn't think of, and in some ways the natural way to store what information you need until you need it. What I think needs some thought is how these compose. My idea in https://github.com/FluxML/Zygote.jl/pull/965/files#diff-9bc4a61f220da7bc58a4009fe88887b5b584b3d6139c68b0e13cbdbcd21f7289 was to make them recursive. Maybe you could do something like this, but this isn't quite right if projection stores the size... neither before nor after then seems like it will work.
    project([T::Type], x, dx)

"project" `dx` onto type `T` such that it is the same size as `x`. If `T` is not provided,
it is assumed to be the type of `x`.
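Using the one-line method from the diff plus an assumed default method, the docstring's two call forms would behave like this (the second method, defaulting `T` to `typeof(x)`, is an assumption for illustration):

```julia
# First method is from the diff above; second method supplies the default T.
project(::Type{T}, x::T, dx::Complex) where {T<:Real} = T(real(dx))
project(x, dx) = project(typeof(x), x, dx)

project(Float64, 1.0, 2.0 + 3.0im)   # explicit T: returns 2.0
project(1.0, 2.0 + 3.0im)            # T taken from x: returns 2.0
```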
What's the point of `T` in addition to `x`? It sounds like you have something in mind but I don't see what. In what directions may they differ? May `T` be abstract?
Oh, `T` could be `Tangent`, as in line 60.
But why? And when, I mean: for which `x` would this be chosen somehow? Sounds a bit like an orthogonal feature.
My initial thought was to ditch `T` altogether, and privilege some tangent representation (probably `Tangent`) as canonical. However, somewhere in the many threads about this, @oxinabox argued that we need to go both ways (to tangent and to natural differential, say). E.g. if a rule author writes a rule only for `Diagonal` and wants to make sure that the `Tangent` is being transformed right.

Edit: sorry, the other way: writes a rule for `Tangent` and wants to make sure all the tangents it gets are `Tangent`s.
OK, but this still seems a bit like it's rolling two things together. One is a projection to preserve the input `x`'s real-ness / Diagonal-ness etc. This would be applied automatically to every single rule. The other is a way to massage certain `dx`s into a form digestible by certain rules. It's possible that there is some efficiency gained by fusing these processes into one function, but... for a first stab, it seems that combining them is adding complication?
> This would be applied automatically to every single rule.

No, I don't think so. As I see it: in that use, `project` is one way that lets us write things that can run on anything, while getting out the type (and associated basis) that we want. But it's not the only way, and it's not always the best way. In fact I think it rarely is, since it often results in allocating and computing things that we are just going to throw away. It's just that it is the most generic way. Alternatives, for example, are things like structure-preserving maps and broadcasts, which avoid allocations and computing things they are going to throw away. Though I guess in those cases you can still `project` after, but it will be an identity operation, like `convert(Array, [1, 2, 3])` is an identity operation.

Which brings me around to the general thing: this is like `convert`, except that it is for vector spaces and is aware of vector-space concerns, like where your zeros are (which is the same as the basis). Which is the same convert-like operation one wants for massaging on the way in.
> in those cases you can still project after, but it will be an identity operation

Yes, that's my thinking. In my list of examples somewhere, any operation mixing real & complex arrays is something which ought to project. But making every rule do the ideal thing would be a lot of work for little benefit. Whereas projecting back to a real array for `hcat([1,2], [3+im, 4+im])`'s gradient at least means it propagates no further.
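The `hcat` case in concrete terms (a sketch; the "projection" here is just `real.`, not this PR's code):

```julia
# hcat([1,2], [3+im,4+im]) mixes real and complex columns; projecting the
# first argument's cotangent back to a real array stops the imaginary part
# from propagating any further.
x1, x2 = [1, 2], [3 + im, 4 + im]
dy = ones(ComplexF64, 2, 2)     # some incoming cotangent for hcat(x1, x2)
dx1 = real.(dy[:, 1])           # projected onto x1's real tangent space
dx2 = dy[:, 2]                  # x2 is complex: projection is the identity
```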
Yeah, anyway, the point is: it is our vector-space-aware `convert`, so it is the right name both for the thing you (often) want to do at the end, and for the one you might want to do at the start.
Closed in favour of #385
Builds on top of #306, adds tests and covers some more edge cases.
This is still very preliminary; still need to test how it plays with ChainRules.
To do:
- adding guidance to docs on how to write rules will be done as a part of #376