Allow shared parameters, take III #106

mcabbott · 2022-08-28T16:21:59Z

Another take on #100. Borrows the idea of making Leaf mutable.

~~Tries~~ Tried to be simpler by pushing more of the recursion onto fmap:

- setup is just fmapstructure really. Its notion of sharing is thus exactly the one of Functors, one source of truth. We should fix that not to share isbits types, eventually.
- update! is just fmap. Much of the complication of the old walk was to reconstruct both the state and the model on the way out. But this isn't needed if Leaf is mutated.

Tests from #100 pass with first commit. However, the shared Leaves must always match shared Arrays. It's possible that this scenario can be done even more simply, possibly without mutable Leaf.
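To make the "shared Leaves match shared Arrays" idea concrete, here is a minimal Base-only sketch. It does not use Functors or the PR's actual code: `ToyLeaf` and `toy_setup` are hypothetical names, and the walk only understands tuples, NamedTuples and numeric arrays. The assumption it illustrates is that an identity-keyed cache hands the same mutable leaf to every occurrence of the same array.

```julia
# Hypothetical stand-in for a mutable optimiser-state leaf (not the package's type).
mutable struct ToyLeaf
  state
end

# Walk nested Tuples/NamedTuples; numeric arrays count as parameters.
function toy_setup(x, cache = IdDict())
  if x isa AbstractArray{<:Number}
    # The same array object (by ===) always maps to the same ToyLeaf object.
    return get!(() -> ToyLeaf(zero(x)), cache, x)
  elseif x isa Union{Tuple,NamedTuple}
    return map(y -> toy_setup(y, cache), x)
  else
    return ()  # non-parameter node
  end
end

w = [1.0, 2.0]
model = (enc = (weight = w,), dec = (weight = w,), act = tanh)
tree = toy_setup(model)

@assert tree.enc.weight === tree.dec.weight   # tied array, tied leaf
@assert tree.act === ()

# Mutating the leaf through one path is visible through the other.
tree.enc.weight.state = [9.0, 9.0]
@assert tree.dec.weight.state == [9.0, 9.0]
```

Two arrays that merely hold equal values get separate leaves in this sketch, because the cache compares by identity rather than by value.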

What #100 does is instead to take shared Leaves as the truth about parameter sharing, which some future API could set in a way not matching the model (for ImmutableArrays, etc) even though present setup will not. Then update! cannot just be fmap, and needs one more separate IdDict for the parameters. Second commit here e84b61b bolts that on, and adds a test of it (which also passes using #100). But it's a bit ugly.

Edit: Third commit 0de29e1 instead just replaces the walk used for fmap(f, tree, x) to use re from its 2nd argument, while Functors still uses the cache on the 1st argument. That's tidier.

But the state tree contains the same () at every non-parameter node, and Functors caches the results of these... we should fix this upstream? A possible hack for now would be to supply a special cache IdDict{Leaf} which cannot store anything else -- done in e17e474.

But... that's still not right. If there are mutable layer structs, then I think you cannot rely on the ID of mutable Leaf to tie things. So I gave up on customising fmap and wrote out the recursion using (x,Leaf()) as the key for reconstruction.

Gradient accumulation uses an IdDict as in #100, but ~~stores a broadcasted adding the pieces. Which it thus requires all apply! methods to accept. They all do.~~ Changed to eager addition.
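A rough sketch of what eager accumulation into an IdDict could look like, assuming gradients are keyed by the identity of a mutable leaf. `ToyLeaf` and `toy_accumulate!` are hypothetical names for illustration, not the PR's functions.

```julia
# Hypothetical stand-in for a mutable optimiser-state leaf.
mutable struct ToyLeaf
  state
end

# Add each gradient contribution as soon as it arrives,
# instead of storing a lazy broadcast to be materialised later.
function toy_accumulate!(grads::IdDict, leaf::ToyLeaf, dx)
  grads[leaf] = haskey(grads, leaf) ? grads[leaf] + dx : dx
  return grads
end

shared = ToyLeaf(nothing)   # a leaf reached from two places in the model
other = ToyLeaf(nothing)
grads = IdDict()

toy_accumulate!(grads, shared, [1.0, 2.0])
toy_accumulate!(grads, other, [5.0, 5.0])
toy_accumulate!(grads, shared, [10.0, 20.0])

@assert grads[shared] == [11.0, 22.0]
@assert grads[other] == [5.0, 5.0]
```

With eager addition the stored value is always an ordinary array, so whatever consumes it does not need to accept a lazy broadcast object.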

~~Does not at present allow for more than one derivative. But no rules use that.~~ Added. There were no tests it seems.

Fixes the bug noted in #100 that update could in fact mutate the state. Does this by just saying @functor Leaf. Added a test.

One further possibility with a mutable Leaf is that it can easily have a flag to mark some parameters as temporarily frozen. ~~This is implemented here (with no API to set the flag). Not sure it's what we want though. Easy to remove but perhaps if we're changing the struct we should consider other changes we might want.~~

Because setup does not call itself in recursion, it is fairly easy to add a warning if the model has no parameters. This was something someone complained about, I forget where.
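As an illustration of why a non-recursive outer function makes the warning easy, here is a hedged sketch: the recursion lives in a helper that bumps a counter, and the outer function inspects the counter once at the end. `toy_count_params` and `toy_check` are hypothetical names, and the warning text is invented for this example.

```julia
# Recursive helper: counts numeric arrays inside nested Tuples/NamedTuples.
function toy_count_params(x, cnt = Ref(0))
  if x isa AbstractArray{<:Number}
    cnt[] += 1
  elseif x isa Union{Tuple,NamedTuple}
    foreach(y -> toy_count_params(y, cnt), x)
  end
  return cnt[]
end

# Outer entry point: called once per model, so it can warn exactly once.
function toy_check(model)
  n = toy_count_params(model)
  n == 0 && @warn "no trainable parameters found in this model"
  return n
end

@assert toy_check((a = [1.0], b = (tanh, [2.0]))) == 2
@assert toy_check((act = tanh,)) == 0   # also emits the warning
```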

Closes #42, closes #100, closes #97

ToucheSir · 2022-08-28T17:33:13Z

A possible hack for now would be to supply a special cache IdDict{Leaf} which cannot store anything else.

I'd say this is less of a hack and something we should be doing more often. Either define a custom cache type, or (better) attach the cache to the callback itself by memoizing it. Then fmap and the rest of Functors can avoid cache management altogether.
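One way to read "attach the cache to the callback itself" is the closure below. It is only a sketch under that assumption: `toy_memoize` is a hypothetical helper, not part of Functors, and it caches by object identity using an IdDict.

```julia
# Wrap a one-argument function so that it carries its own identity-keyed cache.
function toy_memoize(f)
  cache = IdDict()
  return x -> get!(() -> f(x), cache, x)
end

calls = Ref(0)
make_state = toy_memoize() do x
  calls[] += 1   # count how often the wrapped function really runs
  zero(x)
end

w = [1.0, 2.0]
s1 = make_state(w)
s2 = make_state(w)         # same object: answered from the cache
s3 = make_state(copy(w))   # equal values, different object: recomputed

@assert s1 === s2
@assert s1 !== s3
@assert calls[] == 2
```

In this arrangement the traversal itself needs no cache argument, since each callback decides what it remembers.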

src/adjust.jl

mcabbott · 2022-08-28T20:30:16Z

src/interface.jl

+function setup(rule::AbstractRule, model)
+  cnt = Ref(0)
+  # Rely on Functors to identify shared arrays, they will share a Leaf in this tree:
+  tree = fmapstructure(model, exclude = isnumeric) do x


It's pretty surprising tests pass with this, as it doesn't check trainable at all.

src/interface.jl

ToucheSir · 2022-08-29T02:29:14Z

src/interface.jl

-  update!(t′, x′, x̄s...)
+function _update!(tree, x; grads, params)
+  haskey(params, (tree,x)) && return params[(tree,x)]
+  isbits(tree) && return x  # means () is not cached, and also (((),),)


This does imply we will be caching almost every level of an average Flux model (since BitsType{NotBits, BitsTypes...} is not a bitstype). objectid being not the fastest function in the world, perhaps both cache lookup and insertion should be additionally guarded by ismutable(x).

mcabbott · 2022-08-29

I wondered this too. For large ImmutableArrays this may eventually need something fancier. But for now I think every fmap walk does the same thing.

ToucheSir · 2022-08-29

Oh I wasn't even thinking about those, but about cases like JuliaLang/julia#43542. We're unlikely to see any truly pathological behaviour, but I have to imagine the single comparison ismutable makes is more efficient than the recursive hash function objectid uses.

mcabbott · 2022-08-29

OK. I guess ismutable really is right here. For parameter arrays IIRC there was a concern that it tells you e.g. that PermutedDimsArray is immutable. But for known non-leaf types, maybe it's always right?

ToucheSir · 2022-08-29

Good point. PermutedDimsArray at least does implement functor, but you can always find an array wrapper which hasn't. Perhaps then the check should be isleaf instead? The isbits check is still useful either way.

Edit: I suppose isnumeric makes more sense since it forwards to isleaf already and setup guarantees only unfamiliar immutable wrappers of immutable arrays will get their own Leaf. Moving the isbits check up front also seems safe and could save a couple cycles on dict lookups.

```julia
function _update!(tree, x; grads, params)
  isbits(tree) && return x  # means () is not cached, and also (((),),)
  isnum = isnumeric(x)
  isnum && haskey(params, (tree,x)) && return params[(tree,x)]
  children, re = functor(x)
  children′ = map((tᵢ, xᵢ) -> _update!(tᵢ, xᵢ; grads, params), tree, children)
  x′ = re(children′)
  isnum ? (params[(tree,x)] = x′) : x′
end
```

It's likely this can be simplified, but I wanted to get something on the page first in case there are any unforeseen edge cases present in this formulation.

mcabbott · 2022-08-29

I think anything isnumeric should have a corresponding Leaf and hit the _update!(::Leaf, x; ...) method.

This one wants only to deal with mutable non-leaf things, like my mutable struct MutTwo example. Which makes me think that ismutable is fine -- we have Foo(MutTwo(Bar(Transpose(Array)))), then the Array is the leaf, and the only level at which it's worthwhile for this method to cache anything is the MutTwo one. If this whole stack appears twice, a fresh new struct Foo cannot be distinguished from the old one.
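The two Base behaviours this thread relies on can be checked directly. The sketch below uses only Base; `MutTwo` borrows its name from the comment above, but its fields are assumed, and `Foo` is a hypothetical immutable wrapper.

```julia
mutable struct MutTwo   # name from the discussion; fields are an assumption
  a
  b
end

struct Foo              # hypothetical immutable wrapper
  inner
end

A = rand(2, 3)
P = PermutedDimsArray(A, (2, 1))

# ismutable describes the wrapper struct, not whether its storage can be written.
@assert ismutable(A)
@assert !ismutable(P)
P[1, 1] = 42.0
@assert A[1, 1] == 42.0

# Immutable wrappers with identical fields are indistinguishable by ===,
# so an identity-keyed cache gains nothing by storing them separately.
m = MutTwo(A, P)
@assert ismutable(m)
@assert !ismutable(Foo(m))
@assert Foo(m) === Foo(m)

cache = IdDict()
cache[Foo(m)] = :seen
@assert haskey(cache, Foo(m))               # a freshly built Foo hits the same entry
@assert !haskey(cache, Foo(MutTwo(A, P)))   # a new MutTwo is a new identity
```

This is consistent with caching only at the mutable level: the identity of the MutTwo is what distinguishes one stack from another.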

mcabbott · 2022-10-11T18:13:48Z

Shall we do this?

I don't love it, and feel a bit bad about re-writing #100 in order to understand it... but this does add some features in the end.

But I do think we ought to handle shared parameters, and that we want mutable Leaf for other reasons too. (Namely: It enables freeze!. It allows for a Flux.train! without manually passing the state.)

We can re-write the internals if FluxML/Functors.jl#43 or something allows for a prettier version. The tests are pretty good.

Maybe if isbits(x) should be if !Functors.anymutable(x) from FluxML/Functors.jl#39. Or a copy of that function, if it's only in Functors 0.4 and we don't want to wait.

Edit: In fact perhaps setup can already be simplified by FluxML/Functors.jl#39, since fmapstructure will not create spurious ties? But it still needs a trainable walk, and maybe that's better done after FluxML/Functors.jl#43 too.

ToucheSir · 2022-10-12T01:00:24Z

I have no objections assuming we're not considering any behavioural changes after those Functors PRs are merged.

darsnack · 2022-10-12T20:32:50Z

I am also okay with doing this

Co-authored-by: Brian Chen <[email protected]>

mcabbott · 2022-10-13T19:15:47Z

Ok let's do it.

mcabbott mentioned this pull request Aug 28, 2022

Transparent handling of tied weights #100

Closed

mcabbott marked this pull request as draft August 28, 2022 16:47

ToucheSir reviewed Aug 28, 2022

mcabbott mentioned this pull request Aug 28, 2022

Frozen parameters #107

Open

mcabbott force-pushed the duplicated3 branch from 5c0045a to f046185 Compare August 28, 2022 19:53

mcabbott commented Aug 28, 2022

ToucheSir reviewed Aug 29, 2022

ToucheSir reviewed Aug 29, 2022

mcabbott mentioned this pull request Sep 17, 2022

Upgrade train! to work with explicit parameters FluxML/Flux.jl#2029

Closed

4 tasks

darsnack mentioned this pull request Oct 11, 2022

Separate walks out from fmap and add #39 to fcollect FluxML/Functors.jl#43

Merged

mcabbott and others added 9 commits October 12, 2022 18:01

allow shared parameters, take III

9b28112

Co-authored-by: Brian Chen <[email protected]>

one more dict to allow artificial ties

64d5d9f

a tidier idea, just replace _default_walk

670e49a

add a LeafCache type, to make fmap ignore () singleton

6db7a36

remove leaf.frozen field

5e5d5db

eager accumulation

522f66a

give up on customising fmap & write the recursion, add evil tests

3172f13

add ismutable check

37521c8

docs etc

0d6619a

mcabbott force-pushed the duplicated3 branch from df8a952 to 0d6619a Compare October 12, 2022 22:04

fix doctests

d13e52a

mcabbott marked this pull request as ready for review October 12, 2022 22:35

group the tests

1577b88

mcabbott force-pushed the duplicated3 branch from 3bca907 to 1577b88 Compare October 12, 2022 22:44

mcabbott merged commit 9c12e5d into FluxML:master Oct 13, 2022

mcabbott deleted the duplicated3 branch October 13, 2022 19:15

avik-pal mentioned this pull request Oct 14, 2022

No matching function wrapper found ChrisRackauckas/universal_differential_equations#50

Closed

mcabbott mentioned this pull request Oct 26, 2022

Mark OptimiserChain as @functor and improve inference for apply! #115

Merged
