
Make fmap(f, x, y) useful #37

Merged
merged 13 commits into from
Feb 9, 2022
Conversation

mcabbott
Member

@mcabbott mcabbott commented Feb 6, 2022

This alters fmap(f, x, y) to agree more closely with fmap(f, x), in particular to accept the keywords exclude and walk.

And it inserts functor(typeof(x), y), so that if not all fields of x are children, the same fields of y are used. The first object x is the primary one, whose objects are stored in the cache. But there is no longer any assumption that the 2nd is a gradient, or that Nothing is special.

Third, it adds a prune keyword, which lets you substitute a constant for all repeated leaves, rather than re-using the cached original. Probably the same keyword should be added to fmap(f, x).

What motivates this is an attempt to write destructure using fmap(f, x, y), FluxML/Optimisers.jl#54. There, y is in fact a tree of offsets, and the walk which includes the gradients doesn't really want the cache, since it needs to look at all leaves.

These changes don't break any existing tests. But should of course have better tests, and docs.
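A sketch of the intended usage, following the description above (the Layer type and its values are illustrative, and the calls assume this PR's keywords):

```julia
using Functors

struct Layer; w; b; end
Functors.@functor Layer (w,)   # only `w` is a child; `b` is not functored

x = Layer([1.0, 2.0], "bias")
y = Layer([10.0, 20.0], :ignored)

# Walking x and y together: `functor(typeof(x), y)` means only the
# child fields of x (here just `w`) are taken from y, so `y.b` is
# never touched, and nothing assumes y is a gradient.
fmap(+, x, y)

# `prune` substitutes a constant for repeated leaves, instead of
# returning the cached original:
shared = [1.0, 2.0]
fmap(identity, (a = shared, b = shared); prune = nothing)
```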

@ToucheSir
Member

Looking at the state of kwargs for fmap, I'm of the opinion that we should consider moving the responsibility for caching + pruning out of the function. fmap and co would then be natively differentiable, and there would be fewer params to thread through the call stack. That's not to say we should remove caching from Functors altogether. Instead, I think caching and pruning could be handled by a type that wraps the callback function and encapsulates any related state. This would also open up the opportunity to customize caching behaviour on a case-by-case basis. Users who are fine with a plain tree traversal, sans duplicate checking, incur no additional overhead, while those who want just a subset of this functionality pay only for what they use.
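A rough sketch of that wrapper idea (the name CachedCallback is hypothetical, not Functors API):

```julia
# Hypothetical sketch: the cache lives in a callable wrapper around the
# callback, not in fmap itself.
struct CachedCallback{F}
    f::F
    cache::IdDict{Any,Any}
end
CachedCallback(f) = CachedCallback(f, IdDict{Any,Any}())

function (c::CachedCallback)(x)
    haskey(c.cache, x) && return c.cache[x]   # repeated leaf: reuse result
    c.cache[x] = c.f(x)
end

# A plain traversal pays no caching cost; opting in is one wrapper:
#   fmap(CachedCallback(f), x)
```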

@mcabbott
Member Author

mcabbott commented Feb 7, 2022

I can believe there might be cleaner ways to do things. But for now I propose baby steps, in part to feel out what's needed --- accidentally choosing a cleaner design which prohibited prune would make destructure harder.

src/functor.jl Outdated

```julia
haskey(cache, x) && return cache[x]
y = exclude(x) ? f(x) : walk(x -> fmap(f, x, exclude = exclude, walk = walk, cache = cache), x)
cache[x] = y
const INIT = Base._InitialValue() # sentinel value for keyword not supplied
```
Member

Maybe less opaque to just define `struct NoPrune end`?

Member Author

Can do. Are you OK with the interface being that the absence of this keyword is how you turn this off? That's how `init` often works; and here, the values you might use for "don't prune", like `false` or `nothing`, are all reasonable choices for the value to use during pruning.

Member

The code is weird to me, but the API makes sense. I'm happy.
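The sentinel pattern being discussed, in sketch form (maybe_prune is a hypothetical helper written for illustration, not part of this PR):

```julia
struct NoKeyword end  # plays the role of Base._InitialValue: "keyword not supplied"

function maybe_prune(cache, x, prune)
    haskey(cache, x) || return nothing           # x is not a repeated leaf
    # `false` and `nothing` are both sensible values to prune *to*,
    # so neither can mean "pruning disabled"; only a sentinel type can.
    return prune isa NoKeyword ? cache[x] : prune
end
```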

src/functor.jl Outdated

```julia
return y
function fmap(f, x; exclude = isleaf, walk = _default_walk, cache = IdDict(), prune = NoKeyword())
  haskey(cache, x) && return prune isa NoKeyword ? cache[x] : prune
  cache[x] = exclude(x) ? f(x) : walk(x -> fmap(f, x; exclude, walk, cache, prune), x)
```
Member Author

To use keywords like this, this package cannot claim to support Julia 1.0. At present it is only tested on 1.5+. Maybe we should just move to 1.6?

Member

We already have it on CI, might as well make it official.

Member

I'm not sure what repo I was looking at, because I just checked back and it's 1.5 still. Do we need to cut a breaking release for minimum version bumps like this again?

Member Author

@mcabbott mcabbott Feb 8, 2022

It's 1.0 here: https://github.com/FluxML/Functors.jl/blob/master/Project.toml#L7

Tests do in fact pass on 1.0, for Functors v0.2.7, despite the lack of CI.

Member Author

And, people say not to do this on a patch release, because it closes the door on a bugfix-for-1.0 release. Do we care?

We may also need a breaking release for #33, and to make the cache not used on isbits arguments. We could gang these together.

@ToucheSir
Member

> I can believe there might be cleaner ways to do things. But for now I propose baby steps, in part to feel out what's needed --- accidentally choosing a cleaner design which prohibited prune would make destructure harder.

This is actually my worry with adding yet another kwarg. Removing it requires a breaking change and deprecation cycle, which sucks. Let me cook up something tonight for the wrapper version, because I feel like the keyword approach is probably the less flexible of the two.

@ToucheSir
Member

Ok, after trying out some things, I'm generally OK with this, given a migration path like the following:

  1. Create a separate function that doesn't rely on (as many) kwargs, and give it a better name.
  2. Switch downstream code over to using the new function.
  3. Deprecate and remove fmap. I'd love to use the name for something more appropriate if possible. For example, what we're currently calling "walk" is really what FP people use fmap for, not nested (heterogeneous) structure traversal.
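For readers unfamiliar with the naming point in item 3, a quick contrast:

```julia
using Functors

# FP-style fmap: apply f elementwise over a single container
map(x -> x^2, [1, 2, 3])                      # [1, 4, 9]

# What Functors calls fmap today: recurse through a nested,
# heterogeneous structure, applying f only at the leaves
fmap(x -> x^2, (a = 1, b = (c = 2, d = 3)))   # (a = 1, b = (c = 4, d = 9))
```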

@mcabbott
Member Author

mcabbott commented Feb 9, 2022

Ok. I'm a little worried that after this re-design I won't understand this library at all!

But for now I think it would be great to get the things which work today merged and in use, rather than waiting for a cleaner design and then waiting for someone to have time to re-write Optimisers to use it. What's the plan here: is going to Julia 1.6 on v0.2.8 OK? This is not otherwise breaking, and I am happy to walk away from 1.0 forever. Or should this be made to work on 1.0 for now, until enough other things are ready to justify 0.3?

@ToucheSir
Member

You've already seen 3/4ths of the "re-design" in #32, so I wouldn't be too worried about that 😉.

On the lack of compatibility, what exactly doesn't work on 1.0? I didn't really understand why the existing usage of keyword args was fine but the additions in this PR aren't.

@mcabbott
Member Author

mcabbott commented Feb 9, 2022

8486098 might let it work on 1.0. Tests fail locally as it is unable to get a working version of Zygote.

@mcabbott mcabbott force-pushed the biwalk branch 2 times, most recently from 8486098 to a6a3b75 Compare February 9, 2022 04:39
@mcabbott
Member Author

mcabbott commented Feb 9, 2022

Good to go?

No tagged version has fmap(f, xs...) so this can't break anything.

@mcabbott mcabbott merged commit 0cf7942 into FluxML:master Feb 9, 2022
@mcabbott mcabbott deleted the biwalk branch February 9, 2022 23:31