
add trainables #171

Merged · 9 commits into master · Apr 4, 2024

Conversation

@CarloLucibello (Member) commented Apr 1, 2024:

An alternative to #57, adding a trainables method that returns a vector of arrays.
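
As a rough usage sketch, assuming the exported name ends up being trainables and it behaves like the trainables1 prototype further down (i.e. the returned arrays alias the model's own parameters, no copies):

using Flux, Optimisers

m = Chain(Dense(2 => 3, relu), Dense(3 => 1))
ps = trainables(m)        # Vector of the trainable arrays
length(ps)                # 4: two weight matrices and two bias vectors
ps[1] === m[1].weight     # true: entries are the model's arrays, not copies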

I'm playing with different implementations at the moment. The output of the perf() function is:

trainables1
  1.717 μs (33 allocations: 1.48 KiB)
trainables2
  2.708 μs (63 allocations: 3.12 KiB)
trainables3
  11.208 μs (147 allocations: 4.39 KiB)

gradient trainables2
  1.546 ms (7157 allocations: 304.39 KiB)
gradient trainables3
  249.625 μs (2289 allocations: 115.39 KiB)

trainables1 is the fastest, but since it is mutating it needs a custom rrule for differentiation. The rrule for destructure can probably be adapted for this case:
https://github.com/FluxML/Optimisers.jl/blob/master/src/destructure.jl

The gradients of the other two implementations are very slow; in those cases we would also need a custom rule.

TODO

  • write custom differentiation rules
  • take care of shared parameters (automatically done by fmap's caching; see the sketch after this list)
  • tests
  • docs
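
A minimal illustration of the shared-parameter point (a sketch using trainables1 from the full script below; it relies on fmap caching shared leaves):

# A weight-tied model: the shared arrays should be collected only once.
shared = Dense(3 => 3)
m = Chain(shared, shared)
ps = trainables1(m)
length(ps)   # expected 2 (one weight, one bias), thanks to fmap's IdDict cache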

@CarloLucibello (Member, Author) commented:

Implemented a custom rule for trainables1. With the last version of perf(), the measurements are now:

trainables1
  9.833 μs (54 allocations: 2.34 KiB)
trainables2
  11.625 μs (94 allocations: 4.30 KiB)
trainables3
  22.625 μs (189 allocations: 5.70 KiB)

gradient trainables1
  29.584 μs (213 allocations: 268.50 KiB)
gradient trainables2
  1.825 ms (8419 allocations: 601.59 KiB)
gradient trainables3
  307.000 μs (2636 allocations: 377.53 KiB)

@CarloLucibello (Member, Author) commented:

I'll focus on trainables1. Pasting the full script here for future reference:

using BenchmarkTools
using Optimisers
using Functors
using Zygote, Flux
using ChainRulesCore

# trainables1: collect the trainable arrays by pushing onto a list while
# walking the model with Optimisers' trainable-aware structural walk.
function trainables1(x)
    arrays = AbstractArray[]
    exclude(x) = Optimisers.isnumeric(x)
    fmap(x; exclude, walk = Optimisers._TrainableStructWalk()) do y
        push!(arrays, y)
        return y
    end
    return arrays
end

# Pullback for trainables1: walk the model again in the same order and place
# the i-th cotangent array at the i-th trainable leaf, producing a structural
# tangent (nested NamedTuples/Tuples) that mirrors the model.
function ∇trainables1(x, Δ)
    exclude(x) = Optimisers.isnumeric(x)
    i = 0
    return fmapstructure(x; exclude, walk = Optimisers._TrainableStructWalk()) do _
                return Δ[i+=1]
           end
end


# Custom rrule so that AD does not have to differentiate through the mutation
# of `arrays` inside trainables1.
function ChainRulesCore.rrule(::typeof(trainables1), x)
    y = trainables1(x)
    trainables_back(Δ) = (NoTangent(), ∇trainables1(x, unthunk(Δ)))
    return y, trainables_back
end

############

using Functors: AbstractWalk, _map, _values, execute, ExcludeWalk

# trainables2: non-mutating variant; each walk step concatenates the results
# of recursing into the trainable children via reduce(vcat, ...).
struct TrainableWalk2 <: AbstractWalk end

function (walk::TrainableWalk2)(recurse, x, ys...)
    x_children = Optimisers.trainable(x)
    ys_children = map(Optimisers.trainable, ys)
    res = map(recurse, x_children, ys_children...)
    return reduce(vcat, values(res), init = [])
end

function trainables2(x)
    exclude(x) = Optimisers.isnumeric(x) && Functors.isleaf(x)
    return execute(ExcludeWalk(TrainableWalk2(), x -> [x], exclude), x)
end


# trainables3: like trainables2, but splatting into vcat instead of reduce.
struct TrainableWalk3 <: AbstractWalk end

function (walk::TrainableWalk3)(recurse, x, ys...)
    x_children = Optimisers.trainable(x)
    ys_children = map(Optimisers.trainable, ys)
    res = map(recurse, x_children, ys_children...)
    return vcat(values(res)...)
end

function trainables3(x)
    exclude(x) = Optimisers.isnumeric(x)
    return execute(ExcludeWalk(TrainableWalk3(), x -> [x], exclude), x)
end


# Simple test loss: sum of squared entries of all collected parameter arrays.
function floss(ps)
    sum([sum(abs2, p) for p in ps])
end

using Flux

function perf()
    m = Chain(Dense(128 => 128, relu), 
              Dense(128 => 128, relu),
              BatchNorm(128),
              x -> x^2,
              Dense(128 => 128, relu), 
              Dense(128 => 128, relu))

    println("trainables1")
    @btime floss(trainables1($m))
    println("trainables2")
    @btime floss(trainables2($m))
    println("trainables3")
    @btime floss(trainables3($m))
    println()

    println("gradient trainables1")
    @btime gradient(m -> floss(trainables1(m)), $m)
    println("gradient trainables2")
    @btime gradient(m -> floss(trainables2(m)), $m)
    println("gradient trainables3")
    @btime gradient(m -> floss(trainables3(m)), $m)

    nothing
end

Zygote.refresh()
perf()
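
A quick sanity check for the custom rule (a sketch on top of the definitions above, not part of the PR): since floss(ps) is a sum of sum(abs2, p), the gradient with respect to every trainable array p should be 2p.

m = Dense(3 => 2)
ps = trainables1(m)
length(ps)                    # 2: weight and bias
ps[1] === m.weight            # true: collected by reference, not copied
g = gradient(m -> floss(trainables1(m)), m)[1]
g.weight ≈ 2 .* m.weight      # expected: true
g.bias ≈ 2 .* m.bias          # expected: true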

@CarloLucibello (Member, Author) commented:

Could anyone review?

@darsnack (Member) left a review comment:

My only long-term suggestion is to address FluxML/Functors.jl#81 and adjust fleaves to match the performance here. But these are implementation details that don't affect the API, and this seems ready as is.

(Review comments on docs/src/index.md and src/trainables.jl, now resolved.)
@darsnack (Member) commented Apr 4, 2024:

One API consideration before merging and releasing is whether this needs to be separate from #173. It's trivial to ignore the path if it's not relevant.

Also, while okay for Functors, I'm not a fan of duplicated *_with_path functions. Like @ToucheSir suggested, if we want two variants, then a keyword flag seems like a better API.
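
For illustration only, the keyword-flag alternative mentioned above might look like this (hypothetical API, not what this PR or #173 implements):

trainables(model)               # Vector of trainable arrays
trainables(model; path = true)  # hypothetical: Vector of (path, array) pairs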

CarloLucibello and others added 2 commits on April 4, 2024, both co-authored by Kyle Daruwalla <[email protected]>.
@CarloLucibello (Member, Author) commented:

I kept #173 separate to simplify the review of this one. Let's continue the discussion there.

@CarloLucibello merged commit a87ffd5 into master on Apr 4, 2024; 4 of 5 checks passed.
mashu pushed a commit to mashu/Optimisers.jl that referenced this pull request Nov 14, 2024
* trainables

* trainables

* cl/trainables

* trainables

* test second order derivatives

* add doc section

* fix test

* Update src/trainables.jl