add trainables
#171
Conversation
Implemented custom rule for trainables1.

trainables1
  9.833 μs (54 allocations: 2.34 KiB)
trainables2
  11.625 μs (94 allocations: 4.30 KiB)
trainables3
  22.625 μs (189 allocations: 5.70 KiB)

gradient trainables1
  29.584 μs (213 allocations: 268.50 KiB)
gradient trainables2
  1.825 ms (8419 allocations: 601.59 KiB)
gradient trainables3
  307.000 μs (2636 allocations: 377.53 KiB)
I'll focus on

using BenchmarkTools
using Optimisers
using Functors
using Zygote, Flux
using ChainRulesCore
function trainables1(x)
    arrays = AbstractArray[]
    exclude(x) = Optimisers.isnumeric(x)
    # walk only the trainable part of the struct and collect every numeric leaf array
    fmap(x; exclude, walk = Optimisers._TrainableStructWalk()) do y
        push!(arrays, y)
        return y
    end
    return arrays
end
function ∇trainables1(x, Δ)
    exclude(x) = Optimisers.isnumeric(x)
    i = 0
    # rebuild a structural tangent for x, filling each numeric leaf with the
    # corresponding cotangent from Δ, in the order trainables1 collected them
    return fmapstructure(x; exclude, walk = Optimisers._TrainableStructWalk()) do _
        return Δ[i+=1]
    end
end
function ChainRulesCore.rrule(::typeof(trainables1), x)
    y = trainables1(x)
    trainables_back(Δ) = (NoTangent(), ∇trainables1(x, unthunk(Δ)))
    return y, trainables_back
end
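A minimal usage sketch (mine, not part of the diff) of how trainables1 and its custom rrule can be exercised; it assumes the definitions above have been evaluated, and the small Dense layer is purely illustrative:

using Flux, Zygote

m = Dense(3 => 2)
ps = trainables1(m)                                     # collects m.weight and m.bias
loss(m) = sum([sum(abs2, p) for p in trainables1(m)])
g = gradient(loss, m)[1]
# g should mirror m's trainable structure, with g.weight ≈ 2 .* m.weight
# and g.bias ≈ 2 .* m.bias, produced through the rrule above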
############
using Functors: AbstractWalk, _map, _values, execute, ExcludeWalk

struct TrainableWalk2 <: AbstractWalk end

function (walk::TrainableWalk2)(recurse, x, ys...)
    x_children = Optimisers.trainable(x)
    ys_children = map(Optimisers.trainable, ys)
    res = map(recurse, x_children, ys_children...)
    return reduce(vcat, values(res), init=[])
end

function trainables2(x)
    exclude(x) = Optimisers.isnumeric(x) && Functors.isleaf(x)
    return execute(ExcludeWalk(TrainableWalk2(), x -> [x], exclude), x)
end
struct TrainableWalk3 <: AbstractWalk end

function (walk::TrainableWalk3)(recurse, x, ys...)
    x_children = Optimisers.trainable(x)
    ys_children = map(Optimisers.trainable, ys)
    res = map(recurse, x_children, ys_children...)
    return vcat(values(res)...)
end

function trainables3(x)
    exclude(x) = Optimisers.isnumeric(x)
    return execute(ExcludeWalk(TrainableWalk3(), x -> [x], exclude), x)
end
function floss(ps)
    sum([sum(abs2, p) for p in ps])
end
using Flux

function perf()
    m = Chain(Dense(128 => 128, relu),
              Dense(128 => 128, relu),
              BatchNorm(128),
              x -> x^2,
              Dense(128 => 128, relu),
              Dense(128 => 128, relu))

    println("trainables1")
    @btime floss(trainables1($m))
    println("trainables2")
    @btime floss(trainables2($m))
    println("trainables3")
    @btime floss(trainables3($m))
    println()

    println("gradient trainables1")
    @btime gradient(m -> floss(trainables1(m)), $m)
    println("gradient trainables2")
    @btime gradient(m -> floss(trainables2(m)), $m)
    println("gradient trainables3")
    @btime gradient(m -> floss(trainables3(m)), $m)
    nothing
end
Zygote.refresh()
perf()
Could anyone review?
My only long-term suggestion is to address FluxML/Functors.jl#81 and adjust fleaves to match the performance here. But these are implementation details that don't affect the API, and this seems ready as is.
One API consideration before merging and releasing is whether this needs to be separate from #173. It's trivial to ignore the path if it's not relevant. Also, while okay for Functors, I'm not a fan of duplicated
Co-authored-by: Kyle Daruwalla <[email protected]>
I kept #173 separate to simplify the review of this one. Let's continue the discussion there.
* trainables
* trainables
* cl/trainables
* trainables
* test second order derivatives
* add doc section
* fix test
* Update src/trainables.jl
An alternative to #57, adding a trainables method that returns a vector of arrays. I'm playing with different implementations at the moment; the output of the perf() function is given above.

trainables1 is the fastest, but since it is mutating it needs a custom rrule for differentiation. Probably the rrule for destructure can be adapted for this case: https://github.com/FluxML/Optimisers.jl/blob/master/src/destructure.jl

The gradients of the other two implementations are very slow, so in those cases we would also need a custom rule.
TODO
* fmap
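For readers arriving after the merge, a hypothetical usage sketch of the trainables function this PR adds (the name follows the PR; keyword options, e.g. returning paths as discussed in #173, may differ from what was eventually released):

using Flux, Optimisers, Zygote

model = Chain(Dense(28^2 => 32, relu), Dense(32 => 10))
ps = trainables(model)          # Vector holding the model's trainable parameter arrays
sum(length, ps)                 # total number of trainable scalars

# differentiating through it, as the benchmarks above do with floss
g = gradient(m -> sum([sum(abs2, p) for p in trainables(m)]), model)[1]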