diff --git a/docs/src/index.md b/docs/src/index.md
index a595d70..b775b2d 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -1,5 +1,23 @@
 # Optimisers.jl
 
+Optimisers.jl defines many standard gradient-based optimisation rules, and tools for applying them to deeply nested models.
+
+This was written as the new training system for [Flux.jl](https://github.com/FluxML/Flux.jl) neural networks,
+and also used by [Lux.jl](https://github.com/LuxDL/Lux.jl).
+But it can be used separately on any array, or anything else understood by [Functors.jl](https://github.com/FluxML/Functors.jl).
+
+## Installation
+
+In the Julia REPL, type
+```julia
+]add Optimisers
+```
+
+or
+```julia-repl
+julia> import Pkg; Pkg.add("Optimisers")
+```
+
 ## An optimisation rule
 
 A new optimiser must overload two functions, [`apply!`](@ref Optimisers.apply!) and [`init`](@ref Optimisers.init).
@@ -38,7 +56,6 @@ state for every trainable array. Then at each step, [`update`](@ref Optimisers.u
 to adjust the model:
 
 ```julia
-
 using Flux, Metalhead, Zygote, Optimisers
 
 model = Metalhead.ResNet(18) |> gpu # define a model to train
@@ -54,7 +71,6 @@ end;
 
 state_tree, model = Optimisers.update(state_tree, model, ∇model);
 @show sum(model(image)); # reduced
-
 ```
 
 Notice that a completely new instance of the model is returned. Internally, this
@@ -91,7 +107,6 @@ Beware that it has nothing to do with Zygote's notion of "explicit" gradients.
 identical trees of nested `NamedTuple`s.)
 
 ```julia
-
 using Lux, Boltz, Zygote, Optimisers
 
 lux_model, params, lux_state = Boltz.resnet(:resnet18) |> gpu; # define and initialise model
@@ -113,7 +128,6 @@ opt_state, params = Optimisers.update!(opt_state, params, ∇params);
 
 y, lux_state = Lux.apply(lux_model, images, params, lux_state);
 @show sum(y); # now reduced
-
 ```
 
 Besides the parameters stored in `params` and gradually optimised, any other model state
@@ -297,7 +311,7 @@ similarly to what [`destructure`](@ref Optimisers.destructure) does but without
 concatenating the arrays into a flat vector.
 This is done by [`trainables`](@ref Optimisers.trainables), which returns a list of arrays:
 
-```julia
+```julia-repl
 julia> using Flux, Optimisers
 
 julia> model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2));
diff --git a/src/destructure.jl b/src/destructure.jl
index a628452..c9b0d16 100644
--- a/src/destructure.jl
+++ b/src/destructure.jl
@@ -38,7 +38,7 @@ This is what [`destructure`](@ref Optimisers.destructure) returns, and `re(p)` w
 new parameters from vector `p`. If the model is callable, then `re(x, p) == re(p)(x)`.
 
 # Example
-```julia
+```julia-repl
 julia> using Flux, Optimisers
 
 julia> _, re = destructure(Dense([1 2; 3 4], [0, 0], sigmoid))
diff --git a/src/trainables.jl b/src/trainables.jl
index 370de0f..838186b 100644
--- a/src/trainables.jl
+++ b/src/trainables.jl
@@ -32,10 +32,10 @@ julia> trainables(x)
 1-element Vector{AbstractArray}:
  [1.0, 2.0, 3.0]
 
- julia> x = MyLayer((a=[1.0,2.0], b=[3.0]), [4.0,5.0,6.0]);
+julia> x = MyLayer((a=[1.0,2.0], b=[3.0]), [4.0,5.0,6.0]);
 
- julia> trainables(x) # collects nested parameters
- 2-element Vector{AbstractArray}:
+julia> trainables(x) # collects nested parameters
+2-element Vector{AbstractArray}:
  [1.0, 2.0]
  [3.0]
 ```
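
The "An optimisation rule" section referenced in the first hunk states that a new rule overloads `apply!` and `init`. The following is a minimal sketch of that interface, assuming a made-up rule: the name `SignDescent` and its field `eta` are invented here for illustration and are not part of the package.

```julia
using Optimisers

# Hypothetical rule, for illustration only: step by a fixed amount in the
# direction of the sign of the gradient.
struct SignDescent <: Optimisers.AbstractRule
  eta::Float64
end

# `init` returns whatever per-array state the rule needs; this rule keeps none.
Optimisers.init(o::SignDescent, x::AbstractArray) = nothing

# `apply!` returns the new state and the change which `update` subtracts from `x`.
function Optimisers.apply!(o::SignDescent, state, x, dx)
  T = eltype(x)  # keep the element type (e.g. Float32) of the parameters
  return state, T(o.eta) .* sign.(dx)
end

# Such a rule is then used like the built-in ones, e.g.
# state = Optimisers.setup(SignDescent(0.1), model)
```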
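
The new introduction added in the first hunk notes that the package works on any array, or anything else understood by Functors.jl. As a sketch of that point, here is the `setup`/`update` workflow on a plain `NamedTuple`, with no Flux, GPU, or automatic differentiation involved; the model, gradient, and numbers are made up for illustration.

```julia
using Optimisers

# A "model" that is just a NamedTuple of arrays; no Flux or Lux required.
model = (weight = [1.0 2.0; 3.0 4.0], bias = [0.0, 0.0])

# A gradient with the same nested structure, written by hand here.
grad = (weight = [0.1 0.1; 0.1 0.1], bias = [0.5, 0.5])

state = Optimisers.setup(Optimisers.Descent(0.1), model)  # one state leaf per trainable array
state, model = Optimisers.update(state, model, grad)      # returns a new state and a new model
```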
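
The docstring touched in `src/destructure.jl` shows `re(p)` rebuilding a Flux `Dense` layer; the same mechanism applies to any Functors.jl-compatible structure. A small sketch with arbitrarily chosen values:

```julia
using Optimisers

# Flatten a nested structure into one vector, keeping a function to rebuild it.
flat, re = Optimisers.destructure((x = [1.0, 2.0], y = [3.0, 4.0]))
# flat == [1.0, 2.0, 3.0, 4.0]

rebuilt = re([10.0, 20.0, 30.0, 40.0])
# rebuilt == (x = [10.0, 20.0], y = [30.0, 40.0])
```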