updates for Functors v0.5 (#2528)
Co-authored-by: Michael Abbott [email protected]
CarloLucibello authored Nov 24, 2024
1 parent 0a324f8 commit e2b3f06
Showing 19 changed files with 82 additions and 134 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -65,7 +65,7 @@ jobs:
- uses: codecov/codecov-action@v5
if: matrix.version == '1' && matrix.os == 'ubuntu-latest'
with:
file: lcov.info
files: lcov.info

docs:
name: Documentation
3 changes: 3 additions & 0 deletions NEWS.md
@@ -12,6 +12,9 @@ See also [github's page](https://github.com/FluxML/Flux.jl/releases) for a compl
Now Flux re-exports the optimisers from Optimisers.jl. Most users will be unaffected by this change.
The module is still available for now, but will be removed in a future release.
* Most Flux layers will [re-use memory via `NNlib.bias_act!`](https://github.com/FluxML/Flux.jl/pull/2327), when possible.
* `Flux.params` has been deprecated. Use Zygote's explicit differentiation instead,
`gradient(m -> loss(m, x, y), model)`, or use `Flux.trainables(model)` to get the trainable parameters.
* Flux now requires Functors.jl v0.5. This new release of Functors treats all types as functors by default, so applying `@layer` or `@functor` to a type is no longer strictly necessary for Flux's models. However, it is still recommended to use `@layer Model` for additional functionality like pretty printing.
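For example, the explicit style referred to above looks roughly like this (a minimal sketch; the model, data and loss are only illustrative):

```julia
using Flux

model = Dense(2 => 1)                             # any Flux model
x, y = rand(Float32, 2, 16), rand(Float32, 1, 16)
loss(m, x, y) = Flux.mse(m(x), y)

# Explicit gradient with respect to the model itself (Zygote under the hood):
grads = Flux.gradient(m -> loss(m, x, y), model)[1]

# Flat vector of the trainable parameter arrays, replacing Flux.params:
ps = Flux.trainables(model)                       # [weight, bias]
```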

## v0.14.22
* Data movement between devices is now provided by [MLDataDevices.jl](https://github.com/LuxDL/MLDataDevices.jl).
5 changes: 0 additions & 5 deletions docs/src/guide/gpu.md
@@ -12,11 +12,6 @@ Metal GPU acceleration is available on Apple Silicon hardware. For more details
In order to trigger GPU support in Flux, you need to call `using CUDA`, `using AMDGPU` or `using Metal`
in your code. Note that for CUDA you do not need to explicitly load `cuDNN` as well, but the package has to be installed in the environment.


!!! compat "Flux ≤ 0.13"
Old versions of Flux automatically installed CUDA.jl to provide GPU support. Starting from Flux v0.14, CUDA.jl is not a dependency anymore and has to be installed manually.


## Basic GPU Usage

Support for array operations on other hardware backends, like GPUs, is provided by external packages like [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl), [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl), and [Metal.jl](https://github.com/JuliaGPU/Metal.jl).
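For instance, a minimal sketch of checking for and using a CUDA GPU (this assumes CUDA.jl is installed; the variable names are illustrative):

```julia
using Flux, CUDA

if CUDA.functional()                    # true only if a usable NVIDIA GPU was found
    device = gpu_device()               # callable object that moves things to the GPU
    W = device(rand(Float32, 3, 3))     # now a CuArray
end
```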
2 changes: 1 addition & 1 deletion docs/src/guide/models/basics.md
@@ -226,7 +226,7 @@ m(5) # => 26

## Layer Helpers

There is still one problem with this `Affine` layer, that Flux does not know to look inside it. This means that [`Flux.train!`](@ref Flux.train!) won't see its parameters, nor will [`gpu`](@ref) be able to move them to your GPU. These features are enabled by the [`@layer`](@ref Flux.@layer) macro:
We can give our layer some additional functionality, like nice printing, using the [`@layer`](@ref Flux.@layer) macro:

```julia
Flux.@layer Affine
```
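For reference, a minimal self-contained sketch of the pattern (the `Affine` definition here is reconstructed to match the shape used earlier in that guide, so treat it as an assumption):

```julia
using Flux

struct Affine
    W
    b
end
Affine(in::Integer, out::Integer) = Affine(randn(Float32, out, in), zeros(Float32, out))

(m::Affine)(x) = m.W * x .+ m.b      # forward pass

Flux.@layer Affine                   # opts in to pretty printing and other conveniences

a = Affine(3, 2)
a(rand(Float32, 3))                  # 2-element Vector{Float32}
```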
6 changes: 3 additions & 3 deletions docs/src/guide/models/custom_layers.md
@@ -18,7 +18,7 @@ function (m::CustomModel)(x)
return m.chain(x) + x
end

# Call @layer to allow for training. Described below in more detail.
# This is optional but recommended for pretty printing and other niceties
Flux.@layer CustomModel
```
Notice that we parameterized the type of the `chain` field. This is necessary for fast Julia code, so that the struct field can be given a concrete type. `Chain`s have a type parameter fully specifying the types of the layers they contain. By using a type parameter, we free Julia to determine the correct concrete type, so that we do not need to specify the full, possibly quite long, type ourselves.
@@ -78,7 +78,7 @@ The exact same method of `trainable` can also be defined using the macro, for co
Flux.@layer Affine trainable=(W,)
```

There is a second, more severe, kind of restriction possible. This is not recommended, but is included here for completeness. Calling `Functors.@functor Affine (W,)` means that all no exploration of the model will ever visit the other fields: They will not be moved to the GPU by [`gpu`](@ref), and their precision will not be changed by `f32`. This requires the `struct` to have a corresponding constructor that accepts only `W` as an argument.
There is a second, more severe, kind of restriction possible. This is not recommended, but is included here for completeness. Calling `Functors.@functor Affine (W,)` means that no exploration of the model will ever visit the other fields: They will not be moved to the GPU by [`gpu`](@ref), and their precision will not be changed by `f32`. This requires the `struct` to have a corresponding constructor that accepts only `W` as an argument.
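To make the two options concrete, here is a rough sketch (the struct and field names are hypothetical, not taken from the guide):

```julia
using Flux, Optimisers

struct Scale
    W
    b          # we want to keep this fixed during training
end
(s::Scale)(x) = s.W * x .+ s.b

# Restrict *training* only: the optimiser ignores `b`,
# but `gpu`, `f32`, etc. still recurse into it.
Flux.@layer Scale trainable=(W,)

s = Scale(randn(Float32, 2, 3), zeros(Float32, 2))
Optimisers.trainable(s)          # (W = ...,) so only W is optimised
Flux.setup(Adam(0.01), s)        # optimiser state is created for W only

# The harsher restriction discussed above would instead be
# `Functors.@functor Scale (W,)`, which also hides `b` from `gpu`/`f32`
# and requires a one-argument constructor `Scale(W)`.
```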

## Custom multiple input or output layer

@@ -87,7 +87,7 @@ Sometimes a model needs to receive several separate inputs at once or produce se
We could have a struct that stores the weights along each path and implements the joining/splitting in the forward pass function. That would mean a new struct for each different block,
e.g. one would have a `TransformerBlock` struct for a transformer block, and a `ResNetBlock` struct for a ResNet block, each block being composed of smaller sub-blocks. This is often the simplest and cleanest way to implement complex models.

This guide instead will show you how to construct a high-level layer (like [`Chain`](@ref)) that is made of multiple sub-layers for each path.
This guide instead will show you how to construct a high-level layer (like [`Chain`](@ref)) that is made of multiple sub-layers for each path. The layers described below can, however, make your model's definition harder to read and to change; in that case, consider the simpler approach of defining a custom struct, as described above.

### Multiple inputs: a custom `Join` layer

37 changes: 23 additions & 14 deletions docs/src/guide/models/quickstart.md
@@ -5,48 +5,53 @@ If you have used neural networks before, then this simple example might be helpf
If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# This will prompt if neccessary to install everything, including CUDA:
# This will prompt if necessary to install everything, including CUDA.
# For CUDA acceleration, cuDNN.jl also has to be installed in your environment.
using Flux, CUDA, Statistics, ProgressMeter

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)] # 1000-element Vector{Bool}

# Use this object to move data and model to the GPU, if available
device = gpu_device()

# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(
Dense(2 => 3, tanh), # activation function inside layer
Dense(2 => 3, tanh), # activation function inside layer
BatchNorm(3),
Dense(3 => 2)) |> gpu # move model to GPU, if available
Dense(3 => 2)) |> device # move model to GPU, if available

# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy |> gpu) |> cpu # 2×1000 Matrix{Float32}
probs1 = softmax(out1) # normalise to get probabilities
out1 = model(noisy |> device) |> cpu # 2×1000 Matrix{Float32}
probs1 = softmax(out1) # normalise to get probabilities

# To train the model, we use batches of 64 samples, and one-hot encoding:
target = Flux.onehotbatch(truth, [true, false]) # 2×1000 OneHotMatrix
loader = Flux.DataLoader((noisy, target) |> gpu, batchsize=64, shuffle=true);
# 16-element DataLoader with first element: (2×64 Matrix{Float32}, 2×64 OneHotMatrix)
loader = Flux.DataLoader((noisy, target), batchsize=64, shuffle=true);

optim = Flux.setup(Flux.Adam(0.01), model) # will store optimiser momentum, etc.
opt_state = Flux.setup(Flux.Adam(0.01), model) # will store optimiser momentum, etc.

# Training loop, using the whole data set 1000 times:
losses = []
@showprogress for epoch in 1:1_000
for (x, y) in loader
x, y = device((x, y))
loss, grads = Flux.withgradient(model) do m
# Evaluate model and loss inside gradient context:
y_hat = m(x)
Flux.logitcrossentropy(y_hat, y)
end
Flux.update!(optim, model, grads[1])
Flux.update!(opt_state, model, grads[1])
push!(losses, loss) # logging, outside gradient context
end
end

optim # parameters, momenta and output have all changed
out2 = model(noisy |> gpu) |> cpu # first row is prob. of true, second row p(false)
probs2 = softmax(out2) # normalise to get probabilities
mean((probs2[1,:] .> 0.5) .== truth) # accuracy 94% so far!
opt_state # parameters, momenta and output have all changed

out2 = model(noisy |> device) |> cpu # first row is prob. of true, second row p(false)
probs2 = softmax(out2) # normalise to get probabilities
mean((probs2[1,:] .> 0.5) .== truth) # accuracy 94% so far!
```

![](../../assets/quickstart/oneminute.png)
@@ -95,9 +100,13 @@ Instead of calling [`gradient`](@ref Zygote.gradient) and [`update!`](@ref Flux.

```julia
for epoch in 1:1_000
Flux.train!(model, loader, optim) do m, x, y
Flux.train!(model, loader, opt_state) do m, x, y
x, y = device((x, y))
y_hat = m(x)
Flux.logitcrossentropy(y_hat, y)
end
end
```

* In our simple example, we conveniently created the model as a [`Chain`](@ref Flux.Chain) of layers.
For more complex models, you can define a custom struct `MyModel` containing layers and arrays and implement the call operator `(::MyModel)(x) = ...` to define the forward pass. This is all that is needed for Flux to work. Marking the struct with [`Flux.@layer`](@ref) will add some more functionality, like pretty printing and the ability to mark some internal fields as trainable or not (also see [`trainable`](@ref Optimisers.trainable)).
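A minimal sketch of that pattern might look as follows (the struct name and the layers inside it are made up for illustration):

```julia
using Flux

struct MyModel{T, H}
    trunk::T
    head::H
end

# The forward pass: this is all Flux needs in order to use the model.
(m::MyModel)(x) = m.head(m.trunk(x))

Flux.@layer MyModel    # optional: pretty printing, trainable control, etc.

model = MyModel(Chain(Dense(2 => 16, relu), Dense(16 => 16, relu)), Dense(16 => 2))
model(rand(Float32, 2, 8))    # 2×8 Matrix{Float32}
```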
5 changes: 4 additions & 1 deletion docs/src/reference/models/functors.md
@@ -4,14 +4,17 @@ CollapsedDocStrings = true

# Recursive transformations from Functors.jl

Flux models are deeply nested structures, and [Functors.jl](https://github.com/FluxML/Functors.jl) provides tools needed to explore such objects, apply functions to the parameters they contain, and re-build them.
Flux models are deeply nested structures, and [Functors.jl](https://github.com/FluxML/Functors.jl) provides tools needed to explore such objects, apply functions to the parameters they contain (e.g. to move them to the GPU), and re-build them.

!!! compat "Flux ≤ 0.14"
All layers were previously defined with the `Functors.@functor` macro.
This still works, but it is recommended that you use the new [`Flux.@layer`](@ref Flux.@layer) macro instead.
Both allow [`Flux.setup`](@ref Flux.setup) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU, but [`Flux.@layer`](@ref Flux.@layer) also overloads printing,
and offers a way to define `trainable` at the same time.

!!! compat "Functors 0.5"
With Functors.jl v0.5, which is required by Flux v0.15 and later, every custom type is a functor by default. This means that applying `Flux.@layer` to a type is no longer strictly necessary, but it is still recommended for additional features like pretty-printing and `trainable`.

`Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](@ref man-advanced) page covers the use cases of `Functors` in greater detail.
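As a small example of the kind of recursive transformation meant here (a sketch, not taken from that page):

```julia
using Flux, Functors

model = Chain(Dense(2 => 3, tanh), Dense(3 => 1))

# Visit every leaf of the nested structure, convert floating-point arrays
# to Float64, and rebuild the same Chain/Dense structure around them:
model64 = fmap(x -> x isa AbstractArray{<:AbstractFloat} ? Float64.(x) : x, model)

# fmapstructure applies the function but returns only nested NamedTuples of results:
fmapstructure(x -> x isa AbstractArray ? size(x) : nothing, model)
```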

```@docs
1 change: 0 additions & 1 deletion perf/recurrent.jl
@@ -3,7 +3,6 @@
struct RNNWrapper{T}
rnn::T
end
Flux.@functor RNNWrapper

# Need to specialize for RNNWrapper.
fw(r::RNNWrapper, X::Vector{<:AbstractArray}) = begin
4 changes: 3 additions & 1 deletion src/Flux.jl
@@ -92,7 +92,9 @@ include("train.jl")
using .Train
using .Train: setup

using Adapt, Functors, OneHotArrays
using Adapt, OneHotArrays
using Functors: Functors, fmap, fmapstructure

include("utils.jl")
include("functor.jl")

28 changes: 16 additions & 12 deletions src/deprecations.jl
@@ -64,17 +64,6 @@ const FluxMetalAdaptor = MetalDevice

######## v0.15 deprecations #########################

# Enable these when 0.16 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.16, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
# """
# ))

function reset!(x)
Base.depwarn("reset!(m) is deprecated. You can remove this call as it is no longer needed.", :reset!)
@@ -87,7 +76,6 @@ function params!(p::Zygote.Params, x, seen = IdSet())
elseif x in seen
nothing
else
_check_new_macro(x) # complains if you used @functor not @layer
push!(seen, x)
for child in trainable(x)
params!(p, child, seen)
@@ -126,3 +114,19 @@ function Optimisers.update!(opt::Optimisers.AbstractRule, model::Chain, grad::Tu
`update!(state, model, grad)` needs `state = Flux.setup(opt, model)`.
""")
end


### v0.16 deprecations ####################


# Enable these when 0.16 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.16, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
# """
# ))
26 changes: 5 additions & 21 deletions src/functor.jl
@@ -1,9 +1,3 @@
import Adapt: adapt, adapt_storage
using LinearAlgebra: Cholesky
using Zygote: IdSet
import Functors: Functors, @functor, functor, fmap, isleaf
using SparseArrays: AbstractSparseArray

"""
testmode!(model, [mode]) -> model
@@ -85,7 +79,7 @@ end
cpu(m)
Copies `m` onto the CPU, the opposite of [`gpu`](@ref).
Recurses into structs marked [`@functor`](@ref).
Recurses into structs (thanks to Functors.jl).
# Example
```julia-repl
@@ -125,16 +119,14 @@ end
Copies `m` to the current GPU device (using current GPU backend), if one is available.
If no GPU is available, it does nothing (but prints a warning the first time).
On arrays, this calls CUDA's `cu`, which also changes arrays
with Float64 elements to Float32 while copying them to the device (same for AMDGPU).
To act on arrays within a struct, the struct type must be marked with [`@functor`](@ref).
It recurses into structs according to Functors.jl.
Use [`cpu`](@ref) to copy back to ordinary `Array`s.
See also [`f32`](@ref) and [`f16`](@ref) to change element type only.
See the [CUDA.jl docs](https://juliagpu.github.io/CUDA.jl/stable/usage/multigpu/)
to help identify the current device.
This function is just defined for convenience around [`gpu_device`](@ref),
and is equivalent to `gpu_device()(m)`.
You may consider defining `device = gpu_device()` once and then using `device(m)` to move data.
# Example
```julia-repl
@@ -153,10 +145,6 @@ CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
"""
gpu(x) = gpu_device()(x)

# TODO remove after https://github.com/LuxDL/Lux.jl/pull/1089
ChainRulesCore.@non_differentiable gpu_device()
ChainRulesCore.@non_differentiable gpu_device(::Any)

# Precision

struct FluxEltypeAdaptor{T} end
@@ -222,10 +210,6 @@ Chain(
"""
f16(m) = _paramtype(Float16, m)

# Functors for certain Julia data structures -- PIRACY, should move to Functors.jl
@functor Cholesky
trainable(c::Cholesky) = ()


"""
gpu(data::DataLoader)