updates for Functors v0.5 (#2528)
Co-authored-by: Michael Abbott [email protected]
CarloLucibello authored Nov 24, 2024
1 parent 0a324f8 commit e2b3f06
Showing 19 changed files with 82 additions and 134 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -65,7 +65,7 @@ jobs:
- uses: codecov/codecov-action@v5
if: matrix.version == '1' && matrix.os == 'ubuntu-latest'
with:
file: lcov.info
files: lcov.info

docs:
name: Documentation
3 changes: 3 additions & 0 deletions NEWS.md
@@ -12,6 +12,9 @@ See also [github's page](https://github.com/FluxML/Flux.jl/releases) for a compl
Now Flux re-exports the optimisers from Optimisers.jl. Most users will be unaffected by this change.
The module is still available for now, but will be removed in a future release.
* Most Flux layers will [re-use memory via `NNlib.bias_act!`](https://github.com/FluxML/Flux.jl/pull/2327), when possible.
* `Flux.params` has been deprecated. Use Zygote's explicit differentiation instead,
`gradient(m -> loss(m, x, y), model)`, or use `Flux.trainables(model)` to get the trainable parameters.
* Flux now requires Functors.jl v0.5. This new release of Functors treats all types as functors by default, so applying `@layer` or `@functor` to a type is no longer strictly necessary for Flux's models. However, it is still recommended to use `@layer Model` for additional functionality like pretty printing.
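For example, the explicit style referred to above looks roughly like this (a minimal sketch; the model, data and loss are only illustrative):

```julia
using Flux

model = Dense(2 => 1)                             # any Flux model
x, y = rand(Float32, 2, 16), rand(Float32, 1, 16)
loss(m, x, y) = Flux.mse(m(x), y)

# Explicit gradient with respect to the model itself (Zygote under the hood):
grads = Flux.gradient(m -> loss(m, x, y), model)[1]

# Flat vector of the trainable parameter arrays, replacing Flux.params:
ps = Flux.trainables(model)                       # [weight, bias]
```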

## v0.14.22
* Data movement between devices is now provided by [MLDataDevices.jl](https://github.com/LuxDL/MLDataDevices.jl).
5 changes: 0 additions & 5 deletions docs/src/guide/gpu.md
@@ -12,11 +12,6 @@ Metal GPU acceleration is available on Apple Silicon hardware. For more details
In order to trigger GPU support in Flux, you need to call `using CUDA`, `using AMDGPU` or `using Metal`
in your code. Note that for CUDA you do not need to explicitly load `cuDNN` as well, but the package has to be installed in the environment.


!!! compat "Flux ≤ 0.13"
Old versions of Flux automatically installed CUDA.jl to provide GPU support. Starting from Flux v0.14, CUDA.jl is not a dependency anymore and has to be installed manually.


## Basic GPU Usage

Support for array operations on other hardware backends, like GPUs, is provided by external packages like [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl), [AMDGPU.jl](https://github.com/JuliaGPU/AMDGPU.jl), and [Metal.jl](https://github.com/JuliaGPU/Metal.jl).
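For instance, a minimal sketch of checking for and using a CUDA GPU (this assumes CUDA.jl is installed; the variable names are illustrative):

```julia
using Flux, CUDA

if CUDA.functional()                    # true only if a usable NVIDIA GPU was found
    device = gpu_device()               # callable object that moves things to the GPU
    W = device(rand(Float32, 3, 3))     # now a CuArray
end
```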
2 changes: 1 addition & 1 deletion docs/src/guide/models/basics.md
@@ -226,7 +226,7 @@ m(5) # => 26

## Layer Helpers

There is still one problem with this `Affine` layer, that Flux does not know to look inside it. This means that [`Flux.train!`](@ref Flux.train!) won't see its parameters, nor will [`gpu`](@ref) be able to move them to your GPU. These features are enabled by the [`@layer`](@ref Flux.@layer) macro:
We can give our layer some additional functionality, like nice printing, using the [`@layer`](@ref Flux.@layer) macro:

```julia
Flux.@layer Affine
```
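For reference, a minimal self-contained sketch of the pattern (the `Affine` definition here is reconstructed to match the shape used earlier in that guide, so treat it as an assumption):

```julia
using Flux

struct Affine
    W
    b
end
Affine(in::Integer, out::Integer) = Affine(randn(Float32, out, in), zeros(Float32, out))

(m::Affine)(x) = m.W * x .+ m.b      # forward pass

Flux.@layer Affine                   # opts in to pretty printing and other conveniences

a = Affine(3, 2)
a(rand(Float32, 3))                  # 2-element Vector{Float32}
```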
6 changes: 3 additions & 3 deletions docs/src/guide/models/custom_layers.md
@@ -18,7 +18,7 @@ function (m::CustomModel)(x)
return m.chain(x) + x
end

# Call @layer to allow for training. Described below in more detail.
# This is optional but recommended for pretty printing and other niceties
Flux.@layer CustomModel
```
Notice that we parameterized the type of the `chain` field. This is necessary for fast Julia code, so that the struct field can be given a concrete type. `Chain`s have a type parameter fully specifying the types of the layers they contain. By using a type parameter, we free Julia to determine the correct concrete type, so that we do not need to specify the full, possibly quite long, type ourselves.
@@ -78,7 +78,7 @@ The exact same method of `trainable` can also be defined using the macro, for co
Flux.@layer Affine trainable=(W,)
```

There is a second, more severe, kind of restriction possible. This is not recommended, but is included here for completeness. Calling `Functors.@functor Affine (W,)` means that all no exploration of the model will ever visit the other fields: They will not be moved to the GPU by [`gpu`](@ref), and their precision will not be changed by `f32`. This requires the `struct` to have a corresponding constructor that accepts only `W` as an argument.
There is a second, more severe, kind of restriction possible. This is not recommended, but is included here for completeness. Calling `Functors.@functor Affine (W,)` means that no exploration of the model will ever visit the other fields: They will not be moved to the GPU by [`gpu`](@ref), and their precision will not be changed by `f32`. This requires the `struct` to have a corresponding constructor that accepts only `W` as an argument.
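To make the two options concrete, here is a rough sketch (the struct and field names are hypothetical, not taken from the guide):

```julia
using Flux, Optimisers

struct Scale
    W
    b          # we want to keep this fixed during training
end
(s::Scale)(x) = s.W * x .+ s.b

# Restrict *training* only: the optimiser ignores `b`,
# but `gpu`, `f32`, etc. still recurse into it.
Flux.@layer Scale trainable=(W,)

s = Scale(randn(Float32, 2, 3), zeros(Float32, 2))
Optimisers.trainable(s)          # (W = ...,) so only W is optimised
Flux.setup(Adam(0.01), s)        # optimiser state is created for W only

# The harsher restriction discussed above would instead be
# `Functors.@functor Scale (W,)`, which also hides `b` from `gpu`/`f32`
# and requires a one-argument constructor `Scale(W)`.
```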

## Custom multiple input or output layer

@@ -87,7 +87,7 @@ Sometimes a model needs to receive several separate inputs at once or produce se
We could have a struct that stores the weights along each path and implements the joining/splitting in the forward pass function. That would mean a new struct for each different block,
e.g. one would have a `TransformerBlock` struct for a transformer block, and a `ResNetBlock` struct for a ResNet block, each block being composed of smaller sub-blocks. This is often the simplest and cleanest way to implement complex models.

This guide instead will show you how to construct a high-level layer (like [`Chain`](@ref)) that is made of multiple sub-layers for each path.
This guide instead will show you how to construct a high-level layer (like [`Chain`](@ref)) that is made of multiple sub-layers for each path. The layers described below can, however, make your model's definition harder to read and to change; in that case, consider the simpler approach of defining a custom struct, as described above.

### Multiple inputs: a custom `Join` layer

37 changes: 23 additions & 14 deletions docs/src/guide/models/quickstart.md
@@ -5,48 +5,53 @@ If you have used neural networks before, then this simple example might be helpf
If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page.

```julia
# This will prompt if neccessary to install everything, including CUDA:
# This will prompt if necessary to install everything, including CUDA.
# For CUDA acceleration, cuDNN.jl also has to be installed in your environment.
using Flux, CUDA, Statistics, ProgressMeter

# Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32}
truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)] # 1000-element Vector{Bool}

# Use this object to move data and model to the GPU, if available
device = gpu_device()

# Define our model, a multi-layer perceptron with one hidden layer of size 3:
model = Chain(
Dense(2 => 3, tanh), # activation function inside layer
Dense(2 => 3, tanh), # activation function inside layer
BatchNorm(3),
Dense(3 => 2)) |> gpu # move model to GPU, if available
Dense(3 => 2)) |> device # move model to GPU, if available

# The model encapsulates parameters, randomly initialised. Its initial output is:
out1 = model(noisy |> gpu) |> cpu # 2×1000 Matrix{Float32}
probs1 = softmax(out1) # normalise to get probabilities
out1 = model(noisy |> device) |> cpu # 2×1000 Matrix{Float32}
probs1 = softmax(out1) # normalise to get probabilities

# To train the model, we use batches of 64 samples, and one-hot encoding:
target = Flux.onehotbatch(truth, [true, false]) # 2×1000 OneHotMatrix
loader = Flux.DataLoader((noisy, target) |> gpu, batchsize=64, shuffle=true);
# 16-element DataLoader with first element: (2×64 Matrix{Float32}, 2×64 OneHotMatrix)
loader = Flux.DataLoader((noisy, target), batchsize=64, shuffle=true);

optim = Flux.setup(Flux.Adam(0.01), model) # will store optimiser momentum, etc.
opt_state = Flux.setup(Flux.Adam(0.01), model) # will store optimiser momentum, etc.

# Training loop, using the whole data set 1000 times:
losses = []
@showprogress for epoch in 1:1_000
for (x, y) in loader
x, y = device((x, y))
loss, grads = Flux.withgradient(model) do m
# Evaluate model and loss inside gradient context:
y_hat = m(x)
Flux.logitcrossentropy(y_hat, y)
end
Flux.update!(optim, model, grads[1])
Flux.update!(opt_state, model, grads[1])
push!(losses, loss) # logging, outside gradient context
end
end

optim # parameters, momenta and output have all changed
out2 = model(noisy |> gpu) |> cpu # first row is prob. of true, second row p(false)
probs2 = softmax(out2) # normalise to get probabilities
mean((probs2[1,:] .> 0.5) .== truth) # accuracy 94% so far!
opt_state # parameters, momenta and output have all changed

out2 = model(noisy |> device) |> cpu # first row is prob. of true, second row p(false)
probs2 = softmax(out2) # normalise to get probabilities
mean((probs2[1,:] .> 0.5) .== truth) # accuracy 94% so far!
```

![](../../assets/quickstart/oneminute.png)
@@ -95,9 +100,13 @@ Instead of calling [`gradient`](@ref Zygote.gradient) and [`update!`](@ref Flux.

```julia
for epoch in 1:1_000
Flux.train!(model, loader, optim) do m, x, y
Flux.train!(model, loader, opt_state) do m, x, y
x, y = device((x, y))
y_hat = m(x)
Flux.logitcrossentropy(y_hat, y)
end
end
```

* In our simple example, we conveniently created the model as a [`Chain`](@ref Flux.Chain) of layers.
For more complex models, you can define a custom struct `MyModel` containing layers and arrays and implement the call operator `(::MyModel)(x) = ...` to define the forward pass. This is all that is needed for Flux to work. Marking the struct with [`Flux.@layer`](@ref) will add some more functionality, like pretty printing and the ability to mark some internal fields as trainable or not (also see [`trainable`](@ref Optimisers.trainable)).
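A minimal sketch of that pattern might look as follows (the struct name and the layers inside it are made up for illustration):

```julia
using Flux

struct MyModel{T, H}
    trunk::T
    head::H
end

# The forward pass: this is all Flux needs in order to use the model.
(m::MyModel)(x) = m.head(m.trunk(x))

Flux.@layer MyModel    # optional: pretty printing, trainable control, etc.

model = MyModel(Chain(Dense(2 => 16, relu), Dense(16 => 16, relu)), Dense(16 => 2))
model(rand(Float32, 2, 8))    # 2×8 Matrix{Float32}
```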
5 changes: 4 additions & 1 deletion docs/src/reference/models/functors.md
@@ -4,14 +4,17 @@ CollapsedDocStrings = true

# Recursive transformations from Functors.jl

Flux models are deeply nested structures, and [Functors.jl](https://github.com/FluxML/Functors.jl) provides tools needed to explore such objects, apply functions to the parameters they contain, and re-build them.
Flux models are deeply nested structures, and [Functors.jl](https://github.com/FluxML/Functors.jl) provides tools needed to explore such objects, apply functions to the parameters they contain (e.g. to move them to the GPU), and re-build them.

!!! compat "Flux ≤ 0.14"
All layers were previously defined with the `Functors.@functor` macro.
This still works, but it is recommended that you use the new [`Flux.@layer`](@ref Flux.@layer) macro instead.
Both allow [`Flux.setup`](@ref Flux.setup) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU, but [`Flux.@layer`](@ref Flux.@layer) also overloads printing,
and offers a way to define `trainable` at the same time.

!!! compat "Functors 0.5"
With Functors.jl v0.5, which is required by Flux v0.15 and later, every custom type is a functor by default. This means that applying `Flux.@layer` to a type is no longer strictly necessary, but it is still recommended for additional features like pretty-printing and `trainable`.

`Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](@ref man-advanced) page covers the use cases of `Functors` in greater detail.
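As a small example of the kind of recursive transformation meant here (a sketch, not taken from that page):

```julia
using Flux, Functors

model = Chain(Dense(2 => 3, tanh), Dense(3 => 1))

# Visit every leaf of the nested structure, convert floating-point arrays
# to Float64, and rebuild the same Chain/Dense structure around them:
model64 = fmap(x -> x isa AbstractArray{<:AbstractFloat} ? Float64.(x) : x, model)

# fmapstructure applies the function but returns only nested NamedTuples of results:
fmapstructure(x -> x isa AbstractArray ? size(x) : nothing, model)
```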

```@docs
1 change: 0 additions & 1 deletion perf/recurrent.jl
@@ -3,7 +3,6 @@
struct RNNWrapper{T}
rnn::T
end
Flux.@functor RNNWrapper

# Need to specialize for RNNWrapper.
fw(r::RNNWrapper, X::Vector{<:AbstractArray}) = begin
4 changes: 3 additions & 1 deletion src/Flux.jl
@@ -92,7 +92,9 @@ include("train.jl")
using .Train
using .Train: setup

using Adapt, Functors, OneHotArrays
using Adapt, OneHotArrays
using Functors: Functors, fmap, fmapstructure

include("utils.jl")
include("functor.jl")

28 changes: 16 additions & 12 deletions src/deprecations.jl
@@ -64,17 +64,6 @@ const FluxMetalAdaptor = MetalDevice

######## v0.15 deprecations #########################

# Enable these when 0.16 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.16, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
# """
# ))

function reset!(x)
Base.depwarn("reset!(m) is deprecated. You can remove this call as it is no longer needed.", :reset!)
@@ -87,7 +76,6 @@ function params!(p::Zygote.Params, x, seen = IdSet())
elseif x in seen
nothing
else
_check_new_macro(x) # complains if you used @functor not @layer
push!(seen, x)
for child in trainable(x)
params!(p, child, seen)
@@ -126,3 +114,19 @@ function Optimisers.update!(opt::Optimisers.AbstractRule, model::Chain, grad::Tu
`update!(state, model, grad)` needs `state = Flux.setup(opt, model)`.
""")
end


### v0.16 deprecations ####################


# Enable these when 0.16 is released, and delete const ClipGrad = Optimise.ClipValue etc:
# Base.@deprecate_binding Optimiser OptimiserChain
# Base.@deprecate_binding ClipValue ClipGrad

# train!(loss::Function, ps::Zygote.Params, data, opt) = throw(ArgumentError(
# """On Flux 0.16, `train!` no longer accepts implicit `Zygote.Params`.
# Instead of `train!(loss_xy, Flux.params(model), data, Adam())`
# it now needs `opt = Flux.setup(Adam(), model); train!(loss_mxy, model, data, opt)`
# where `loss_mxy` accepts the model as its first argument.
# """
# ))
26 changes: 5 additions & 21 deletions src/functor.jl
@@ -1,9 +1,3 @@
import Adapt: adapt, adapt_storage
using LinearAlgebra: Cholesky
using Zygote: IdSet
import Functors: Functors, @functor, functor, fmap, isleaf
using SparseArrays: AbstractSparseArray

"""
testmode!(model, [mode]) -> model
@@ -85,7 +79,7 @@ end
cpu(m)
Copies `m` onto the CPU, the opposite of [`gpu`](@ref).
Recurses into structs marked [`@functor`](@ref).
Recurses into structs (thanks to Functors.jl).
# Example
```julia-repl
@@ -125,16 +119,14 @@ end
Copies `m` to the current GPU device (using current GPU backend), if one is available.
If no GPU is available, it does nothing (but prints a warning the first time).
On arrays, this calls CUDA's `cu`, which also changes arrays
with Float64 elements to Float32 while copying them to the device (same for AMDGPU).
To act on arrays within a struct, the struct type must be marked with [`@functor`](@ref).
It recurses into structs according to Functors.jl.
Use [`cpu`](@ref) to copy back to ordinary `Array`s.
See also [`f32`](@ref) and [`f16`](@ref) to change element type only.
See the [CUDA.jl docs](https://juliagpu.github.io/CUDA.jl/stable/usage/multigpu/)
to help identify the current device.
This function is just defined for convenience around [`gpu_device`](@ref),
and is equivalent to `gpu_device()(m)`.
You may consider defining `device = gpu_device()` once and then using `device(m)` to move data.
# Example
```julia-repl
@@ -153,10 +145,6 @@ CUDA.CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
"""
gpu(x) = gpu_device()(x)

# TODO remove after https://github.com/LuxDL/Lux.jl/pull/1089
ChainRulesCore.@non_differentiable gpu_device()
ChainRulesCore.@non_differentiable gpu_device(::Any)

# Precision

struct FluxEltypeAdaptor{T} end
@@ -222,10 +210,6 @@ Chain(
"""
f16(m) = _paramtype(Float16, m)

# Functors for certain Julia data structures -- PIRACY, should move to Functors.jl
@functor Cholesky
trainable(c::Cholesky) = ()


"""
gpu(data::DataLoader)