From 5370fff1048356bd4889e021ec4d86b49759fb32 Mon Sep 17 00:00:00 2001 From: Michael Abbott <32575566+mcabbott@users.noreply.github.com> Date: Tue, 3 Dec 2024 11:24:08 -0500 Subject: [PATCH 1/5] tweak quickstart --- docs/src/guide/models/quickstart.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/docs/src/guide/models/quickstart.md b/docs/src/guide/models/quickstart.md index a0c92e0ef3..665381122d 100644 --- a/docs/src/guide/models/quickstart.md +++ b/docs/src/guide/models/quickstart.md @@ -5,22 +5,20 @@ If you have used neural networks before, then this simple example might be helpf If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) page. ```julia -# This will prompt if neccessary to install everything, including CUDA. -# For CUDA acceleration, also cuDNN.jl has to be installed in your environment. +# Install everything, including CUDA, and load packages: +using Pkg; Pkg.add(["Flux", "CUDA", "cuDNN", "ProgressMeter"]) using Flux, CUDA, Statistics, ProgressMeter +device = gpu_device() # function to move data and model to the GPU # Generate some data for the XOR problem: vectors of length 2, as columns of a matrix: noisy = rand(Float32, 2, 1000) # 2×1000 Matrix{Float32} truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)] # 1000-element Vector{Bool} -# Use this object to move data and model to the GPU, if available -device = gpu_device() - # Define our model, a multi-layer perceptron with one hidden layer of size 3: model = Chain( Dense(2 => 3, tanh), # activation function inside layer BatchNorm(3), - Dense(3 => 2)) |> device # move model to GPU, if available + Dense(3 => 2)) |> device # move model to GPU, if one is available # The model encapsulates parameters, randomly initialised. Its initial output is: out1 = model(noisy |> device) |> cpu # 2×1000 Matrix{Float32} @@ -35,8 +33,9 @@ opt_state = Flux.setup(Flux.Adam(0.01), model) # will store optimiser momentum, # Training loop, using the whole data set 1000 times: losses = [] @showprogress for epoch in 1:1_000 - for (x, y) in loader - x, y = device((x, y)) + for xy_cpu in loader + # Unpack batch of data, and move to GPU: + x, y = xy_cpu |> device loss, grads = Flux.withgradient(model) do m # Evaluate model and loss inside gradient context: y_hat = m(x) @@ -100,8 +99,7 @@ Instead of calling [`gradient`](@ref Zygote.gradient) and [`update!`](@ref Flux. ```julia for epoch in 1:1_000 - Flux.train!(model, loader, opt_state) do m, x, y - x, y = device((x, y)) + Flux.train!(model, loader |> device, opt_state) do m, x, y y_hat = m(x) Flux.logitcrossentropy(y_hat, y) end From c76571e2046711ae350243f561e0f28a0df34a23 Mon Sep 17 00:00:00 2001 From: Michael Abbott <32575566+mcabbott@users.noreply.github.com> Date: Tue, 3 Dec 2024 11:35:56 -0500 Subject: [PATCH 2/5] avoid confusing line `model(noisy |> gpu) |> cpu` --- docs/src/guide/models/quickstart.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/src/guide/models/quickstart.md b/docs/src/guide/models/quickstart.md index 665381122d..1a8e4ca256 100644 --- a/docs/src/guide/models/quickstart.md +++ b/docs/src/guide/models/quickstart.md @@ -21,8 +21,8 @@ model = Chain( Dense(3 => 2)) |> device # move model to GPU, if one is available # The model encapsulates parameters, randomly initialised. 
Its initial output is: -out1 = model(noisy |> device) |> cpu # 2×1000 Matrix{Float32} -probs1 = softmax(out1) # normalise to get probabilities +out1 = model(noisy |> device) # 2×1000 Matrix{Float32}, or CuArray{Float32} +probs1 = softmax(out1) |> cpu # normalise to get probabilities (and move off GPU) # To train the model, we use batches of 64 samples, and one-hot encoding: target = Flux.onehotbatch(truth, [true, false]) # 2×1000 OneHotMatrix @@ -48,9 +48,9 @@ end opt_state # parameters, momenta and output have all changed -out2 = model(noisy |> device) |> cpu # first row is prob. of true, second row p(false) -probs2 = softmax(out2) # normalise to get probabilities -mean((probs2[1,:] .> 0.5) .== truth) # accuracy 94% so far! +out2 = model(noisy |> device) # first row is prob. of true, second row p(false) +probs2 = softmax(out2) |> cpu # normalise to get probabilities +mean((probs2[1,:] .> 0.5) .== truth) # accuracy 94% so far! ``` ![](../../assets/quickstart/oneminute.png) From d8c3a9d9fbab1f5093d48f31fcce0e773178de57 Mon Sep 17 00:00:00 2001 From: Michael Abbott <32575566+mcabbott@users.noreply.github.com> Date: Tue, 3 Dec 2024 23:12:28 -0500 Subject: [PATCH 3/5] doc ref Flux.gradient --- docs/src/guide/models/quickstart.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/guide/models/quickstart.md b/docs/src/guide/models/quickstart.md index 1a8e4ca256..1ea8912ef4 100644 --- a/docs/src/guide/models/quickstart.md +++ b/docs/src/guide/models/quickstart.md @@ -95,7 +95,7 @@ Some things to notice in this example are: * The `do` block creates an anonymous function, as the first argument of `gradient`. Anything executed within this is differentiated. -Instead of calling [`gradient`](@ref Zygote.gradient) and [`update!`](@ref Flux.update!) separately, there is a convenience function [`train!`](@ref Flux.train!). If we didn't want anything extra (like logging the loss), we could replace the training loop with the following: +Instead of calling [`gradient`](@ref Flux.gradient) and [`update!`](@ref Flux.update!) separately, there is a convenience function [`train!`](@ref Flux.train!). If we didn't want anything extra (like logging the loss), we could replace the training loop with the following: ```julia for epoch in 1:1_000 From 3976afd619c94591cd0f76d365e4007e021361cb Mon Sep 17 00:00:00 2001 From: Michael Abbott <32575566+mcabbott@users.noreply.github.com> Date: Tue, 3 Dec 2024 23:13:10 -0500 Subject: [PATCH 4/5] moving data to GPU --- docs/src/guide/models/quickstart.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/src/guide/models/quickstart.md b/docs/src/guide/models/quickstart.md index 1ea8912ef4..790b6aafe7 100644 --- a/docs/src/guide/models/quickstart.md +++ b/docs/src/guide/models/quickstart.md @@ -106,5 +106,7 @@ for epoch in 1:1_000 end ``` -* In our simple example, we conveniently created the model has a [`Chain`](@ref Flux.Chain) of layers. +* Notice that the full dataset `noisy` lives on the CPU, and is moved to the GPU one batch at a time, by `xy_cpu |> device`. This is generally what you want for large datasets. Calling `loader |> device` similarly modifies the `DataLoader` to move one batch at a time. + +* In our simple example, we conveniently created the model has a [`Chain`](@ref Flux.Chain) of layers. For more complex models, you can define a custom struct `MyModel` containing layers and arrays and implement the call operator `(::MyModel)(x) = ...` to define the forward pass. This is all it is needed for Flux to work. 
Marking the struct with [`Flux.@layer`](@ref) will add some more functionality, like pretty printing and the ability to mark some internal fields as trainable or not (also see [`trainable`](@ref Optimisers.trainable)). From 234295eea559b84071a89a2c7841fc619fbeb1e4 Mon Sep 17 00:00:00 2001 From: Michael Abbott <32575566+mcabbott@users.noreply.github.com> Date: Wed, 4 Dec 2024 22:20:20 -0500 Subject: [PATCH 5/5] Update docs/src/guide/models/quickstart.md --- docs/src/guide/models/quickstart.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/src/guide/models/quickstart.md b/docs/src/guide/models/quickstart.md index 790b6aafe7..42e04b0f52 100644 --- a/docs/src/guide/models/quickstart.md +++ b/docs/src/guide/models/quickstart.md @@ -7,7 +7,8 @@ If you haven't, then you might prefer the [Fitting a Straight Line](overview.md) ```julia # Install everything, including CUDA, and load packages: using Pkg; Pkg.add(["Flux", "CUDA", "cuDNN", "ProgressMeter"]) -using Flux, CUDA, Statistics, ProgressMeter +using Flux, Statistics, ProgressMeter +using CUDA # optional device = gpu_device() # function to move data and model to the GPU # Generate some data for the XOR problem: vectors of length 2, as columns of a matrix:
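
Taken together, the five patches above leave the quickstart's training code in roughly the shape below. This is a condensed sketch for reference rather than the full file: the `Flux.update!` and `push!(losses, loss)` steps are not part of the hunks shown and are filled in from standard Flux usage, and `using CUDA` is left commented out since PATCH 5/5 makes it optional.

```julia
using Flux, Statistics, ProgressMeter
# using CUDA            # optional, as in PATCH 5/5; needs CUDA.jl (plus cuDNN.jl installed) for an NVIDIA GPU

device = gpu_device()   # moves data and the model to the GPU, or is a no-op without one

# XOR data, as in the quickstart:
noisy = rand(Float32, 2, 1000)                                   # 2×1000 Matrix{Float32}
truth = [xor(col[1]>0.5, col[2]>0.5) for col in eachcol(noisy)]  # 1000-element Vector{Bool}

model = Chain(Dense(2 => 3, tanh), BatchNorm(3), Dense(3 => 2)) |> device

target = Flux.onehotbatch(truth, [true, false])                  # 2×1000 OneHotMatrix
loader = Flux.DataLoader((noisy, target), batchsize=64, shuffle=true)
opt_state = Flux.setup(Flux.Adam(0.01), model)

losses = []
@showprogress for epoch in 1:1_000
    for xy_cpu in loader
        x, y = xy_cpu |> device   # move one batch at a time, as in PATCH 1/5
        loss, grads = Flux.withgradient(model) do m
            y_hat = m(x)
            Flux.logitcrossentropy(y_hat, y)
        end
        Flux.update!(opt_state, model, grads[1])   # not shown in the hunks above
        push!(losses, loss)                        # not shown in the hunks above
    end
end

probs = softmax(model(noisy |> device)) |> cpu   # first row is the probability of `true`
mean((probs[1,:] .> 0.5) .== truth)              # accuracy, about 94% in the quickstart
```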
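
With the same `loader` and `opt_state`, the shorter `Flux.train!` form edited by the last hunk of PATCH 1/5 reads as follows once the patch is applied. Passing `loader |> device` wraps the `DataLoader` so that each batch is moved to the GPU as it is consumed, the behaviour described by the new bullet in PATCH 4/5.

```julia
for epoch in 1:1_000
    Flux.train!(model, loader |> device, opt_state) do m, x, y
        y_hat = m(x)
        Flux.logitcrossentropy(y_hat, y)
    end
end
```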
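
PATCH 4/5 also extends the bullet about replacing the `Chain` with a custom struct whose call operator defines the forward pass, marked with `Flux.@layer`. A minimal sketch of that pattern, in which the `MyModel` name and its single `layers` field are purely illustrative, could look like this:

```julia
using Flux

struct MyModel                  # hypothetical struct, named as in the bullet's example
    layers::Chain
end

Flux.@layer MyModel             # opt in to pretty printing, trainable-field handling, etc.

(m::MyModel)(x) = m.layers(x)   # the call operator defines the forward pass

model2 = MyModel(Chain(Dense(2 => 3, tanh), Dense(3 => 2)))
model2(rand(Float32, 2, 5))                       # 2×5 output, like any built-in layer
opt_state2 = Flux.setup(Flux.Adam(0.01), model2)  # parameters inside `layers` are found automatically
```

As the bullet says, the struct works with Flux as soon as the call operator is defined; `Flux.@layer` then adds the nicer printing and the option to mark particular fields as trainable or not via `trainable`.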