
Optimiser state not moving to GPU #179

Closed
vpuri3 opened this issue Oct 3, 2024 · 7 comments · Fixed by #180

Comments

@vpuri3 (Contributor) commented Oct 3, 2024

julia> using CUDA, Optimisers

julia> opt_st = Optimisers.setup(Optimisers.Adam(), rand(2))
Leaf(Adam(0.001, (0.9, 0.999), 1.0e-8), ([0.0, 0.0], [0.0, 0.0], (0.9, 0.999)))

julia> cu(opt_st).state[1]
2-element Vector{Float64}:
 0.0
 0.0

julia> cu(opt_st.state)[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0

(NeuralROMs) pkg> st CUDA
Project NeuralROMs v0.0.1
Status `~/.julia/dev/NeuralROMs.jl/Project.toml`
  [052768ef] CUDA v5.5.2

(NeuralROMs) pkg> st Optimisers
Project NeuralROMs v0.0.1
Status `~/.julia/dev/NeuralROMs.jl/Project.toml`
  [3bd65402] Optimisers v0.3.3
@vpuri3 (Contributor, Author) commented Oct 3, 2024

This can be fixed with

julia> Adapt.@adapt_structure Optimisers.Leaf

julia> cu(opt_st).state[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0

@CarloLucibello (Member)

Since Leaf is a functor, it will move to the GPU when using Flux.gpu or MLDataDevices.jl.

Also, if the input to setup is on the GPU (which typically is what you want), the state will be on the GPU as well.

So maybe there is no need for Adapt?
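A minimal sketch of the second point, assuming a CUDA-capable device: if the parameters handed to setup already live on the GPU, the optimiser state buffers are allocated there too, so no explicit move is needed.

```julia
using CUDA, Optimisers

p = cu(rand(2))                                  # parameters already on the GPU
opt_st = Optimisers.setup(Optimisers.Adam(), p)

# The momentum buffers in the Leaf state mirror the parameter array,
# so they come out as CuArrays rather than plain Vectors.
opt_st.state[1]
```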

@vpuri3 (Contributor, Author) commented Oct 3, 2024

I routinely save the optimizer state to checkpoint files during training, so I need to move it back to the GPU when I restart training.

@ToucheSir (Member)

Yes, but that's already doable with Flux.gpu(opt_st) or gdev = MLDataDevices.gpu_device(); gdev(opt_st). Is there a reason you're not able to use either of those libraries and must use CUDA.cu directly?

@vpuri3 (Contributor, Author) commented Oct 3, 2024

Thanks for the speedy reply @ToucheSir, my use case is below. I am using MLDataDevices.gpu_device, but something is still going wrong.

(from #180 (comment))

That is true, the MWE works with MLDataDevices. However, we still need the Adapt functionality. Consider the case where the Leaf is stored as part of a struct: using MLDataDevices.gpu_device doesn't move the state to the GPU, even if we have Adapt.adapt_structure defined for the object.

using Optimisers, CUDA, LuxCUDA, MLDataDevices, Adapt

struct TrainState{Tp, To}
  p::Tp
  opt_st::To
end

Adapt.@adapt_structure TrainState

p = rand(2)
opt_st = Optimisers.setup(Optimisers.Adam(), p)
ts = TrainState(p, opt_st)
device = gpu_device()
device(ts).opt_st.state[1]

2-element Vector{Float64}:
 0.0
 0.0

@ToucheSir (Member)

Per the docs, you need to define @functor or @layer for TrainState to make either function work for it. Happy to take a docs PR which clarifies this.
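A sketch of the fix suggested here, assuming MLDataDevices is installed alongside Functors: registering TrainState as a functor lets the device objects recurse into its fields, including the Optimisers.Leaf inside opt_st, without any Adapt definitions.

```julia
using Optimisers, MLDataDevices, Functors

struct TrainState{Tp, To}
  p::Tp
  opt_st::To
end

# Registering the struct with Functors is what gpu_device()/cpu_device()
# need in order to walk into p and opt_st and move their arrays.
Functors.@functor TrainState

p = rand(2)
ts = TrainState(p, Optimisers.setup(Optimisers.Adam(), p))

gdev = gpu_device()
gdev(ts).opt_st.state[1]   # a GPU array when a GPU backend is available
```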

@vpuri3 (Contributor, Author) commented Oct 4, 2024

Thanks all, I appreciate your patience in explaining this to me. I directly defined MLDataDevices methods on TrainState:

function (dev::MLDataDevices.CPUDevice)(state::TrainState)
	TrainState(dev(state.p), dev(state.opt_st))
end
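For completeness, the GPU direction can be handled the same way. This is a sketch under the assumption that MLDataDevices exposes an AbstractGPUDevice supertype covering CUDADevice and the other backends; the method simply forwards the device to each field, as in the CPUDevice method above.

```julia
# Hypothetical companion method for moving a TrainState to the GPU:
# dispatch on the abstract GPU device type so one definition covers
# CUDA, AMDGPU, Metal, etc.
function (dev::MLDataDevices.AbstractGPUDevice)(state::TrainState)
    TrainState(dev(state.p), dev(state.opt_st))
end
```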
