
Optimiser state not moving to GPU #179

Closed
vpuri3 opened this issue Oct 3, 2024 · 7 comments · Fixed by #180

Comments

@vpuri3 (Contributor) commented Oct 3, 2024

julia> using CUDA, Optimisers

julia> opt_st = Optimisers.setup(Optimisers.Adam(), rand(2))
Leaf(Adam(0.001, (0.9, 0.999), 1.0e-8), ([0.0, 0.0], [0.0, 0.0], (0.9, 0.999)))

julia> cu(opt_st).state[1]
2-element Vector{Float64}:
 0.0
 0.0

julia> cu(opt_st.state)[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0

(NeuralROMs) pkg> st CUDA
Project NeuralROMs v0.0.1
Status `~/.julia/dev/NeuralROMs.jl/Project.toml`
  [052768ef] CUDA v5.5.2

(NeuralROMs) pkg> st Optimisers
Project NeuralROMs v0.0.1
Status `~/.julia/dev/NeuralROMs.jl/Project.toml`
  [3bd65402] Optimisers v0.3.3
@vpuri3 (Contributor, Author) commented Oct 3, 2024

This can be fixed with

julia> Adapt.@adapt_structure Optimisers.Leaf

julia> cu(opt_st).state[1]
2-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 0.0
 0.0

@CarloLucibello (Member)

Since Leaf is a functor, it will move to the GPU when using Flux.gpu or MLDataDevices.jl.

Also, if the input to setup is on the GPU (which typically is what you want), the state will be on the GPU as well.

So maybe there is no need for Adapt?
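A minimal sketch of the second point, assuming a CUDA-capable device: if the parameters handed to setup already live on the GPU, the optimiser state buffers are allocated there too, so no explicit move is needed.

```julia
using CUDA, Optimisers

p = cu(rand(2))                                  # parameters already on the GPU
opt_st = Optimisers.setup(Optimisers.Adam(), p)

# The momentum buffers in the Leaf state mirror the parameter array,
# so they come out as CuArrays rather than plain Vectors.
opt_st.state[1]
```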

@vpuri3 (Contributor, Author) commented Oct 3, 2024

I routinely save the optimizer state to checkpoint files during training, so I need to move it back to the GPU when I restart training.

@ToucheSir (Member)

Yes, but that's already doable with Flux.gpu(opt_st) or gdev = MLDataDevices.gpu_device(); gdev(opt_st). Is there a reason you're not able to use either of those libraries and must use CUDA.cu directly?

@vpuri3 (Contributor, Author) commented Oct 3, 2024

Thanks for the speedy reply @ToucheSir, my use case is below. I am using MLDataDevices.gpu_device, but something is still going wrong.

(from #180 (comment))

That is true, the MWE works with MLDataDevices. However, we still need the Adapt functionality. Consider the case where the Leaf is stored as part of a struct: using MLDataDevices.gpu_device doesn't move the state to the GPU, even if we have Adapt.adapt_structure defined for the object.

using Optimisers, CUDA, LuxCUDA, MLDataDevices, Adapt

struct TrainState{Tp, To}
  p::Tp
  opt_st::To
end

Adapt.@adapt_structure TrainState

p = rand(2)
opt_st = Optimisers.setup(Optimisers.Adam(), p)
ts = TrainState(p, opt_st)
device = gpu_device()
device(ts).opt_st.state[1]

2-element Vector{Float64}:
 0.0
 0.0

@ToucheSir (Member)

Per the docs, you need to define @functor or @layer for TrainState to make either function work for it. Happy to take a docs PR which clarifies this.
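A sketch of the fix suggested here, assuming MLDataDevices is installed alongside Functors: registering TrainState as a functor lets the device objects recurse into its fields, including the Optimisers.Leaf inside opt_st, without any Adapt definitions.

```julia
using Optimisers, MLDataDevices, Functors

struct TrainState{Tp, To}
  p::Tp
  opt_st::To
end

# Registering the struct with Functors is what gpu_device()/cpu_device()
# need in order to walk into p and opt_st and move their arrays.
Functors.@functor TrainState

p = rand(2)
ts = TrainState(p, Optimisers.setup(Optimisers.Adam(), p))

gdev = gpu_device()
gdev(ts).opt_st.state[1]   # a GPU array when a GPU backend is available
```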

@vpuri3 (Contributor, Author) commented Oct 4, 2024

Thanks all, I appreciate your patience in explaining this to me. I directly defined MLDataDevices methods on TrainState:

function (dev::MLDataDevices.CPUDevice)(state::TrainState)
	TrainState(dev(state.p), dev(state.opt_st))
end
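For completeness, the GPU direction can be handled the same way. This is a sketch under the assumption that MLDataDevices exposes an AbstractGPUDevice supertype covering CUDADevice and the other backends; the method simply forwards the device to each field, as in the CPUDevice method above.

```julia
# Hypothetical companion method for moving a TrainState to the GPU:
# dispatch on the abstract GPU device type so one definition covers
# CUDA, AMDGPU, Metal, etc.
function (dev::MLDataDevices.AbstractGPUDevice)(state::TrainState)
    TrainState(dev(state.p), dev(state.opt_st))
end
```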
