Pull request #2007 causes Flux.params() calls to not get cached #2040
It's not so easy to guess what's gone wrong. I presume your model contains transpose or adjoint matrices as parameters, which Flux 0.13.5 will recurse into instead of regarding them as leaf nodes. Maybe that's where to look to construct an MWE.
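For context, a hypothetical example of the kind of model being guessed at here (names and sizes invented, not taken from the report): tied weights, where one layer's parameter is a lazy transpose wrapper around another layer's weight matrix rather than a plain Array.

using Flux

# Hypothetical tied-weight autoencoder: the decoder's weight is a Transpose wrapper
# around the encoder's weight matrix, so it is not a plain Array leaf.
enc = Dense(4 => 2)
dec = Dense(transpose(enc.weight), true, identity)  # Dense(weight, bias, activation)
model = Chain(enc, dec)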
Here is an MWE. On my desktop, running it with v0.13.4 takes 2 seconds per epoch while running it with v0.13.5 takes a couple minutes per epoch.
I can confirm 0.13.5 is slower (around 2-3x per step) than 0.13.4. However, both do make use of the GPU on my machine, and neither runs an entire epoch in only 2s! One suspicious find on 0.13.5 which may be contributing: every step appears to trigger compilation again.
0.13.4 does not exhibit this continual compilation, but does still incur some compilation in steps 2-3.
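One crude way to see this per-step compilation (a sketch only; model, loss, opt, and data are placeholders for the real training setup) is to time a few individual training steps and watch the "% compilation time" figure that @time prints on Julia 1.8+:

ps = Flux.params(model)
for (i, batch) in enumerate(data)
    # Each call performs one optimisation step; on an affected setup the compilation
    # percentage reportedly stays high on every step instead of vanishing after steps 2-3.
    @time Flux.train!(loss, ps, [batch], opt)
    i >= 5 && break
end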
Reduced the repeated compilation to taking the gradient of just the regularization term.
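A rough sketch of that reduction (the model below is a stand-in, not the original reproducer): differentiate only a params-based penalty and time it a few times in a row.

using Flux

model = Chain(Dense(10 => 10, relu), Dense(10 => 1))   # stand-in model
sqnorm(p) = sum(abs2, p)

for _ in 1:5
    # Only the regularization term is differentiated here; on affected versions each
    # call reportedly pays compilation again because Flux.params is called inside.
    @time Flux.gradient(Flux.params(model)) do
        sum(sqnorm, Flux.params(model))
    end
end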
I can confirm that when I make no changes other than removing the regularization line in my original code, the performance is back to what it was in 0.13.4. Is this something that can be fixed in this repo, or does the issue lie somewhere else? Also, are there any workarounds that would avoid the constant recompilation while keeping the regularization, while waiting for a fix?
Unfortunately, the problem is present again in Flux 0.13.10, so we have to reopen this task. In the following MWE, the equivalent
Here is the output:
I was wondering if there are any workarounds one can use to have regularization in the loss and avoid this issue? E.g., some Zygote tricks in how the loss is constructed? It seems that the only solution right now is to pin to 0.13.4, right?
The best and most future-proof solution is to use explicit params for the regularization term as shown above, but we don't currently have nice helper functionality for that. If you're using implicit params, you can call Flux.params(model) once, outside the gradient call, and reuse the result in the loss instead of calling it on every step.
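A minimal sketch of that implicit-params workaround, assuming placeholder names (model, data, opt, lambda): collect the parameters once before the training loop and reuse them in the penalty, rather than calling Flux.params inside the loss.

using Flux

ps = Flux.params(model)          # collected once, outside the gradient call
sqnorm(p) = sum(abs2, p)

for (x, y) in data
    gs = Flux.gradient(ps) do
        # reuse the cached `ps` for the penalty instead of calling Flux.params(model) here
        Flux.Losses.mse(model(x), y) + lambda * sum(sqnorm, ps)
    end
    Flux.Optimise.update!(opt, ps, gs)
end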
The issue to track for the helper is FluxML/Optimisers.jl#57. Until then, here is a snippet that should do the same for simple penalties like L2.

using Flux
using Functors

penalty(x::AbstractArray) = sum(x.^2)  # example penalty

# further down, inside the training loop
grads = Flux.gradient(model) do m
    loss = # ...
    reg = Functors.fmap(penalty, m; exclude = Flux.trainable)
    return loss + lambda * reg
end
It really needs some trainablewalk. |
Now we can apply regularization using the trainables helper tracked in FluxML/Optimisers.jl#57. Closing, as we no longer support implicit params.
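For reference, a sketch of what that looks like in the explicit style (this assumes a recent Flux where Flux.trainables, added via FluxML/Optimisers.jl#57, is available and differentiable; model, data, and lambda here are placeholders):

using Flux

model = Dense(10 => 1)                      # placeholder model
x, y = rand(Float32, 10, 8), rand(Float32, 1, 8)
lambda = 0.01f0
sqnorm(p) = sum(abs2, p)

grads = Flux.gradient(model) do m
    # explicit walk over the trainable arrays; no Flux.params call inside the loss
    Flux.mse(m(x), y) + lambda * sum(sqnorm, Flux.trainables(m))
end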
I'm really not sure where the breakage happens, but I'm more than happy to run tests if you need me to. I isolated the issue to #2007 by bisecting.
Ever since I upgraded to 0.13.5, my variational convolutional autoencoder is not running on the GPU and does not display any error messages. I can see using nvidia-smi that things are properly being transferred to GPU memory, but when it comes to the actual computations, the GPU usage fluctuates between 0% and 2% (which I believe is just the arrays being moved to and from main memory) instead of consistently going up to ~60% usage. I tried a non-convolutional variational autoencoder with the rest mostly the same, and it worked normally; I also tried models with convolutional layers without functors, and those also seemed to compute on the GPU. So I believe the combination of a struct with convolutional layers that is tagged with @functor is what prevents the math from happening on the GPU.
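For clarity, a hypothetical sketch of the kind of setup described above (layer sizes and names invented): a custom struct holding Conv layers, tagged with @functor, and moved to the GPU.

using Flux

struct ConvEncoder
    c1
    c2
end
Flux.@functor ConvEncoder

(m::ConvEncoder)(x) = m.c2(m.c1(x))

enc = ConvEncoder(Conv((3, 3), 1 => 8, relu; pad = 1),
                  Conv((3, 3), 8 => 16, relu; pad = 1)) |> gpu
x = rand(Float32, 28, 28, 1, 4) |> gpu
y = enc(x)   # the report above is that this kind of model stopped keeping the GPU busy on 0.13.5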
I'll do my best to be responsive.