Tied weights using Flux layers #1592
Indeed, mutating isn't supported by Zygote, which is used to calculate the gradients. It is supported in some other Julia AD packages, which you might be able to use.

However, I don't believe the above snippet actually does tie the weights properly. E.g. the gradients of tied weights should be the same, but this won't be the case if you tie them by manually tweaking them to be equal. With this in mind, my preferred solution would be to initialise the weights of the decoder to be a

I haven't actually checked to see whether this plays nicely with Flux, but maybe it's something to try.
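For concreteness, here is a rough sketch of what tying by construction (rather than by mutating inside the loss) could look like. This is only a guess at the omitted snippet: it assumes a recent Flux where `Dense` exposes a `weight` field and accepts a weight matrix directly, and whether the optimiser then treats the encoder's array and its transposed view as a single parameter is exactly what the rest of this thread is about.

```julia
using Flux, LinearAlgebra

enc = Dense(784 => 32, relu)

# Build the decoder around a transposed view of the encoder's weight array,
# instead of copying or mutating weights inside the loss function.
dec = Dense(transpose(enc.weight), true, sigmoid)  # `true` gives the decoder its own bias

model = Chain(enc, dec)
```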
Thanks for your response. I was able to get it working using

This gives the error

It looks like Flux is inferring the type as

It's not clear to me how to address this. Any ideas?
We should probably change that line in the Adam code to use Adapt.jl to get the correct type instead of hard-typing the return of
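As a rough illustration of that idea (not the actual Flux code, since the call site is omitted above, and the function names are made up): the optimiser state should be built from the parameter array itself, or adapted to its storage type, rather than constructed as a hard-coded plain array.

```julia
using Adapt

# Hard-typed: always returns a plain Array, even when `x` lives on the GPU.
init_state_hard(x) = zeros(Float32, size(x))

# Type-preserving: build the state from `x` itself...
init_state_similar(x) = fill!(similar(x), 0)

# ...or convert an existing plain array to `x`'s storage type via Adapt.jl
# (this relies on an `adapt_storage` rule existing for that array type,
# which GPU array packages provide).
init_state_adapt(x, s::AbstractArray) = adapt(typeof(x), s)
```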
Probably better to incorporate directly in Optimisers.jl. As long as we pass in the correct references we should be good. I don't think it needs to be addressed in the optimisers otherwise.
I don't think we need the fix in Optimisers.jl because the state is initialized separately (and correctly). This appears to only be a bug for IdDict optimizers. Agreed that we only need the references to be correct.
possibly related to FluxML/Zygote.jl#991 and #1613
Even if we use something like #1613 to adapt the types, that still wouldn't be entirely correct, because we would be taking two steps of Adam with separate gradients instead of a single step with the accumulated one.
Yeah, with ADAM this will certainly be wrong. Referencing FluxML/Zygote.jl#991 (comment), it's not two steps that's wrong; it's the momentum terms that will be incorrect, leading to two steps not being equivalent to a single accumulated one. For simpler optimizers like
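A small self-contained check of that point, sketched against the Optimisers.jl API (the rule choices and learning rates here are arbitrary): for a momentum-free rule like `Descent`, two steps with `g1` then `g2` land in the same place as one step with `g1 .+ g2`, but for `Adam` the moment estimates make the two procedures diverge.

```julia
using Optimisers

x0 = Float32[1.0, -2.0]
g1 = Float32[0.1, 0.2]
g2 = Float32[0.3, -0.1]

# Two consecutive steps with separate gradients.
function two_steps(rule, x, ga, gb)
    x = copy(x)                            # keep the caller's array untouched
    st = Optimisers.setup(rule, x)
    st, x = Optimisers.update(st, x, ga)
    st, x = Optimisers.update(st, x, gb)
    return x
end

# A single step with the accumulated gradient.
function one_step(rule, x, g)
    x = copy(x)
    st = Optimisers.setup(rule, x)
    _, x = Optimisers.update(st, x, g)
    return x
end

two_steps(Descent(0.1f0), x0, g1, g2) ≈ one_step(Descent(0.1f0), x0, g1 .+ g2)  # true: no momentum state
two_steps(Adam(0.1f0),    x0, g1, g2) ≈ one_step(Adam(0.1f0),    x0, g1 .+ g2)  # false: moments differ
```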
Hello, I wanted to follow-up on this issue. Is it resolved in the lastest version of Flux.jl? |
On latest Flux, using new-style training with

(With old-style IdDict optimisers, I'm not sure.)
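Since the snippet above didn't survive, here is a minimal sketch of the kind of thing meant by new-style training, assuming current Flux with `Flux.setup` on top of Optimisers.jl. When the same parameter array appears more than once in the model, a single optimiser state is kept for it and the gradient contributions are accumulated before the step.

```julia
using Flux

layer = Dense(3 => 3, tanh)
model = Chain(layer, layer)                # the same layer (and weight array) appears twice

opt_state = Flux.setup(Adam(1f-3), model)  # one optimiser state per unique array

x = rand(Float32, 3, 16)
loss, grads = Flux.withgradient(m -> Flux.mse(m(x), x), model)
Flux.update!(opt_state, model, grads[1])   # the shared weight gets one accumulated step
```

Whether a `transpose`d or `adjoint`ed view of the same array is also detected as tied may depend on the Optimisers.jl version, so that case is worth checking on your own setup.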
I'm trying to build an autoencoder that uses both conv and dense layers, and I'd like to tie the weights. #488 demonstrates how to do this for dense layers by not using the Flux `Dense` type and instead using the encoder's weights directly.

Is there a way to accomplish something similar while still using Flux-defined layer types, such as `Conv`? I've tried manually setting the decoder parameters in the loss function; something like this:

Running this gives `ERROR: LoadError: Mutating arrays is not supported`. It's the line `a.weights_decoder[1] .= a.weights_encoder[1]` that's the issue.

Am I going about this the wrong way, or is what I'm trying to do not supported? Thanks in advance for any help.
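For anyone landing here later, a rough sketch of the #488-style approach for the dense part (the struct and field names are made up for illustration): the decoder is written in terms of the encoder's weight array instead of owning a second `Dense` layer, so nothing needs to be mutated inside the loss.

```julia
using Flux, LinearAlgebra

# Hypothetical tied dense autoencoder: encode and decode share one weight matrix.
struct TiedDenseAE{E<:Dense,B}
    enc::E
    dec_bias::B            # the decoder still gets its own bias vector
end
Flux.@functor TiedDenseAE

function (m::TiedDenseAE)(x)
    h = m.enc(x)                                        # encode: relu.(W * x .+ b)
    return transpose(m.enc.weight) * h .+ m.dec_bias    # decode with the tied weight: Wᵀ * h .+ b′
end

m = TiedDenseAE(Dense(784 => 32, relu), zeros(Float32, 784))
x = rand(Float32, 784, 8)
loss(model, x) = Flux.mse(model(x), x)
```

Doing the same with `Conv` layers would mean reusing the encoder's kernel array in the decoding step (e.g. via a transposed convolution), which is more involved; this sketch only covers the dense part.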