Gradients of shared parameters do not behave as expected #420

Closed
arthur-bizzi opened this issue Oct 9, 2023 · 2 comments

arthur-bizzi commented Oct 9, 2023

Hey. Please excuse the flurry of bug reports.

It seems that gradients with respect to shared parameters are not working correctly. This is most evident when working with invertible architectures.

Take the following example. We apply a simple coupling layer F and then apply its inverse B. If F and B have the same parameters, they cancel out and the result is just the input (to machine precision). This happens regardless of the specific parameters ps, which means that the gradient of any loss of the output with respect to those parameters should be zero.

What happens instead is that the AD engine cannot tell that the parameters are tied and returns something else. For this simple example, the gradient it returns is in fact identical to the one for non-tied parameters.

#LuxHelpers
using Lux, ComponentArrays, Random, Zygote
rng = Random.default_rng()

#Define a coupling Layer and its inverse:
struct LeapFrog{T} <: Lux.AbstractExplicitLayer
    sub_net::T
end 
(frog::LeapFrog)(x, ps, st) = (frog.sub_net(x[1], ps, st)[1] + x[2], x[1]), st
Lux.initialparameters(rng::AbstractRNG, frog::LeapFrog) = Lux.initialparameters(rng, frog.sub_net)
Lux.initialstates(rng::AbstractRNG, frog::LeapFrog) = Lux.initialstates(rng, frog.sub_net)

struct BackFrog{T} <: Lux.AbstractExplicitLayer
    sub_net::T
end 
(frog::BackFrog)(x, ps, st) = (x[2], x[1] - frog.sub_net(x[2], ps, st)[1]), st
Lux.initialparameters(rng::AbstractRNG, frog::BackFrog) = Lux.initialparameters(rng, frog.sub_net)
Lux.initialstates(rng::AbstractRNG, frog::BackFrog) = Lux.initialstates(rng, frog.sub_net)

#Set up a Chain that applies the layer and its inverse in sequence:
D = Dense(1 => 1)
F = LeapFrog(D)
B = BackFrog(D)
C = Chain(; f1 = F, b1 = B)
ps, st = Lux.setup(rng, C)
ps_share = Lux.share_parameters(ps, (("f1", "b1"),))

#For shared parameters, the Chain is just the identity:
v = ([1.], [1.])
C(v, ps, st)        # output is not v in general
C(v, ps_share, st)  # output is v (to machine precision)

#Toy loss
toy_loss(P) = C(v, P, st)[1] |> sum |> sum
toy_loss(ps)        # arbitrary value, depends on ps
toy_loss(ps_share)  # 2.0

#Take the gradient; it should be zero for the shared parameters
grad(p) = Zygote.gradient(toy_loss, p)
grad(ps) == grad(ps_share)  # true: the tie is ignored

This doesn't seem to be specific to Zygote either; ReverseDiff does the same:

using ReverseDiff

psc = ps |> ComponentArray
psc_share = ps_share |> ComponentArray
grad_rev(p) = ReverseDiff.gradient(toy_loss, p)
grad_rev(psc) == grad_rev(psc_share)  # true

I'm aware that this is tagged as experimental. Still, it's a neural networks library. If something cannot be used with gradients, perhaps it shouldn't be exposed to users?

What are the plans for share_parameters? It would be of immense importance to my work; please let me know if there's something I could do to help.

avik-pal (Member) commented Oct 9, 2023

If you use Optimisers.jl, it knows how to perform the updates for tied parameters: https://fluxml.ai/Optimisers.jl/stable/#Tied-Parameters
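
For context, a minimal sketch (not from this thread, simplified from the linked docs) of the mechanism: Optimisers.jl detects parameters that are tied by object identity, i.e. the same array reachable through several slots, and gives them a single optimiser state, so the gradient contributions from every use are accumulated into one update. The names w and m below are made up for illustration.

using Optimisers

# A toy structure in which the very same array is tied into two slots:
w = rand(3, 3)
m = (first = w, second = w)

# `setup` walks the structure with an identity-based cache, so both slots
# should end up sharing one optimiser state:
state = Optimisers.setup(Optimisers.Descent(0.1), m)
state.first === state.second   # expected: true

# `update` should then accumulate the gradient contributions from both slots
# into a single step for the shared array, keeping the tie intact:
grads = (first = ones(3, 3), second = ones(3, 3))
state, m = Optimisers.update(state, m, grads)
m.first === m.second           # expected: true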

avik-pal (Member) commented

Closing this since the behavior here is expected; one is supposed to use Optimisers.jl / a Functors-based approach, which accumulates the gradients exactly once.
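
To illustrate what "accumulates the gradients exactly once" means for this reproducer (a sketch, not part of the original comment; it reuses toy_loss and ps_share from the report above): Zygote returns a separate contribution for each slot that uses the tied parameters, and summing those contributions gives the total derivative with respect to the shared parameter, which should indeed vanish here.

# Reusing `toy_loss` and `ps_share` from the reproducer above:
gs = Zygote.gradient(toy_loss, ps_share)[1]

# One contribution per slot; their sum is the gradient of the tied parameter,
# and for this identity map it should be (numerically) zero:
gs.f1.weight .+ gs.b1.weight   # expected: ≈ 0
gs.f1.bias   .+ gs.b1.bias     # expected: ≈ 0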
