
Tied weights using Flux layers #1592

Open
dfenn opened this issue May 7, 2021 · 9 comments

@dfenn

dfenn commented May 7, 2021

I'm trying to build an autoencoder that uses both conv and dense layers, and I'd like to tie the weights. #488 demonstrates how to do this for dense layers by not using the Flux Dense type and instead using the encoder's weights directly.

Is there a way to accomplish something similar while still using Flux-defined layer types, such as Conv? I've tried manually setting the decoder parameters in the loss function; something like this:

using Flux

mutable struct AE_tied
    encoder
    decoder

    weights_encoder
    weights_decoder
end

AE_tied(encoder, decoder) = AE_tied(encoder, decoder, params(encoder), params(decoder))

function (a::AE_tied)(x)
    x = a.encoder(x)
    a.weights_decoder[1] .= a.weights_encoder[1]
    a.decoder(x)
end

encoder = Conv((3,3), 1=>2, relu, pad=SamePad())
decoder = ConvTranspose((3,3), 2=>1, relu, pad=SamePad())

model = AE_tied(encoder, decoder)
model = cpu(model)

ps = Flux.params(model.encoder)
opt = ADAM(0.1)

function loss(x) 
    y = model(x)
    sum((y .- x) .^2) / length(x)
end

train_data = cpu(rand(5, 5, 1, 2))

for epoch in 1:1
    local trainLoss
    gs = Flux.gradient(ps) do
        trainLoss = loss(train_data)
        return trainLoss
    end
    Flux.Optimise.update!(opt, ps, gs)
    @show trainLoss
end

Running this gives ERROR: LoadError: Mutating arrays is not supported. It's the line a.weights_decoder[1] .= a.weights_encoder[1] that's the issue.

Am I going about this the wrong way, or is what I'm trying to do not supported? Thanks in advance for any help.

@atiyo
Contributor

atiyo commented May 9, 2021

Indeed, mutating arrays isn't supported by Zygote, which Flux uses to calculate the gradients. It is supported in some other Julia AD packages, which you might be able to use instead.

However, I don't believe the above snippet actually ties the weights properly. For example, a tied weight should receive a single gradient accumulated from both of its uses, and that won't happen if you tie the weights by manually copying one onto the other.

With this in mind, my preferred solution would be to initialise the weights of the decoder to be a @view on the weights of the encoder.

I haven't actually checked to see whether this plays nicely with Flux, but maybe it's something to try.
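
An untested sketch of what I mean: for the Conv((3,3), 1=>2) / ConvTranspose((3,3), 2=>1) pair above, the filter arrays happen to have the same size (3, 3, 1, 2), so the simplest version is to construct both layers from one shared array (a whole-array @view of the encoder's weight would behave the same way); only the biases stay separate.

using Flux

# One filter array for both layers: Conv((3,3), 1=>2) and ConvTranspose((3,3), 2=>1)
# both expect a (3, 3, 1, 2) weight.
W = Flux.glorot_uniform(3, 3, 1, 2)

encoder = Conv(W, zeros(Float32, 2), relu; pad=SamePad())           # 1 channel -> 2 channels
decoder = ConvTranspose(W, zeros(Float32, 1), relu; pad=SamePad())  # 2 channels -> 1 channel

# encoder.weight === decoder.weight, so there is a single array to train, and
# Zygote accumulates the gradient from both uses into that one array.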

@dfenn
Author

dfenn commented May 11, 2021

Thanks for your response. I was able to get it working with @views for the convolutional layers. However, the same approach isn't working for dense layers, where the weight matrix must be transposed:

encoder = Dense(5, 2)
@views decoder = Dense(transpose(encoder.weight), rand(5))

This gives the error

ERROR: LoadError: TypeError: in typeassert, expected Tuple{Transpose{Float32, Matrix{Float32}}, Transpose{Float32, Matrix{Float32}}, Vector{Float64}}, got a value of type Tuple{Matrix{Float32}, Matrix{Float32}, Vector{Float64}}
Stacktrace:
 [1] apply!(o::ADAM, x::Transpose{Float32, Matrix{Float32}}, Δ::Matrix{Float64})
   @ Flux.Optimise ~/.julia/packages/Flux/6BByF/src/optimise/optimisers.jl:175
 [2] update!(opt::ADAM, x::Transpose{Float32, Matrix{Float32}}, x̄::Matrix{Float64})
   @ Flux.Optimise ~/.julia/packages/Flux/6BByF/src/optimise/train.jl:23
 [3] update!(opt::ADAM, xs::Params, gs::Zygote.Grads)
   @ Flux.Optimise ~/.julia/packages/Flux/6BByF/src/optimise/train.jl:29

It looks like Flux is inferring the type as Transpose and then complaining when it receives a Matrix. I've tried using PermutedDimsArray instead, with similar results.

It's not clear to me how to address this. Any ideas?

@darsnack
Member

We should probably change that line in the Adam code to use Adapt.jl to get the correct type instead of hard-typing the return of get!.
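
As far as I can tell, the mismatch comes from the state buffers being created as plain arrays while the typeassert expects typeof(x): zero (via similar) drops the Transpose wrapper, e.g.

using LinearAlgebra

A = rand(Float32, 2, 5)
W = transpose(A)
typeof(W)        # Transpose{Float32, Matrix{Float32}}
typeof(zero(W))  # Matrix{Float32} -- the wrapper is not preserved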

@DhairyaLGandhi
Member

DhairyaLGandhi commented May 11, 2021

Probably better to incorporate directly in optimisers.jl

As long as we pass in the correct references we should be good. I don't think it needs to be addressed in the optimisers otherwise.

@darsnack
Member

I don't think we need the fix in Optimisers.jl because the state is initialized separately (and correctly). This appears to only be a bug for IdDict optimizers.

Agreed that we only need the references to be correct.

@CarloLucibello
Member

CarloLucibello commented Jun 10, 2021

possibly related to FluxML/Zygote.jl#991 and #1613

We should probably change that line in the Adam code to use Adapt.jl to get the correct type instead of hard-typing the return of get!.

Even if we use something like #1613 to adapt the types, it still wouldn't be entirely correct, because we would be taking two steps of ADAM with separate gradients instead of a single step with the accumulated gradient.

@darsnack
Member

darsnack commented Jun 23, 2021

taking 2 steps of adam with separate gradients instead of a single step with the accumulated one

Yeah, with ADAM this will certainly be wrong. Referencing FluxML/Zygote.jl#991 (comment), it's not the two steps per se that are wrong; it's the momentum terms that will be incorrect, which makes two steps not equivalent to a single accumulated one. For simpler optimizers like Descent, this will be correct (assuming the gradients themselves are correct, which they are for explicit params).
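
A quick sketch of that point with the implicit-style optimisers discussed here: Descent's update is linear in the gradient, so two steps collapse into one, while ADAM's moment estimates make the two paths diverge.

using Flux
using Flux.Optimise: update!

g1, g2 = Float32[1, 2], Float32[3, 4]   # gradients from two uses of a tied weight

# Descent: two updates with g1 then g2 match one update with g1 + g2.
xa, xb = zeros(Float32, 2), zeros(Float32, 2)
update!(Descent(0.1), xa, copy(g1)); update!(Descent(0.1), xa, copy(g2))
update!(Descent(0.1), xb, g1 .+ g2)
xa ≈ xb   # true

# ADAM: the moment estimates depend on how the gradient arrives, so the results differ.
adam_a, adam_b = ADAM(0.1), ADAM(0.1)
ya, yb = zeros(Float32, 2), zeros(Float32, 2)
update!(adam_a, ya, copy(g1)); update!(adam_a, ya, copy(g2))
update!(adam_b, yb, g1 .+ g2)
ya ≈ yb   # false in general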

@mleprovost

Hello,

I wanted to follow up on this issue. Is it resolved in the latest version of Flux.jl?

@mcabbott
Member

On latest Flux, using new-style training with setup, something like dec = Dense(transpose(encoder.weight)) should just work. It will see through the transpose and notice that the same array appears twice.

(With old-style IdDict optimisers, I'm not sure.)
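
Roughly this shape, as an untested sketch (the loss and sizes are just for illustration):

using Flux

enc = Dense(5 => 2, relu)
dec = Dense(transpose(enc.weight))        # lazy transpose of the same underlying array
model = Chain(enc, dec)

opt_state = Flux.setup(Adam(1e-3), model) # setup sees the shared array and ties its state

x = rand(Float32, 5, 16)
loss, grads = Flux.withgradient(m -> sum(abs2, m(x) .- x) / length(x), model)
Flux.update!(opt_state, model, grads[1])

enc.weight === parent(dec.weight)         # true: still one shared array after the update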
