Moving @compact to Flux? #12
-
Hey @MilesCranmer, please bear with us, since this is literally the first time we need to consider how to do such a move. I'm not sure how everyone feels about it, so ping @FluxML/committers to get some input. If you want to discuss this synchronously, we also have an ML call this Friday (see the Julia events calendar). Let me know if you have time and are interested, and I'll add it to the agenda.
-
I have strong reservations about copying this across. It seems like a first draft, and maybe in its present state very useful for some tasks / some ways of thinking. This is exactly what Fluxperimental is for: it could be merged here, and be available as a registered package, without waiting to figure out the ideal thing (nor worrying about growing Flux's docs to have multiple ways of doing the same thing, let alone about testing & documenting as many edge cases as possible).

But the possible space of slightly similar designs is large. Are we sure there isn't some way to have 90% of the benefit (or 200%) for 5% of the code / complexity? E.g. if we kill

What I think would be very helpful is examples of actual use. The documentation just re-makes things we can already do; we don't want to replace
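For context, the "things we can already do" baseline is an ordinary hand-written Flux layer. A minimal sketch of what a small `@compact` definition abbreviates (the `MyLinear` name and sizes here are illustrative, not from the package):

```julia
using Flux

# A hand-written layer: explicit struct, trainable fields, and a forward pass.
# This is roughly the boilerplate that @compact is meant to abbreviate.
struct MyLinear{M,V}
    W::M
    b::V
end
Flux.@functor MyLinear               # let Flux see the fields as parameters
(l::MyLinear)(x) = l.W * x .+ l.b    # forward pass

layer = MyLinear(randn(Float32, 5, 5), randn(Float32, 5))
layer(randn(Float32, 5))             # 5-element output
```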
-
I don't want to derail things with new feature ideas, as I think the current

Here's an option for getting around it. Rewrite:

```julia
struct CompactLayer{S,F,NT1<:NamedTuple,NT2<:NamedTuple}
    symbol::Val{S}
    fun::F
    name::Union{String,Nothing}
    strings::NTuple{3,String}
    setup_strings::NT1
    variables::NT2
end

CompactLayer(f::Function, name::Union{String,Nothing}, str::Tuple, setup_str::NamedTuple, symb=:Default; kw...) =
    CompactLayer(Val(symb), f, name, str, setup_str, NamedTuple(kw))
```

which is, by default, just `CompactLayer{:Default}`. This would be used like:

```julia
@compact(Linear, W=randn(5, 5), b=randn(5)) do x
    W * x .+ b
end
```

so that you can dispatch on specific layers using their symbol:

```julia
function something(m::CompactLayer{:Linear})
    ...
end
```

which would be stable, as it is insensitive to the specific function used in the `do` block.

Edit: made a PR #14
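A brief sketch of how that dispatch could be used in practice, assuming the `CompactLayer` definition above; `islinear` and `weight` are made-up names for illustration, not part of the PR:

```julia
# Sketch only, assuming the CompactLayer struct above and a layer built as
#   m = @compact(Linear, W=randn(5, 5), b=randn(5)) do x; W * x .+ b; end
# The symbol lives in the type, so these methods are independent of the
# anonymous function that @compact generates for the forward pass.

islinear(::CompactLayer{:Linear}) = true    # matches any :Linear compact layer
islinear(::CompactLayer)          = false   # fallback for every other symbol

weight(m::CompactLayer{:Linear}) = m.variables.W   # pull out a named variable
```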
-
Here's a made-up example of

```julia
julia> using Fluxperimental: @compact; using Flux

julia> mylayer(n) = @compact(               # this layer constructor is called mylayer
           m = 2n,
           tmp = randn32(m, n),             # initialisation... these are not legal keyword arguments
           mat = tmp ./= sum(tmp; dims=1)   # ... and need not be Zygote-friendly
       ) do x
           y = mat * x
           y[1:n, :] .+ relu(y[n+1:end, :]) # forward pass
       end
mylayer (generic function with 1 method)

julia> m = Chain(mylayer(3), Dense(3 => 1))  # notice that this stores tmp, and prints the init code
Chain(
  @compact(
    m = 2n,
    tmp = randn32(m, n),                # 18 parameters
    mat = tmp ./= sum(tmp; dims = 1),   # 18 parameters
  ) do x
    y = mat * x
    y[1:n, :] .+ relu(y[n + 1:end, :])
  end,
  Dense(3 => 1),                        # 4 parameters
)         # Total: 3 trainable arrays, 22 parameters,
          # plus 1 non-trainable, 18 parameters, summarysize 754 bytes.

julia> m[1].variables
(m = 6, tmp = Float32[0.65733784 0.332404 0.06522033; 0.40533376 -0.18564595 0.38826385; … ; 0.17651373 0.089108355 0.22100908; -0.08086931 1.1042991 -0.32503128], mat = Float32[0.65733784 0.332404 0.06522033; 0.40533376 -0.18564595 0.38826385; … ; 0.17651373 0.089108355 0.22100908; -0.08086931 1.1042991 -0.32503128])
```

(Is there really a non-trainable parameter array?)

We got to this design starting from one which stored a

The downsides are that
```julia
julia> layerfactory(f; layers...) = Base.Fix1(f, NamedTuple(layers));

julia> function mylayer1(n)
           (_, m, _) = n, 2n, 3n         # useless destructuring, would be illegal in macro
           tmp = randn32(m, n)           # tmp will be discarded after this runs
           mat = tmp ./ sum(tmp; dims=1)
           return layerfactory(; mat) do store, x
               y = store.mat * x         # forward pass needs to qualify fields
               y[1:n, :] .+ relu(y[n+1:end, :])
           end
       end;

julia> m1 = Chain(mylayer1(3), Dense(3 => 1))  # clearly the printing of Fix1 isn't ideal
Chain(
  Fix1(
    var"#37#38"{Int64, Matrix{Float32}}(3, Float32[0.16561371 0.6836739 -19.871439; 0.30575347 0.19557928 -14.132152; … ; 0.2924494 -0.20139198 36.675316; 2.0378335 -0.033085346 -5.4660473]),
    (mat = Float32[0.16561371 0.6836739 -19.871439; 0.30575347 0.19557928 -14.132152; … ; 0.2924494 -0.20139198 36.675316; 2.0378335 -0.033085346 -5.4660473],),  # 18 parameters
  ),
  Dense(3 => 1),                        # 4 parameters
)                   # Total: 3 arrays, 22 parameters, 352 bytes.

julia> layerfactory(a=Dense(1=>2), b=Dense(1=>2)) do nt, x  # printing is better here, not ideal
           nt.a(x) + nt.b(x)
       end |> Chain
Chain(
  Fix1(
    var"#43#44"(),
    NamedTuple(
      Dense(1 => 2),                    # 4 parameters
      Dense(1 => 2),                    # 4 parameters
    ),
  ),
)                   # Total: 4 arrays, 8 parameters, 352 bytes.
```

Can the problems with the 1-line way be addressed by something simpler?

I keep asking for examples because it's not clear to me whether this kind of layer is intended to be in-scope, or not. If not, then maybe init steps like this ought to be forbidden? Maybe the contents must always be other layers, never arrays?
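As a quick usage check, a sketch that assumes the `m` and `m1` defined in the session above (the shapes are the point; the numbers are random):

```julia
x = randn(Float32, 3, 7)
size(m(x))    # (1, 7): the @compact layer feeding Dense(3 => 1)
size(m1(x))   # (1, 7): the Fix1-based version gives the same shape

m[1].variables.mat   # the stored matrix in the @compact version
m1[1].x.mat          # its counterpart: Fix1 keeps the NamedTuple in its `x` field
```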
-
Ping on this. Would be really nice to have!
-
Airing it out for any comments before I do it: I hope to replace the recursive addition of the

```julia
(self, x) -> let a = self.a, W = self.W, ... begin
    do x -> ... end   # this is the user written function
end
```

I think this should get all the same functionality. And means that, minus the printing logic,

Let me know if this makes sense!
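To make that concrete, a hedged sketch (not the actual macro expansion) of what such a single `let`-based closure could look like for `@compact(W=randn(5,5), b=randn(5)) do x; W * x .+ b; end`:

```julia
# Hypothetical expansion sketch, not the real @compact output: the stored
# variables are re-bound from `self` in one `let`, and the user-written
# body runs inside that scope.
fwd = (self, x) -> let W = self.W, b = self.b
    W * x .+ b                 # user-written forward pass
end

# A plain NamedTuple can stand in for the layer's stored variables:
self = (W = randn(Float32, 5, 5), b = randn(Float32, 5))
fwd(self, randn(Float32, 5))   # 5-element output
```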
-
Hi all,

I was wondering when `@compact` could be moved to Flux.jl? My experience with it has been great thus far. Every single time I make a neural net in Julia, I use a mix of `Chain` for simple linear networks and `@compact` for sticking operations and parameters together.

I've shown it to a few people when introducing them to Flux.jl, and they also find it very intuitive. It would be great to include in Flux.jl to help accelerate user growth!
Cheers,
Miles
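(For readers new to the macro, a minimal sketch of the kind of mix described above; the layer sizes and names are illustrative only:)

```julia
using Flux
using Fluxperimental: @compact

# A plain Chain for the simple feed-forward part...
backbone = Chain(Dense(4 => 16, relu), Dense(16 => 8, relu))

# ...and @compact to stick a parameter and a custom operation together.
# `scale` is stored in the layer; the do-block is the forward pass.
head = @compact(scale = ones(Float32, 8)) do x
    sum(scale .* x; dims = 1)
end

model = Chain(backbone, head)
model(randn(Float32, 4, 10))   # 1×10 output
```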