Commit

Print channel dimensions of Dense like those of Conv (#1658)
* print channel dims of Dense like Conv, and accept as input

* do the same for Bilinear

* fix tests

* fix tests

* docstring

* change a few more

* update

* docs

* rm circular ref

* fixup

* news + fixes
mcabbott authored Feb 19, 2022
1 parent b35b23b commit f49e81e
Showing 17 changed files with 142 additions and 130 deletions.
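
In brief, this commit makes `Dense` (and `Bilinear` and `Embedding`) print and accept their channel dimensions as an `in => out` pair, matching `Conv`. A minimal sketch of the two equivalent spellings (not part of the diff below; assumes a Flux version that includes this commit):

```julia
using Flux

d1 = Dense(2 => 3, σ)   # new pair notation, matching Conv((3, 3), 1 => 16)
d2 = Dense(2, 3, σ)     # old positional notation, still accepted

size(d1.weight) == size(d2.weight) == (3, 2)  # both map 2 inputs to 3 outputs
```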
1 change: 1 addition & 0 deletions NEWS.md
@@ -7,6 +7,7 @@ been removed in favour of MLDatasets.jl.
* `flatten` is not exported anymore due to clash with Iterators.flatten.
* Remove Juno.jl progress bar support as it is now obsolete.
* `Dropout` gained improved compatibility with Int and Complex arrays and is now twice-differentiable.
+* Notation `Dense(2 => 3, σ)` for channels matches `Conv`; the equivalent `Dense(2, 3, σ)` still works.
* Many utility functions and the `DataLoader` are [now provided by MLUtils.jl](https://github.com/FluxML/Flux.jl/pull/1874).
* The DataLoader is now compatible with generic dataset types implementing `MLUtils.numobs` and `MLUtils.getobs`.
* Added [truncated normal initialisation](https://github.com/FluxML/Flux.jl/pull/1877) of weights.
8 changes: 4 additions & 4 deletions docs/src/gpu.md
@@ -39,12 +39,12 @@ Note that we convert both the parameters (`W`, `b`) and the data set (`x`, `y`)
If you define a structured model, like a `Dense` layer or `Chain`, you just need to convert the internal parameters. Flux provides `fmap`, which allows you to alter all parameters of a model at once.

```julia
-d = Dense(10, 5, σ)
+d = Dense(10 => 5, σ)
d = fmap(cu, d)
d.weight # CuArray
d(cu(rand(10))) # CuArray output

-m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
+m = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
m = fmap(cu, m)
d(cu(rand(10)))
```
@@ -54,8 +54,8 @@ As a convenience, Flux provides the `gpu` function to convert models and data to
```julia
julia> using Flux, CUDA

-julia> m = Dense(10,5) |> gpu
-Dense(10, 5)
+julia> m = Dense(10, 5) |> gpu
+Dense(10 => 5)

julia> x = rand(10) |> gpu
10-element CuArray{Float32,1}:
29 changes: 15 additions & 14 deletions docs/src/models/advanced.md
@@ -74,10 +74,10 @@ this using the slicing features `Chain` provides:

```julia
m = Chain(
-Dense(784, 64, relu),
-Dense(64, 64, relu),
-Dense(32, 10)
-)
+Dense(784 => 64, relu),
+Dense(64 => 64, relu),
+Dense(32 => 10)
+);

ps = Flux.params(m[3:end])
```
@@ -142,10 +142,11 @@ Lastly, we can test our new layer. Thanks to the proper abstractions in Julia, o
```julia
model = Chain(
Join(vcat,
-Chain(Dense(1, 5),Dense(5, 1)), # branch 1
-Dense(1, 2), # branch 2
-Dense(1, 1)), # branch 3
-Dense(4, 1)
+Chain(Dense(1 => 5, relu), Dense(5 => 1)), # branch 1
+Dense(1 => 2), # branch 2
+Dense(1 => 1) # branch 3
+),
+Dense(4 => 1)
) |> gpu

xs = map(gpu, (rand(1), rand(1), rand(1)))
@@ -164,11 +165,11 @@ Join(combine, paths...) = Join(combine, paths)
# use vararg/tuple version of Parallel forward pass
model = Chain(
Join(vcat,
-Chain(Dense(1, 5),Dense(5, 1)),
-Dense(1, 2),
-Dense(1, 1)
+Chain(Dense(1 => 5, relu), Dense(5 => 1)),
+Dense(1 => 2),
+Dense(1 => 1)
),
-Dense(4, 1)
+Dense(4 => 1)
) |> gpu

xs = map(gpu, (rand(1), rand(1), rand(1)))
@@ -201,8 +202,8 @@ Flux.@functor Split
Now we can test to see that our `Split` does indeed produce multiple outputs.
```julia
model = Chain(
-Dense(10, 5),
-Split(Dense(5, 1),Dense(5, 3),Dense(5, 2))
+Dense(10 => 5),
+Split(Dense(5 => 1, tanh), Dense(5 => 3, tanh), Dense(5 => 2))
) |> gpu

model(gpu(rand(10)))
12 changes: 6 additions & 6 deletions docs/src/models/basics.md
@@ -158,14 +158,14 @@ a(rand(10)) # => 5-element vector

Congratulations! You just built the `Dense` layer that comes with Flux. Flux has many interesting layers available, but they're all things you could have built yourself very easily.

-(There is one small difference with `Dense` – for convenience it also takes an activation function, like `Dense(10, 5, σ)`.)
+(There is one small difference with `Dense` – for convenience it also takes an activation function, like `Dense(10 => 5, σ)`.)

## Stacking It Up

It's pretty common to write models that look something like:

```julia
-layer1 = Dense(10, 5, σ)
+layer1 = Dense(10 => 5, σ)
# ...
model(x) = layer3(layer2(layer1(x)))
```
@@ -175,7 +175,7 @@ For long chains, it might be a bit more intuitive to have a list of layers, like
```julia
using Flux

-layers = [Dense(10, 5, σ), Dense(5, 2), softmax]
+layers = [Dense(10 => 5, σ), Dense(5 => 2), softmax]

model(x) = foldl((x, m) -> m(x), layers, init = x)

@@ -186,8 +186,8 @@ Handily, this is also provided for in Flux:

```julia
model2 = Chain(
-Dense(10, 5, σ),
-Dense(5, 2),
+Dense(10 => 5, σ),
+Dense(5 => 2),
softmax)

model2(rand(10)) # => 2-element vector
@@ -198,7 +198,7 @@ This quickly starts to look like a high-level deep learning library; yet you can
A nice property of this approach is that because "models" are just functions (possibly with trainable parameters), you can also see this as simple function composition.

```julia
-m = Dense(5, 2) ∘ Dense(10, 5, σ)
+m = Dense(5 => 2) ∘ Dense(10 => 5, σ)

m(rand(10))
```
10 changes: 5 additions & 5 deletions docs/src/models/overview.md
@@ -43,8 +43,8 @@ Normally, your training and test data come from real world observations, but thi
Now, build a model to make predictions with `1` input and `1` output:

```julia
-julia> model = Dense(1, 1)
-Dense(1, 1)
+julia> model = Dense(1 => 1)
+Dense(1 => 1)

julia> model.weight
1×1 Matrix{Float32}:
@@ -58,10 +58,10 @@ julia> model.bias
Under the hood, a dense layer is a struct with fields `weight` and `bias`. `weight` represents a weights' matrix and `bias` represents a bias vector. There's another way to think about a model. In Flux, *models are conceptually predictive functions*:

```julia
-julia> predict = Dense(1, 1)
+julia> predict = Dense(1 => 1)
```

-`Dense(1, 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.
+`Dense(1 => 1)` also implements the function `σ(Wx+b)` where `W` and `b` are the weights and biases. `σ` is an activation function (more on activations later). Our model has one weight and one bias, but typical models will have many more. Think of weights and biases as knobs and levers Flux can use to tune predictions. Activation functions are transformations that tailor models to your needs.

This model will already make predictions, though not accurate ones yet:

@@ -185,7 +185,7 @@ The predictions are good. Here's how we got there.

First, we gathered real-world data into the variables `x_train`, `y_train`, `x_test`, and `y_test`. The `x_*` data defines inputs, and the `y_*` data defines outputs. The `*_train` data is for training the model, and the `*_test` data is for verifying the model. Our data was based on the function `4x + 2`.

-Then, we built a single input, single output predictive model, `predict = Dense(1, 1)`. The initial predictions weren't accurate, because we had not trained the model yet.
+Then, we built a single input, single output predictive model, `predict = Dense(1 => 1)`. The initial predictions weren't accurate, because we had not trained the model yet.

After building the model, we trained it with `train!(loss, parameters, data, opt)`. The loss function is first, followed by the `parameters` holding the weights and biases of the model, the training data, and the `Descent` optimizer provided by Flux. We ran the training step once, and observed that the parameters changed and the loss went down. Then, we ran the `train!` many times to finish the training process.

2 changes: 1 addition & 1 deletion docs/src/models/recurrence.md
@@ -74,7 +74,7 @@ Equivalent to the `RNN` stateful constructor, `LSTM` and `GRU` are also availabl
Using these tools, we can now build the model shown in the above diagram with:

```julia
-m = Chain(RNN(2, 5), Dense(5, 1))
+m = Chain(RNN(2, 5), Dense(5 => 1))
```
In this example, each output has only one component.

12 changes: 6 additions & 6 deletions docs/src/models/regularisation.md
@@ -9,7 +9,7 @@ For example, say we have a simple regression.
```julia
using Flux
using Flux.Losses: logitcrossentropy
-m = Dense(10, 5)
+m = Dense(10 => 5)
loss(x, y) = logitcrossentropy(m(x), y)
```

@@ -39,9 +39,9 @@ Here's a larger example with a multi-layer perceptron.

```julia
m = Chain(
-Dense(28^2, 128, relu),
-Dense(128, 32, relu),
-Dense(32, 10))
+Dense(28^2 => 128, relu),
+Dense(128 => 32, relu),
+Dense(32 => 10))

sqnorm(x) = sum(abs2, x)

@@ -55,8 +55,8 @@ One can also easily add per-layer regularisation via the `activations` function:
```julia
julia> using Flux: activations

-julia> c = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
-Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
+julia> c = Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)
+Chain(Dense(10 => 5, σ), Dense(5 => 2), softmax)

julia> activations(c, rand(10))
3-element Array{Any,1}:
14 changes: 7 additions & 7 deletions docs/src/saving.md
@@ -11,8 +11,8 @@ julia> using Flux

julia> model = Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
Chain(
-Dense(10, 5, relu), # 55 parameters
-Dense(5, 2), # 12 parameters
+Dense(10 => 5, relu), # 55 parameters
+Dense(5 => 2), # 12 parameters
NNlib.softmax,
) # Total: 4 arrays, 67 parameters, 524 bytes.

@@ -32,8 +32,8 @@ julia> @load "mymodel.bson" model

julia> model
Chain(
-Dense(10, 5, relu), # 55 parameters
-Dense(5, 2), # 12 parameters
+Dense(10 => 5, relu), # 55 parameters
+Dense(5 => 2), # 12 parameters
NNlib.softmax,
) # Total: 4 arrays, 67 parameters, 524 bytes.

@@ -59,7 +59,7 @@ model parameters.
```Julia
julia> using Flux

-julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
+julia> model = Chain(Dense(10 => 5,relu),Dense(5 => 2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)

julia> weights = Flux.params(model);
@@ -74,7 +74,7 @@ You can easily load parameters back into a model with `Flux.loadparams!`.
```julia
julia> using Flux

-julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
+julia> model = Chain(Dense(10 => 5,relu),Dense(5 => 2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)

julia> using BSON: @load
@@ -94,7 +94,7 @@ In longer training runs it's a good idea to periodically save your model, so tha
using Flux: throttle
using BSON: @save

-m = Chain(Dense(10,5,relu),Dense(5,2),softmax)
+m = Chain(Dense(10 => 5, relu), Dense(5 => 2), softmax)

evalcb = throttle(30) do
# Show loss
4 changes: 2 additions & 2 deletions docs/src/training/training.md
@@ -47,8 +47,8 @@ We can also define an objective in terms of some model:

```julia
m = Chain(
-Dense(784, 32, σ),
-Dense(32, 10), softmax)
+Dense(784 => 32, σ),
+Dense(32 => 10), softmax)

loss(x, y) = Flux.Losses.mse(m(x), y)
ps = Flux.params(m)
2 changes: 1 addition & 1 deletion docs/src/utilities.md
@@ -92,7 +92,7 @@ function make_model(width, height, inchannels, nclasses;

# the input dimension to Dense is programmatically calculated from
# width, height, and inchannels
-return Chain(conv_layers..., Dense(prod(conv_outsize), nclasses))
+return Chain(conv_layers..., Dense(prod(conv_outsize) => nclasses))
end
```

10 changes: 10 additions & 0 deletions src/deprecations.jl
@@ -16,4 +16,14 @@ ones32(::Type, dims...) = throw(ArgumentError("Flux.ones32 is always Float32, us
zeros32(::Type, dims...) = throw(ArgumentError("Flux.zeros32 is always Float32, use Base.zeros to specify the element type"))

# v0.13 deprecations

@deprecate frequencies(xs) group_counts(xs)

# Channel notation: Changed to match Conv, but very softly deprecated!
# Perhaps change to @deprecate for v0.14, but there is no plan to remove these.
Dense(in::Integer, out::Integer, σ = identity; kw...) =
Dense(in => out, σ; kw...)
Bilinear(in1::Integer, in2::Integer, out::Integer, σ = identity; kw...) =
Bilinear((in1, in2) => out, σ; kw...)
Embedding(in::Integer, out::Integer; kw...) = Embedding(in => out; kw...)
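
These three methods simply forward the old positional arguments to the new pair-based constructors, so existing code keeps working. A small illustrative sketch (not part of the diff; `Embedding` is qualified as `Flux.Embedding` here in case it is not exported):

```julia
using Flux

Dense(10, 5, relu)        # forwarded to Dense(10 => 5, relu)
Bilinear(5, 5, 7)         # forwarded to Bilinear((5, 5) => 7)
Flux.Embedding(1000, 64)  # forwarded to Flux.Embedding(1000 => 64)
```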

