diff --git a/docs/src/assets/zygote-crop.png b/docs/src/assets/zygote-crop.png
new file mode 100644
index 0000000000..ddc04b3d17
Binary files /dev/null and b/docs/src/assets/zygote-crop.png differ
diff --git a/docs/src/guide/models/basics.md b/docs/src/guide/models/basics.md
index 1fdb052789..272966ef09 100644
--- a/docs/src/guide/models/basics.md
+++ b/docs/src/guide/models/basics.md
@@ -188,7 +188,7 @@ For ordinary pure functions like `(x,y) -> (x*y)`, this `∂f(x,y)/∂f` would a
depends on `θ`.
```@raw html
-
+
```
Flux's [`gradient`](@ref) function by default calls a companion package called [Zygote](https://github.com/FluxML/Zygote.jl).
@@ -327,7 +327,9 @@ grad = Flux.gradient(|>, [1f0], model1)[2]
This gradient is becoming a complicated nested structure.
But it works just like before: `grad.outer.inner.W` corresponds to `model1.outer.inner.W`.
-### [Flux's layers](man-layers)
+```@raw html
+<h3>Flux's layers</h3>
+```
Rather than define everything from scratch every time, Flux provides a library of
commonly used layers. The same model could be defined:
@@ -359,14 +361,14 @@ How does this `model2` differ from the `model1` we had before?
Calling [`Flux.@layer Layer`](@ref Flux.@layer) will add this, and some other niceties.
If what you need isn't covered by Flux's built-in layers, it's easy to write your own.
-There are more details [later](man-advanced), but the steps are invariably those shown for `struct Layer` above:
+There are more details [later](@ref man-advanced), but the steps are invariably those shown for `struct Layer` above:
1. Define a `struct` which will hold the parameters.
2. Make it callable, to define how it uses them to transform the input `x`
3. Define a constructor which initialises the parameters (if the default constructor doesn't do what you want).
4. Annotate with `@layer` to opt in to pretty printing, and other enhancements.
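The four steps above can be sketched as a tiny custom layer. This is an illustrative example, not code from the guide: the name `Rescale` and its contents are made up.

```julia
using Flux

struct Rescale                  # 1. a struct holding the parameters
    s::Vector{Float32}
end

(m::Rescale)(x) = m.s .* x      # 2. make it callable, transforming the input x

Rescale(n::Integer) = Rescale(ones(Float32, n))  # 3. constructor initialising the parameters

Flux.@layer Rescale             # 4. opt in to pretty printing and other enhancements
```

With `s` initialised to ones, calling the layer initially returns its input unchanged; training would then adjust `s` like any other parameter.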
```@raw html
-
+
```
To deal with such nested structures, Flux relies heavily on an associated package
@@ -399,7 +401,7 @@ of the output -- it must be a number, not a vector. Adjusting the parameters
to make this smaller won't lead us anywhere interesting. Instead, we should minimise
some *loss function* which compares the actual output to our desired output.
-Perhaps the simplest example is curve fitting. The [previous page](man-overview) fitted
+Perhaps the simplest example is curve fitting. The [previous page](@ref man-overview) fitted
a linear function to data. With our two-layer `model2`, we can fit a nonlinear function.
For example, let us use `f(x) = 2x - x^3` evaluated at some points `x in -2:0.1:2` as the data,
and adjust the parameters of `model2` from above so that its output is similar.
@@ -424,6 +426,6 @@ plot(x -> 2x-x^3, -2, 2, label="truth")
scatter!(x -> model2([x]), -2:0.1f0:2, label="fitted")
```
-If this general idea is unfamiliar, you may want the [tutorial on linear regression](man-linear-regression).
+If this general idea is unfamiliar, you may want the [tutorial on linear regression](@ref man-linear-regression).
-More detail about what exactly the function `train!` is doing, and how to use rules other than simple [`Descent`](@ref Optimisers.Descent), is what the next page in this guide is about: [training](man-training).
+More detail about what exactly the function `train!` is doing, and how to use rules other than simple [`Descent`](@ref Optimisers.Descent), is what the next page in this guide is about: [training](@ref man-training).
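As a reference point for the curve-fitting passage above, the fitting loop can be sketched as follows. The layer sizes, learning rate, and epoch count here are assumptions for illustration, not the guide's exact values:

```julia
using Flux

# Stand-in for the guide's two-layer model (exact sizes are assumed):
model2 = Chain(Dense(1 => 20, tanh), Dense(20 => 1))

# Data from the target function f(x) = 2x - x^3 at x in -2:0.1:2:
data = [([x], [2x - x^3]) for x in -2:0.1f0:2]

loss(m, x, y) = Flux.mse(m(x), y)               # mean squared error

opt_state = Flux.setup(Descent(0.01), model2)   # plain gradient descent
for epoch in 1:1000                             # repeated passes over the data
    Flux.train!(loss, model2, data, opt_state)
end
```

Each call to `Flux.train!` computes the gradient of `loss` for every `(x, y)` pair in `data` and updates the parameters via the `Descent` rule held in `opt_state`.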
diff --git a/docs/src/reference/models/layers.md b/docs/src/reference/models/layers.md
index d7e67d3e3d..ae9232f5fb 100644
--- a/docs/src/reference/models/layers.md
+++ b/docs/src/reference/models/layers.md
@@ -1,4 +1,4 @@
-# [Built-in Layer Types]](@id man-layers)
+# [Built-in Layer Types](@id man-layers)
If you started at the beginning of the guide, then you have already met the
basic [`Dense`](@ref) layer, and seen [`Chain`](@ref) for combining layers.