diff --git a/docs/src/getting_started/fit_simulation.md b/docs/src/getting_started/fit_simulation.md
index 10ee7b58a08..7929c7115c0 100644
--- a/docs/src/getting_started/fit_simulation.md
+++ b/docs/src/getting_started/fit_simulation.md
@@ -46,18 +46,27 @@ Now, in [the first tutorial](@ref first_sim), we assumed:
 > Luckily, a local guide provided us with some parameters that seem to match the system!
 
 Sadly, magical nymphs do not always show up and give us parameters. Thus in this case,
-let's assume that we are just given data that is representative of the solution with
-``\alpha = 1.5``, ``\beta = 1.0``, ``\gamma = 3.0``, and ``\delta = 1.0``. This data
-is given over a time span of ``t_0 = 0`` to ``t_f = 10`` with data taken on both rabbits
-and wolves at every ``\Delta t = 1.`` Can we figure out what the parameter values should
-be directly from the data?
+we will need to use Optimization.jl to optimize the model parameters to best fit some
+experimental data.
+We are given experimentally observed data of both rabbit and wolf populations
+over a time span of ``t_0 = 0`` to ``t_f = 10`` at every ``\Delta t = 1``.
+Can we figure out what the parameter values should be directly from the data?
 
 ## Solution as Copy-Pastable Code
 
 ```@example
 using DifferentialEquations, Optimization, OptimizationPolyalgorithms, SciMLSensitivity
 using ForwardDiff, Plots
 
+# Define experimental data
+t_data = 0:10
+x_data = [1.000 2.773 6.773 0.971 1.886 6.101 1.398 1.335 4.353 3.247 1.034]
+y_data = [1.000 0.259 2.015 1.908 0.323 0.629 3.458 0.508 0.314 4.547 0.906]
+xy_data = vcat(x_data, y_data)
+
+# Plot the provided data
+scatter(t_data, xy_data', label=["x Data" "y Data"])
+
+# Setup the ODE function
 function lotka_volterra!(du, u, p, t)
     x, y = u
     α, β, δ, γ = p
@@ -72,39 +81,46 @@ u0 = [1.0, 1.0]
 tspan = (0.0, 10.0)
 
 # LV equation parameter. p = [α, β, δ, γ]
-p = [1.5, 1.0, 3.0, 1.0]
+pguess = [1.0, 1.2, 2.5, 1.2]
 
-# Setup the ODE problem, then solve
-prob = ODEProblem(lotka_volterra!, u0, tspan, p)
-datasol = solve(prob, saveat = 1)
-data = Array(datasol)
+# Set up the ODE problem with our guessed parameter values
+prob = ODEProblem(lotka_volterra!, u0, tspan, pguess)
 
-## Now do the optimization process
+# Solve the ODE problem with our guessed parameter values
+initial_sol = solve(prob, saveat = 1)
+
+# View the guessed model solution
+plt = plot(initial_sol, label = ["x Prediction" "y Prediction"])
+scatter!(plt, t_data, xy_data', label = ["x Data" "y Data"])
+
+# Define a loss metric function to be minimized
 function loss(newp)
     newprob = remake(prob, p = newp)
     sol = solve(newprob, saveat = 1)
-    loss = sum(abs2, sol .- data)
+    loss = sum(abs2, sol .- xy_data)
     return loss, sol
 end
 
-callback = function (p, l, sol)
+# Define a callback function to monitor optimization progress
+function callback(p, l, sol)
     display(l)
-    plt = plot(sol, ylim = (0, 6), label = "Current Prediction")
-    scatter!(plt, datasol, label = "Data")
+    plt = plot(sol, ylim = (0, 6), label = ["Current x Prediction" "Current y Prediction"])
+    scatter!(plt, t_data, xy_data', label = ["x Data" "y Data"])
    display(plt)
-    # Tell Optimization.solve to not halt the optimization. If return true, then
-    # optimization stops.
     return false
 end
 
-adtype = Optimization.AutoForwardDiff()
+# Set up the optimization problem with our loss function and initial guess
+adtype = AutoForwardDiff()
 pguess = [1.0, 1.2, 2.5, 1.2]
-optf = Optimization.OptimizationFunction((x, p) -> loss(x), adtype)
-optprob = Optimization.OptimizationProblem(optf, pguess)
-
-result_ode = Optimization.solve(optprob, PolyOpt(),
-    callback = callback,
-    maxiters = 200)
+optf = OptimizationFunction((x, _) -> loss(x), adtype)
+optprob = OptimizationProblem(optf, pguess)
+
+# Optimize the ODE parameters for best fit to our data
+pfinal = solve(optprob, PolyOpt(),
+    callback = callback,
+    maxiters = 200)
+α, β, γ, δ = round.(pfinal, digits=1)
 ```
 
 ## Step-by-Step Solution
@@ -132,31 +148,63 @@ using DifferentialEquations, Optimization, OptimizationPolyalgorithms, SciMLSens
 using ForwardDiff, Plots
 ```
 
-### Step 2: Generate the Training Data
+### Step 2: View the Training Data
+
+In our example,
+we are given observed values for `x` and `y` populations at eleven instances in time.
+Let's make that the training data for our Lotka-Volterra dynamical system model.
+
+```@example odefit
+# Define experimental data
+t_data = 0:10
+x_data = [1.000 2.773 6.773 0.971 1.886 6.101 1.398 1.335 4.353 3.247 1.034]
+y_data = [1.000 0.259 2.015 1.908 0.323 0.629 3.458 0.508 0.314 4.547 0.906]
+xy_data = vcat(x_data, y_data)
+
+# Plot the provided data
+scatter(t_data, xy_data', label=["x Data" "y Data"])
+```
+
+!!! note
 
-In our example, we assumed that we had data representative of the solution with
-``\alpha = 1.5``, ``\beta = 1.0``, ``\gamma = 3.0``, and ``\delta = 1.0``. Let's make that
-training data. The way we can do that is by defining the ODE with those parameters and
-simulating it. Unlike [the first tutorial](@ref first_sim), which used ModelingToolkit,
+    The `Array` `xy_data` above has been oriented with time instances as columns
+    so that it can be directly compared with an `ODESolution` object. (See
+    [Solution Handling](https://docs.sciml.ai/DiffEqDocs/stable/basics/solution/#solution)
+    for more information on accessing DifferentialEquations.jl solution data.)
+    However, plotting an `Array` with Plots.jl requires the variables to be columns
+    and the time instances to be rows.
+    Thus, whenever the experimental data is plotted,
+    the transpose `xy_data'` will be used.
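+
+As a quick sketch of that layout point (it only reuses the `xy_data` defined above; the
+`@show` lines are illustrative and not required by the tutorial), you can check the two
+orientations directly:
+
+```julia
+# Sketch: compare the two orientations of the training data
+@show size(xy_data)   # (2, 11): one row per state (x, y), one column per time point,
+                      #          the same layout as an ODE solution array
+@show size(xy_data')  # (11, 2): one row per time point, the layout Plots.jl expects
+```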
+
+### Step 3: Set Up the ODE Model
+
+We know that our system will behave according to the Lotka-Volterra ODE model,
+so let's set up that model with an initial guess at the parameter values:
+`\alpha`, `\beta`, `\gamma`, and `\delta`.
+Unlike [the first tutorial](@ref first_sim), which used ModelingToolkit,
 let's demonstrate using [DifferentialEquations.jl](https://docs.sciml.ai/DiffEqDocs/stable/)
 to directly define the ODE for the numerical solvers.
 
 To do this, we define a vector-based mutating function that calculates the derivatives for
 our system. We will define our system as a vector `u = [x,y]`, and thus `u[1] = x` and
-`u[2] = y`. This means that we need to calculate the derivative as `du = [dx,dy]`. Our parameters
-will simply be the vector `p = [α, β, δ, γ]`. Writing down the Lotka-Volterra equations in the
+`u[2] = y`. This means that we need to calculate the derivative as `du = [dx,dy]`.
+Our parameters will simply be the vector `p = [α, β, δ, γ]`.
+Writing down the Lotka-Volterra equations in the
 DifferentialEquations.jl direct form thus looks like the following:
 
 ```@example odefit
 function lotka_volterra!(du, u, p, t)
     x, y = u
     α, β, δ, γ = p
-    du[1] = α * x - β * x * y
-    du[2] = -δ * y + γ * x * y
+    du[1] = dx = α * x - β * x * y
+    du[2] = dy = -δ * y + γ * x * y
 end
 ```
 
-Now we need to define the initial condition, time span, and parameter vector to simulate with.
+Now we need to define the initial condition, time span, and parameter vector
+in order to solve this differential equation.
+We do not currently know the parameter values,
+but we will guess some values to start with and optimize them later.
 Following the problem setup, this looks like:
 
 ```@example odefit
@@ -167,101 +215,186 @@ u0 = [1.0, 1.0]
 tspan = (0.0, 10.0)
 
 # LV equation parameter. p = [α, β, δ, γ]
-p = [1.5, 1.0, 3.0, 1.0]
+pguess = [1.0, 1.2, 2.5, 1.2]
 ```
 
-Now we bring these pieces all together to define the `ODEProblem` and solve it. Note that we solve
-this equation with the keyword argument `saveat = 1` so that it saves a point at every ``\Delta t = 1``.
+Now we bring these pieces all together to define the `ODEProblem` and solve it.
+Note that we solve this equation with the keyword argument `saveat = 1`
+so that it saves a point at every ``\Delta t = 1`` to match our experimental data.
 
 ```@example odefit
-# Setup the ODE problem, then solve
-prob = ODEProblem(lotka_volterra!, u0, tspan, p)
-datasol = solve(prob, saveat = 1)
-```
+# Set up the ODE problem with our guessed parameter values
+prob = ODEProblem(lotka_volterra!, u0, tspan, pguess)
 
-```@example odefit
-data = Array(datasol)
+# Solve the ODE problem with our guessed parameter values
+initial_sol = solve(prob, saveat = 1)
+
+# View the guessed model solution
+plt = plot(initial_sol, label = ["x Prediction" "y Prediction"])
+scatter!(plt, t_data, xy_data', label = ["x Data" "y Data"])
 ```
 
-!!! note
+Clearly the parameter values that we guessed are not correct to model this system.
+However, we can use Optimization.jl together with DifferentialEquations.jl
+to fit our parameters to our training data.
+
+!!! note
 
     For more details on using DifferentialEquations.jl, check out the
     [getting started with DifferentialEquations.jl tutorial](https://docs.sciml.ai/DiffEqDocs/stable/getting_started/).
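+
+Before moving on, it can help to put a number on that mismatch. As a rough sketch
+(`initial_error` is a name introduced here for illustration, reusing `initial_sol` and
+`xy_data` from above), the sum of squared differences between the guessed-parameter
+solution and the data is exactly the quantity that the loss function in the next step
+will minimize:
+
+```julia
+# Sketch: sum of squared errors of the guessed-parameter solution vs. the data
+initial_error = sum(abs2, Array(initial_sol) .- xy_data)
+```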
 
-### Step 3: Set Up the Cost Function for Optimization
+### Step 4: Set Up the Loss Function for Optimization
+
+Now let's start the optimization process.
+First, let's define a loss function to be minimized.
+(It is also sometimes referred to as a cost function.)
+For our loss function, we want to take a set of parameters,
+create a new ODE which has everything the same except for the changed parameters,
+solve this ODE with new parameters, and compare the ODE solution against the provided data.
+In this case, the *loss* returned from the loss function is a quantification
+of the difference between the current solution and the desired solution.
+When this difference is minimized, our model prediction will closely approximate the observed system data.
 
-Now let's start the estimation process. First, let's define a loss function. For our loss function, we want to
-take a set of parameters, create a new ODE which has everything the same except for the changed parameters,
-solve this ODE with new parameters, and compare its predictions against the data. For this parameter changing,
+To change our parameter values,
 there is a useful functionality in the
 [SciML problems interface](https://docs.sciml.ai/DiffEqDocs/stable/basics/problem/#Modification-of-problem-types)
 called `remake` which creates a new version of an existing `SciMLProblem` with the aspect
 you want changed. For example, if we wanted to change the initial condition `u0` of our ODE,
 we could do `remake(prob, u0 = newu0)`
-For our case, we want to change around just the parameters, so we can do `remake(prob, p = newp)`
+For our case, we want to change around just the parameters, so we can do `remake(prob, p = newp)`.
+It is faster to `remake` an existing `SciMLProblem` than to create a new problem every iteration.
 
 !!! note
-    
+
     `remake` can change multiple items at once by passing more keyword arguments, i.e.,
     `remake(prob, u0 = newu0, p = newp)`. This can be used to extend the example to
     simultaneously learn the initial conditions and parameters!
 
-Now use `remake` to build the cost function. After we solve the new problem, we will calculate the sum of squared errors
-as our metric. The sum of squares can be quickly written in Julia via `sum(abs2,x)`. Using this information, our loss
-looks like:
+Now use `remake` to build the loss function. After we solve the new problem,
+we will calculate the sum of squared errors as our loss metric.
+The sum of squares can be quickly written in Julia via `sum(abs2,x)`.
+Using this information, our loss function looks like:
 
 ```@example odefit
 function loss(newp)
     newprob = remake(prob, p = newp)
     sol = solve(newprob, saveat = 1)
-    loss = sum(abs2, sol .- data)
-    return loss, sol
+    l = sum(abs2, sol .- xy_data)
+    return l, sol
 end
 ```
 
-Notice that our loss function returns the loss value as the first return, but returns extra information (the solution at the
-new parameters) as extra return arguments. We will explain why this extra return information is helpful in the next section.
+Notice that our loss function returns the loss value as the first return,
+but returns extra information (the ODE solution with the new parameters)
+as an extra return argument.
+We will explain why this extra return information is helpful in the next section.
+
+### Step 5: Solve the Optimization Problem
-### Step 4: Solve the Optimization Problem
 
+This step will look very similar to [the first optimization tutorial](@ref first_opt),
+except now we have a new loss function `loss` which returns both the loss value
+and the associated ODE solution.
+(In the previous tutorial, `L` only returned the loss value.)
+The `Optimization.solve` function can accept an optional callback function
+to monitor the optimization process using extra arguments returned from `loss`.
-This step will look very similar to [the first optimization tutorial](@ref first_opt), except now we have a new
-cost function. Here we'll also define a callback to monitor the solution process, more details about callbacks in Optimization.jl can be found [here](https://docs.sciml.ai/Optimization/stable/API/solve/).
-However, this time, our function returns two things. The callback syntax is always `(value being optimized, arguments of loss return)`
-and thus this time the callback is given `(p, l, sol)`. See, returning the solution along with the loss as part of the
-loss function is useful because we have access to it in the callback to do things like plot the current solution
-against the data! Let's do that in the following way:
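+
+For instance (a small sketch reusing the `pguess` defined in Step 3), calling the loss
+function directly shows the two values that the callback will receive on every iteration:
+
+```julia
+# Sketch: the loss function returns (loss value, ODE solution at these parameters)
+l, sol = loss(pguess)
+```
+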
+The callback syntax is always:
+
+```
+callback(
+    optimization variables,
+    the current loss value,
+    other arguments returned from the loss function, ...
+)
+```
+
+In this case, we will provide the callback the arguments `(p, l, sol)`,
+since it always takes the current state of the optimization first (`p`)
+then the returns from the loss function (`l, sol`).
+The return value of the callback function should default to `false`.
+`Optimization.solve` will halt if/when the callback function returns `true` instead.
+Typically the `return` statement would monitor the loss value
+and stop once some criterion is reached, e.g. `return loss < 0.0001`,
+but we will stop after a set number of iterations instead.
+More details about callbacks in Optimization.jl can be found
+[here](https://docs.sciml.ai/Optimization/stable/API/solve/).
 
 ```@example odefit
-callback = function (p, l, sol)
+function callback(p, l, sol)
     display(l)
-    plt = plot(sol, ylim = (0, 6), label = "Current Prediction")
-    scatter!(plt, datasol, label = "Data")
+    plt = plot(sol, ylim = (0, 6), label = ["Current x Prediction" "Current y Prediction"])
+    scatter!(plt, t_data, xy_data', label = ["x Data" "y Data"])
     display(plt)
-    # Tell Optimization.solve to not halt the optimization. If return true, then
-    # optimization stops.
     return false
 end
 ```
 
-Thus every step of the optimization will show us the loss and a plot of how the solution
-looks vs the data at our current parameters.
+With this callback function, every step of the optimization will display both the loss value and a plot of how the solution compares to the training data.
 
-Now, just like [the first optimization tutorial](@ref first_opt), we setup our optimization
-problem. To do this, we need to come up with a `pguess`, an initial condition for the parameters
-which is our best guess of the true parameters. For this, we will use `pguess = [1.0, 1.2, 2.5, 1.2]`.
+Now, just like [the first optimization tutorial](@ref first_opt),
+we set up our `OptimizationFunction` and `OptimizationProblem`,
+and then `solve` the `OptimizationProblem`.
+We will initialize the `OptimizationProblem` with the same `pguess` we used
+when setting up the ODE Model in Step 3.
+Observe how `Optimization.solve` brings the model closer to the experimental data as it iterates towards better ODE parameter values!
+
+Note that we are using the `PolyOpt()` solver choice here, which is discussed in the
+[PolyOpt documentation](https://docs.sciml.ai/Optimization/dev/optimization_packages/polyopt/).
+Parameter estimation of nonlinear differential equations is generally a non-convex problem,
+so we first run a stochastic algorithm (Adam) to get close to the minimum and then finish
+off with a quasi-Newton method (L-BFGS) to find the optimum. Together, this looks like:
 
 ```@example odefit
-adtype = Optimization.AutoForwardDiff()
-optf = Optimization.OptimizationFunction((x, p) -> loss(x), adtype)
+# Set up the optimization problem with our loss function and initial guess
+adtype = AutoForwardDiff()
 pguess = [1.0, 1.2, 2.5, 1.2]
-optprob = Optimization.OptimizationProblem(optf, pguess)
+optf = OptimizationFunction((x, _) -> loss(x), adtype)
+optprob = OptimizationProblem(optf, pguess)
+
+# Optimize the ODE parameters for best fit to our data
+pfinal = solve(optprob,
+    PolyOpt(),
+    callback = callback,
+    maxiters = 200)
+α, β, γ, δ = round.(pfinal, digits=1)
 ```
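+
+As a quick sanity check (a sketch; `fitted_prob` and `fitted_sol` are names introduced
+here for illustration, and `pfinal.u` holds the optimized parameter vector), you can
+re-simulate the model with the fitted parameters and overlay it on the training data:
+
+```julia
+# Sketch: re-simulate with the fitted parameters and compare against the data
+fitted_prob = remake(prob, p = pfinal.u)
+fitted_sol = solve(fitted_prob, saveat = 1)
+plt = plot(fitted_sol, label = ["x Fit" "y Fit"])
+scatter!(plt, t_data, xy_data', label = ["x Data" "y Data"])
+```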
 
-Now we solve the optimization problem:
-
-```@example odefit
-result_ode = Optimization.solve(optprob, PolyOpt(),
-    callback = callback,
-    maxiters = 200)
-```
+!!! note
+
+    When referencing the documentation for DifferentialEquations.jl and Optimization.jl
+    simultaneously, note that the variables `f`, `u`, and `p` will refer to different quantities.
+
+    DifferentialEquations.jl:
+
+    ```math
+    \frac{du}{dt} = f(u,p,t)
+    ```
+
+      - `f` in `ODEProblem` is the function defining the derivative `du` in the ODE.
+
+        Here: `lotka_volterra!`
+
+      - `u` in `ODEProblem` contains the state variables of `f`.
+
+        Here: `x` and `y`
+
+      - `p` in `ODEProblem` contains the parameter variables of `f`.
+
+        Here: `\alpha`, `\beta`, `\gamma`, and `\delta`
+
+      - `t` is the independent (time) variable.
+
+        Here: indirectly defined with `tspan` in `ODEProblem` and `saveat` in `solve`
+
+    Optimization.jl:
+
+    ```math
+    \min_{u} f(u,p)
+    ```
+
+      - `f` in `OptimizationProblem` is the function to minimize (optimize).
+
+        Here: the [anonymous function](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions) `(x, _) -> loss(x)`
+
+      - `u` in `OptimizationProblem` contains the state variables of `f` to be optimized.
+
+        Here: the ODE parameters `\alpha`, `\beta`, `\gamma`, and `\delta` stored in `p`
+
+      - `p` in `OptimizationProblem` contains any fixed
+        [hyperparameters](https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)) of `f`.
-and the answer from the optimization is our desired parameters.
+
+        Here: our `loss` function does not require any hyperparameters, so we pass `_` for this `p`.
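+
+    As a minimal sketch tying the two together (it only reuses objects defined earlier in
+    this tutorial and restates the mapping above):
+
+    ```julia
+    ODEProblem(lotka_volterra!, u0, tspan, pguess)  # f = lotka_volterra!, u = states [x, y], p = ODE parameters
+    OptimizationProblem(optf, pguess)               # f = wrapped loss, u = ODE parameters to optimize
+    ```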