From e8b8e8863673940220bf99662d897df033491f50 Mon Sep 17 00:00:00 2001
From: Lei
Date: Thu, 8 Sep 2022 15:02:10 +0100
Subject: [PATCH] Update section5_regressions.jl

---
 section5_regressions.jl | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/section5_regressions.jl b/section5_regressions.jl
index b621684..a92800a 100644
--- a/section5_regressions.jl
+++ b/section5_regressions.jl
@@ -69,7 +69,7 @@ By using matrix notation, the above model can be compactly written as:
 ```math
 p(\mathbf{y}|\mathbf{X}, \boldsymbol{\beta}, \sigma^2) = \mathcal{N}_N(\mathbf{y}; \beta_0 \mathbf{1}_N + \mathbf{X} \boldsymbol{\beta}_1, \sigma^2\mathbf{I}_N),
 ```
-where ``\mathbf{y} = [y_1, y_2,\ldots,y_N]^\top, \mathbf{X} = [\mathbf{x}_1, \mathbf{x}_1, \ldots, \mathbf{x}_1]^\top``, ``\mathbf{1}_N=[1, \ldots, 1]^\top`` is a ``N\times 1`` column vector of ones and ``\mathbf{I}_{N}`` is a ``N\times N`` identity matrix.
+where ``\mathbf{y} = [y_1, y_2,\ldots,y_N]^\top, \mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N]^\top``, ``\mathbf{1}_N=[1, \ldots, 1]^\top`` is an ``N\times 1`` column vector of ones and ``\mathbf{I}_{N}`` is an ``N\times N`` identity matrix.
 
 The likelihood assumption is illustrated below for a simple linear regression model with a one-dimensional predictor, i.e. ``D=1``. Conditioning on where ``x_n`` is, ``y_n`` is Gaussian distributed with a mean determined by the regression line and an observation variance ``\sigma^2``.
 
@@ -415,13 +415,15 @@ This can be understood better to consider some de-generate examples. Assume ``D=
 
 ```math
-
-m_N = \frac{v_0^{-1}}{v_0^{-1} + \tilde{v}^{-1}}m_0 + \frac{\tilde{v}^{-1} }{v_0^{-1} + \tilde{v}^{-1}}\hat{\beta}_1, v_N = \frac{1}{ v_0^{-1} + \tilde{v}^{-1}},
+\begin{align}
+m_N &= \frac{v_0^{-1}}{v_0^{-1} + \tilde{v}^{-1}}m_0 + \frac{\tilde{v}^{-1}}{v_0^{-1} + \tilde{v}^{-1}}\hat{\beta}_1\\
+v_N &= \frac{1}{v_0^{-1} + \tilde{v}^{-1}}
+\end{align}
 ```
 
 where ``\tilde{v}^{-1} = \sum_n x_n^2`` by definition.
 
-Note that if we assume ``m_0=0``, and the prior precision ``v_0^{-1}`` gets large, ``v_0^{-1} \rightarrow \infty``, in other words, we strongly believe the slope is zero, the posterior mean ``m_N`` will get closer to zero: ``m_N\rightarrow m_0=0``.
-The posterior variance is reduced in comparison with the prior variance ``v_0``. It makes sense since the posterior update reduces the estimation uncertainty.
+Note that if we assume ``m_0=0`` and the prior precision ``v_0^{-1}`` gets large, say ``v_0^{-1} \rightarrow \infty`` (in other words, we strongly believe the slope is zero), the posterior mean ``m_N`` will get closer to zero: ``m_N\rightarrow m_0=0``.
+Also note the posterior variance ``v_N`` is reduced in comparison with the prior variance ``v_0``. This makes sense, since the posterior update reduces the estimation uncertainty.
 """
 
 # ╔═╡ 59dd8a13-89c6-4ae9-8546-877bb7992570
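
The precision-weighted update introduced in the second hunk is easy to sanity-check numerically. Below is a minimal Julia sketch (not part of the notebook itself) for the degenerate ``D=1``, zero-intercept case, assuming a unit observation variance ``\sigma^2 = 1`` so that ``\tilde{v}^{-1} = \sum_n x_n^2``; the variable names mirror the symbols in the hunk and the simulated data are illustrative only.

```julia
using Random

Random.seed!(2022)

# Simulate data from the assumed model: yₙ ~ N(β₁ xₙ, 1), with D = 1 and no intercept
N, β_true = 50, 2.5
x = randn(N)
y = β_true .* x .+ randn(N)

# Prior: β₁ ~ N(m₀, v₀)
m₀, v₀ = 0.0, 10.0

# Sufficient statistics: ṽ⁻¹ = Σₙ xₙ² and the least-squares estimate β̂₁
ṽinv = sum(abs2, x)
β̂ = sum(x .* y) / ṽinv

# Posterior: a precision-weighted average of the prior mean m₀ and β̂₁
v₀inv = 1 / v₀
vN = 1 / (v₀inv + ṽinv)
mN = (v₀inv * m₀ + ṽinv * β̂) / (v₀inv + ṽinv)

@show β̂ mN vN
@assert vN < v₀   # the posterior variance shrinks relative to the prior
```

Because ``\tilde{v}^{-1}`` grows with ``N`` while ``v_0^{-1}`` stays fixed, the posterior mean is pulled towards ``\hat{\beta}_1`` as more data arrive, and towards ``m_0`` when the prior precision dominates.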