

$$
y=f\left(\sum\limits_{i=1}^{n}w_ix_i-\theta\right)=f(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}-\theta)
$$


where $\boldsymbol{x}\in\mathbb{R}^n$ is the sample's feature vector, which serves as the input to the perceptron; $\boldsymbol{w}$ and $\theta$ are the model's parameters, with $\boldsymbol{w}\in\mathbb{R}^n$ the weight vector and $\theta$ the threshold. Assuming $f$ is the step function, the perceptron model can be further expressed as **(with $\varepsilon(\cdot)$ denoting the step function)**


$$
y=\varepsilon(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}-\theta)=\left\{\begin{array}{rcl}
1,& {\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} -\theta\geqslant 0};\\
0,& {\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} -\theta < 0}.\\
\end{array} \right.
$$
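
As a concrete illustration, here is a minimal NumPy sketch of this forward pass; the values of `w`, `theta`, and the inputs are made up for demonstration:

```python
import numpy as np

def step(z):
    """Step function epsilon(z): 1 if z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def perceptron(x, w, theta):
    """Perceptron output y = epsilon(w^T x - theta)."""
    return step(w @ x - theta)

# Made-up example: 2-dimensional input
w = np.array([1.0, 1.0])
theta = 0.5
print(perceptron(np.array([1.0, 0.0]), w, theta))  # 1, since 1.0 - 0.5 >= 0
print(perceptron(np.array([0.0, 0.0]), w, theta))  # 0, since 0.0 - 0.5 < 0
```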

Since the equation of a hyperplane in $n$-dimensional space is


$$
w_1x_1+w_2x_2+\cdots+w_nx_n+b =\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} +b=0
$$




$$
T=\{(\boldsymbol{x}_1,y_1),(\boldsymbol{x}_2,y_2),\cdots,(\boldsymbol{x}_N,y_N)\}
$$


where $\boldsymbol{x}_i\in\mathbb{R}^n,y_i\in\{0,1\},i=1,2,\cdots,N$. If there exists a hyperplane


$$
\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}+b=0
$$




$$
\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}-\theta=0
$$


Suppose the set of misclassified samples is $M\subseteq T$. For any misclassified sample $(\boldsymbol{x},y)\in M$: when $\boldsymbol{w}^\mathrm{T}\boldsymbol{x}-\theta\geqslant0$, the model outputs $\hat{y}=1$ while the true label is $y=0$; conversely, when $\boldsymbol{w}^\mathrm{T}\boldsymbol{x}-\theta<0$, the model outputs $\hat{y}=0$ while the true label is $y=1$. Combining the two cases, the following inequality always holds:


$$
(\hat{y}-y)\left(\boldsymbol{w}^\mathrm{T}\boldsymbol{x}-\theta\right)\geqslant0
$$


Therefore, given a dataset $T$, the loss function can be defined as


$$
L(\boldsymbol{w},\theta)=\sum_{\boldsymbol{x}\in M}(\hat{y}-y)
\left(\boldsymbol{w}^\mathrm{T}\boldsymbol{x}-\theta\right)
$$
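
A minimal sketch of evaluating this loss, assuming made-up arrays `X` (one sample per row), labels `y`, and current parameters; only misclassified samples enter the sum:

```python
import numpy as np

def perceptron_loss(X, y, w, theta):
    """L(w, theta): sum of (y_hat - y) * (w^T x - theta) over misclassified samples."""
    z = X @ w - theta
    y_hat = (z >= 0).astype(int)
    M = y_hat != y                          # mask of misclassified samples
    return np.sum((y_hat[M] - y[M]) * z[M])

# Made-up data: two samples, both misclassified by the given parameters
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([0, 1])
print(perceptron_loss(X, y, np.array([1.0, -1.0]), 0.5))  # 2.0, always >= 0
```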




$$
T=\{(\boldsymbol{x}_1,y_1),(\boldsymbol{x}_2,y_2),\cdots,(\boldsymbol{x}_N,y_N)\}
$$


where $\boldsymbol{x}_i \in \mathbb{R}^n,y_i \in\{0,1\}$; find the parameters $\boldsymbol{w},\theta$ that minimize the loss function:


$$
\min\limits_{\boldsymbol{w},\theta}L(\boldsymbol{w},\theta)=\min\limits_{\boldsymbol{w},\theta}\sum_{\boldsymbol{x_i}\in M}(\hat{y}_i-y_i)(\boldsymbol{w}^\mathrm{T}\boldsymbol{x}_i-\theta)
$$


where $M\subseteq T$ is the set of misclassified samples. If the threshold $\theta$ is treated as a "dummy node" with a fixed input of $-1$, that is,


$$
-\theta=-1\cdot w_{n+1}=x_{n+1}\cdot w_{n+1}
$$


then $\boldsymbol{w}^\mathrm{T}\boldsymbol{x}_i-\theta$ simplifies to


$$
\begin{aligned}
\boldsymbol{w}^\mathrm{T}\boldsymbol{x_i}-\theta&=\sum
\limits_{j=1}^n w_jx_j+x_{n+1}\cdot w_{n+1}\\
&=\sum\limits_{j=1}^{n+1}w_jx_j\\
&=\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x_i}
\end{aligned}
$$
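
In code, the dummy-node trick amounts to appending a constant $-1$ feature to every sample and folding $\theta$ into the weight vector; a small sketch with made-up values:

```python
import numpy as np

# Made-up data: N samples, n features
X = np.array([[1.0, 2.0], [3.0, 4.0]])              # shape (N, n)
w, theta = np.array([0.5, -0.5]), 0.25

# Append the dummy input x_{n+1} = -1 and the weight w_{n+1} = theta
X_aug = np.hstack([X, -np.ones((X.shape[0], 1))])   # shape (N, n+1)
w_aug = np.append(w, theta)                         # shape (n+1,)

# w^T x - theta and w_aug^T x_aug agree
print(X @ w - theta)   # [-0.75 -0.75]
print(X_aug @ w_aug)   # same values
```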




$$
\min\limits_{\boldsymbol{w}}L(\boldsymbol{w})=\min\limits_{\boldsymbol{w}}\sum_{\boldsymbol{x_i}\in M}(\hat{y}_i-y_i)\boldsymbol{w}^\mathrm{T}\boldsymbol{x_i}
$$


Assuming the set of misclassified samples $M$ is fixed, the gradient of the loss function $L(\boldsymbol{w})$ is


$$
\nabla_{\boldsymbol{w}}L(\boldsymbol{w})=\sum_{\boldsymbol{x_i}\in M}(\hat{y}_i-y_i)\boldsymbol{x_i}
$$
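
A sketch of this gradient under the same augmented representation (names and data are illustrative):

```python
import numpy as np

def perceptron_grad(X_aug, y, w):
    """Gradient of L(w): sum over misclassified samples of (y_hat_i - y_i) * x_i."""
    y_hat = (X_aug @ w >= 0).astype(int)
    M = y_hat != y
    return (y_hat[M] - y[M]) @ X_aug[M]

# A gradient-descent step would then be: w -= eta * perceptron_grad(X_aug, y, w)
```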




$$
\boldsymbol w \leftarrow \boldsymbol w+\Delta \boldsymbol w
$$




$$
\Delta \boldsymbol w=-\eta(\hat{y}_i-y_i)\boldsymbol x_i=\eta(y_i-\hat{y}_i)\boldsymbol x_i
$$
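
Putting the pieces together, below is a minimal sketch of the resulting training loop; the learning rate `eta`, the epoch cap, and the toy dataset are illustrative choices, not prescribed by the text:

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=100):
    """Learn w (with theta folded in as the last component) by the rule
    w <- w + eta * (y_i - y_hat_i) * x_i on misclassified samples."""
    X_aug = np.hstack([X, -np.ones((X.shape[0], 1))])
    w = np.zeros(X_aug.shape[1])
    for _ in range(epochs):
        updated = False
        for x_i, y_i in zip(X_aug, y):
            y_hat = int(x_i @ w >= 0)
            if y_hat != y_i:
                w += eta * (y_i - y_hat) * x_i   # Delta w = eta (y_i - y_hat_i) x_i
                updated = True
        if not updated:   # no misclassified samples left
            break
    return w

# Linearly separable toy data (logical AND)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print(((np.hstack([X, -np.ones((4, 1))]) @ w) >= 0).astype(int))  # [0 0 0 1]
```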




$$
(x_1,x_2)\rightarrow h_1=\varepsilon(x_1-x_2-0.5),h_2=\varepsilon(x_2-x_1-0.5)\rightarrow y=\varepsilon(h_1+h_2-0.5)
$$
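
A quick script confirms that this two-layer network reproduces the XOR truth table:

```python
def step(z):
    """Step function: 1 if z >= 0, else 0."""
    return int(z >= 0)

def xor_net(x1, x2):
    h1 = step(x1 - x2 - 0.5)   # fires when x1 > x2
    h2 = step(x2 - x1 - 0.5)   # fires when x2 > x1
    return step(h1 + h2 - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))   # outputs 0, 1, 1, 0 (the XOR truth table)
```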


Since

$$
\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}
$$



and

$$
\begin{aligned}
\cfrac{\partial E_k}{\partial \theta_j} &= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot\cfrac{\partial \hat{y}_j^k}{\partial \theta_j} \\
&= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot\cfrac{\partial [f(\beta_j-\theta_j)]}{\partial \theta_j} \\
&= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot f^{\prime}(\beta_j-\theta_j) \times (-1) \\
&= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot f(\beta_j-\theta_j)\times\left[1-f(\beta_j-\theta_j)\right] \times (-1) \\
&= \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \hat{y}_j^k\left(1-\hat{y}_j^k\right) \times (-1) \\
&= \left(\hat{y}_j^k-y_j^k\right) \cdot \hat{y}_j^k\left(1-\hat{y}_j^k\right) \times (-1) \\
&=(y_j^k-\hat{y}_j^k)\hat{y}_j^k\left(1-\hat{y}_j^k\right) \\
&= g_j
\end{aligned}
$$

it follows that


$$
\Delta \theta_j = -\eta \cfrac{\partial E_k}{\partial \theta_j}=-\eta g_j
$$
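
As a sanity check, the closed form for $\partial E_k/\partial \theta_j$ can be compared against a numerical derivative; this sketch assumes, as in the text, a sigmoid activation $f$ and squared error $E_k=\frac{1}{2}(\hat{y}_j^k-y_j^k)^2$ for a single output unit with made-up values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

beta, theta, y = 0.8, 0.3, 1.0           # made-up values for one output unit
y_hat = sigmoid(beta - theta)
g = (y - y_hat) * y_hat * (1 - y_hat)    # closed form for dE_k/dtheta_j

def E(theta):
    d = sigmoid(beta - theta) - y
    return 0.5 * d * d

eps = 1e-6
numeric = (E(theta + eps) - E(theta - eps)) / (2 * eps)
print(g, numeric)   # the two values agree to roughly 6 decimal places
```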


Since

$$
\Delta v_{ih} = -\eta \cfrac{\partial E_k}{\partial v_{ih}}
$$



and

$$
\begin{aligned}
\cfrac{\partial E_k}{\partial v_{ih}} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot \cfrac{\partial \alpha_h}{\partial v_{ih}} \\
&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \alpha_h} \cdot x_i \\
&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f^{\prime}(\alpha_h-\gamma_h) \cdot x_i \\
&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f^{\prime}(\alpha_h-\gamma_h) \cdot x_i \\
&= \sum_{j=1}^{l} (-g_j) \cdot w_{hj} \cdot f^{\prime}(\alpha_h-\gamma_h) \cdot x_i \\
&= -f^{\prime}(\alpha_h-\gamma_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i \\
&= -b_h(1-b_h) \cdot \sum_{j=1}^{l} g_j \cdot w_{hj} \cdot x_i \\
&= -e_h \cdot x_i
\end{aligned}
$$

it follows that


$$
\Delta v_{ih} =-\eta \cfrac{\partial E_k}{\partial v_{ih}} =\eta e_h x_i
$$


Since

$$
\Delta \gamma_h = -\eta \cfrac{\partial E_k}{\partial \gamma_h}
$$



and

$$
\begin{aligned}
\cfrac{\partial E_k}{\partial \gamma_h} &= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot \cfrac{\partial b_h}{\partial \gamma_h} \\
&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot \cfrac{\partial \beta_j}{\partial b_h} \cdot f^{\prime}(\alpha_h-\gamma_h) \cdot (-1) \\
&= \sum_{j=1}^{l} \cfrac{\partial E_k}{\partial \hat{y}_j^k} \cdot \cfrac{\partial \hat{y}_j^k}{\partial \beta_j} \cdot w_{hj} \cdot f^{\prime}(\alpha_h-\gamma_h) \cdot (-1) \\
&= \sum_{j=1}^{l} (-g_j) \cdot w_{hj} \cdot f^{\prime}(\alpha_h-\gamma_h) \cdot (-1) \\
&= \sum_{j=1}^{l}g_j\cdot w_{hj} \cdot f^{\prime}(\alpha_h-\gamma_h)\\
&= \sum_{j=1}^{l}g_j\cdot w_{hj} \cdot b_h(1-b_h)\\
&=e_h
\end{aligned}
$$

it follows that


$$
\Delta \gamma_h=-\eta\cfrac{\partial E_k}{\partial \gamma_h} = -\eta e_h
$$
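
Collecting the error terms $g_j$ and $e_h$ gives one full BP update on a single sample. The sketch below follows the text's notation; the weight update $\Delta w_{hj}=\eta g_j b_h$ (the standard BP rule not re-derived in this excerpt) and the shapes in the made-up example are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bp_step(x, y, V, gamma, W, theta, eta=0.1):
    """One BP update on a single sample (x, y):
    V (d x q) and gamma: input->hidden weights and hidden thresholds;
    W (q x l) and theta: hidden->output weights and output thresholds."""
    # Forward pass
    alpha = x @ V                  # hidden-layer inputs alpha_h
    b = sigmoid(alpha - gamma)     # hidden-layer outputs b_h
    beta = b @ W                   # output-layer inputs beta_j
    y_hat = sigmoid(beta - theta)  # network outputs

    # Error terms from the derivations above
    g = y_hat * (1 - y_hat) * (y - y_hat)   # g_j
    e = b * (1 - b) * (W @ g)               # e_h = b_h(1-b_h) sum_j w_hj g_j

    # Parameter updates
    W += eta * np.outer(b, g)   # Delta w_hj = eta * g_j * b_h
    theta += -eta * g           # Delta theta_j = -eta * g_j
    V += eta * np.outer(x, e)   # Delta v_ih = eta * e_h * x_i
    gamma += -eta * e           # Delta gamma_h = -eta * e_h
    return V, gamma, W, theta

# Made-up shapes: 2 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
V, gamma = rng.normal(size=(2, 3)), np.zeros(3)
W, theta = rng.normal(size=(3, 1)), np.zeros(1)
bp_step(np.array([1.0, 0.0]), np.array([1.0]), V, gamma, W, theta)
```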




$$
E_{\rm graph}=E_{\rm edges}+E_{\rm nodes}
$$


where $E_{\rm graph}$ denotes the energy of the graph, $E_{\rm edges}$ the total energy of its edges, and $E_{\rm nodes}$ the total energy of its nodes. The energy of an edge is determined by the product of the values of the two nodes it connects and the edge weight, i.e., $E_{{\rm edge}_{ij}}=-w_{ij}s_is_j$; the energy of a node is determined by the product of the node's value and its threshold, i.e., $E_{{\rm node}_i}=-\theta_is_i$. The total energy of the edges in the graph is the sum over all edges:


$$
E_{\rm edges}=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E_{{\rm edge}_{ij}}=-\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}w_{ij}s_is_j
$$


The total energy of the nodes in the graph is the sum over all nodes:


$$
E_{\rm nodes}=\sum_{i=1}^nE_{{\rm node}_i}=-\sum_{i=1}^n\theta_is_i
$$


Hence the Boltzmann machine energy corresponding to the state vector $\boldsymbol{s}$ is


$$
E_{\rm graph}=E_{\rm edges}+E_{\rm nodes}=-\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}w_{ij}s_is_j-\sum_{i=1}^n\theta_is_i
$$
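
A short sketch of this energy for a made-up 3-node network; `W` is assumed symmetric with zero diagonal, so the $i<j$ sum equals half of the full quadratic form:

```python
import numpy as np

def boltzmann_energy(s, W, theta):
    """Energy of state vector s: -sum_{i<j} w_ij s_i s_j - sum_i theta_i s_i."""
    return -0.5 * s @ W @ s - theta @ s

# Made-up 3-node example
W = np.array([[ 0.0, 1.0, -2.0],
              [ 1.0, 0.0,  0.5],
              [-2.0, 0.5,  0.0]])
theta = np.array([0.1, -0.2, 0.3])
s = np.array([1, 0, 1])
print(boltzmann_energy(s, W, theta))  # 1.6 = -(-2.0) - (0.1 + 0.3)
```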

