Add HMC

kmsmgsh · May 4, 2019 · 70c3f12 · 70c3f12
1 parent 9c57198
commit 70c3f12
Show file tree

Hide file tree

Showing 20 changed files with 2,533 additions and 692 deletions.
diff --git a/02-mathematical-statistic.Rmd b/02-mathematical-statistic.Rmd
@@ -0,0 +1,12 @@
+# Mathematical statistic Trick
+
+## Normal distribution as exponential family
+
+Theorem: If the density function with the form:
+$$
+f(x) \propto \exp \left\{-\frac{1}{2} x^{T} A x+B x\right\}
+$$
+
+Then, we have $X\sim N\left(\left(B A^{-1}\right)^{T}, A^{-1}\right)$
+
+可以配合\@ref(sumSquare) 食用更佳
diff --git a/07-Matrix-algebra.Rmd b/07-Matrix-algebra.Rmd
@@ -70,5 +70,25 @@ $$
 $$
 
 
+## 两个二次型相加 {#sumSquare}
+
+In vector formulation(Assuming $\Sigma_1$,$\Sigma_2$ are symmetric)
+有：
+$$
+\begin{array}{c}{-\frac{1}{2}\left(\mathbf{x}-\mathbf{m}_{1}\right)^{T} \mathbf{\Sigma}_{1}^{-1}\left(\mathbf{x}-\mathbf{m}_{1}\right)} \\ {-\frac{1}{2}\left(\mathbf{x}-\mathbf{m}_{2}\right)^{T} \mathbf{\Sigma}_{2}^{-1}\left(\mathbf{x}-\mathbf{m}_{2}\right)} \\ {=-\frac{1}{2}\left(\mathbf{x}-\mathbf{m}_{c}\right)^{T} \mathbf{\Sigma}_{c}^{-1}\left(\mathbf{x}-\mathbf{m}_{c}\right)+C}\end{array}
+$$
+其中：
+$$
+\begin{aligned} \mathbf{\Sigma}_{c}^{-1}=& \mathbf{\Sigma}_{1}^{-1}+\mathbf{\Sigma}_{2}^{-1} \\ \mathbf{m}_{c} &=\left(\mathbf{\Sigma}_{1}^{-1}+\mathbf{\Sigma}_{2}^{-1}\right)^{-1}\left(\mathbf{\Sigma}_{1}^{-1} \mathbf{m}_{1}+\mathbf{\Sigma}_{2}^{-1} \mathbf{m}_{2}\right) \\ C &=\frac{1}{2}\left(\mathbf{m}_{1}^{T} \mathbf{\Sigma}_{1}^{-1}+\mathbf{m}_{2}^{T} \mathbf{\Sigma}_{2}^{-1}\right)\left(\mathbf{\Sigma}_{1}^{-1}+\mathbf{\Sigma}_{2}^{-1}\right)^{-1}\left(\mathbf{\Sigma}_{1}^{-1} \mathbf{m}_{1}+\mathbf{\Sigma}_{2}^{-1} \mathbf{m}_{2}\right) \\ &-\frac{1}{2}\left(\mathbf{m}_{1}^{T} \mathbf{\Sigma}_{1}^{-1} \mathbf{m}_{1}+\mathbf{m}_{2}^{T} \mathbf{\Sigma}_{2}^{-1} \mathbf{m}_{2}\right) \end{aligned}
+$$
+
+
+而对于我需要的情况，简化$m_1=0$,$m_2=0$
+
+那么就有，只需要$\mathbf{\Sigma}_{c}^{-1}=\mathbf{\Sigma}_{1}^{-1}+\mathbf{\Sigma}_{2}^{-1}$,同时$m_c=0$.
+
+
+
+
 
 
diff --git a/13-HMC.Rmd b/13-HMC.Rmd
@@ -0,0 +1,167 @@
+# Hamiltonian Monte Carlo
+
+
+
+### Properties of Hamiltonian Dynamics
+1. Hamiltonian dynamics is *reversible*. That is, the mapping $T_{s}$ from the state at time t,$(q(t),p(t))$ to the state at time $t+s,(q(t+s),p(t+s))$, is one-to-one, so hence there exists an inverse, $T_{-s}$.
+
+可以从空间能量那一块来理解，由一个最普通的Hamilton的定义$H(q, p)=U(q)+K(p)$,中，动能部分$K(p)=K(-p)$,而由后文的$K(p)==p^{T} M^{-1} p / 2$,则p是速度，K(p)是动能(kinetic energy)，所以和p的方向无关。所以inverse mapping也可以通过对negating p之后再作用$T_s$,然后再negating p。
+
+在一元的例子中，正逆变换都是逆时针转s radians。
+
+而Hamiltonian dynamics的reversibility对MCMC更新非常重要，这样使用dynamics更新的MCMC就能使稳定分布是需要的分布。最简单证明稳定性的方法就是使用Reversible。
+
+### Conservation of the Hamiltonian
+
+A second property of the dynamics is that it keeps the Hamiltonian invariant (i.e. conserved).
+
+$$
+\frac{d H}{d t}=\sum_{i=1}^{d}\left[\frac{d q_{i}}{d t} \frac{\partial H}{\partial q_{i}}+\frac{d p_{i}}{d t} \frac{\partial H}{\partial p_{i}}\right]=\sum_{i=1}^{d}\left[\frac{\partial H}{\partial p_{i}} \frac{\partial H}{\partial q_{i}}-\frac{\partial H}{\partial q_{i}} \frac{\partial H}{\partial p_{i}}\right]=0
+$$
+等式成立来源于Hamilton's equations.
+
+在一元的例子中，相当于是说旋转变换保持了Hamiltonian的值不变，是half the squared distance from the origin.
+
+在Metropolis updates中，如果用 Hamiltonian dynamics作为proposal的话，也就是HMC，接受率是1如果H能保持invariant. 后面可以看到。但是在实践中只能保持H approximately invariant, 所以我们很难达到这个目标。
+
+### Volume preservation
+
+Hamiltonian dynamics的第三个基础性质是在(q,p)空间内他保持volume。（Liouville's theorem的结论）。如果我们队映射$T_s$作用到一个在（q,p）空间的区域R上的一些点，有体积V,则R在$T_s$的像也有体积V。
+在一维的例子中的体现就是映射只是旋转变化，明显不改变面基。当然也不改变区域的形状。后者则不总是成立。Hamiltonian dynamics 可能对区域在某个方向进行拉伸，当这个区域因为在其他方向非常集中，为了保证volume，所以进行拉伸。
+
+在MCMC中，保证volume不变的意以不导致对于任意change in volume in the acceptance probability for Metropolis updates. 改变volume不会导致Metropolis updates的接受概率变化。如果我们考虑proposed新的状态，用一些随机的，非Hamiltonian，dynamics,我们可能会需要去计算Jacobian matrix for the mapping the dynamics defines的行列式，这可能做不到。
+
+由之前的定义，divergence of the vector field have
+
+$$
+\sum_{i=1}^{d}\left[\frac{\partial}{\partial q_{i}} \frac{d q_{i}}{d t}+\frac{\partial}{\partial p_{i}} \frac{d p_{i}}{d t}\right]=\sum_{i=1}^{d}\left[\frac{\partial}{\partial q_{i}} \frac{\partial H}{\partial p_{i}}-\frac{\partial}{\partial p_{i}} \frac{\partial H}{\partial q_{i}}\right]=\sum_{i=1}^{d}\left[\frac{\partial^{2} H}{\partial q_{i} \partial p_{i}}-\frac{\partial^{2} H}{\partial p_{i} \partial q_{i}}\right]=0
+$$
+
+而有一个结论，A vector field with zero divergence can be shown to preserve volume.
+
+如果不通过divergence，可以有一个不严格的证明。从之前提到的行列式的角度进行思考。
+
+The volume preservation is equivalent to the determinant of the Jacobian matrix of $T_s$ having absolute value one, which is related to the well-known role of this determinant in regard to the effect of transformations on definite integrals and on probability density functions.
+
+Jacobian matrix of $T_s$, $2d\times 2d$, as a mapping of $z=(q,p)$, will be written as $B_s$. In general, $B_s$ will depend on the values of $q$ and $p$ before the mapping. When $B_s$ is diagonal, it is easy to see that the absolute values of its diagonal elements are the factors by which $T_s$ wtrtches or compresses a region in each dimesnions, so that the product of these factos, which is equal to the absolute value of $det(B_s)$, is the factor by which the volume of the region changes. Detail and general prove will not be put here, but, note that if rotate the coordinate system used,  $B_s$ would no longer be diagonal, but the determinant of $B_s$ is invariant to such transformations, and so would still give the gactor by which the volume changes.
+
+Consider Volume preservation for Hamiltonian dynamics in one dimension. Approximate $T_\delta$ for $\delta$ near zero as follows:
+
+$$
+T_{\delta}(q, p)=\left[ \begin{array}{c}{q} \\ {p}\end{array}\right]+\delta \left[ \begin{array}{l}{d q / d t} \\ {d p / d t}\end{array}\right]+\text { terms of order } \delta^{2} \text { or higher. }
+$$
+Taking the time derivatives from the Hamiltonian equation, the Jacobian matrix can be written as
+$$
+B_{\delta}=\left[ \begin{array}{cc}{1+\delta \frac{\partial^{2} H}{\partial q \partial p}} & {\delta \frac{\partial^{2} H}{\partial p^{2}}} \\ {-\delta \frac{\partial^{2} H}{\partial q^{2}}} & {1-\delta \frac{\partial^{2} H}{\partial p \partial q}}\end{array}\right]+\text { terms of order } \delta^{2} \text { or higher. }
+$$
+
+Then we can write the determinant of this matrix as
+
+$$
+\begin{aligned} \operatorname{det}\left(B_{\delta}\right) &=1+\delta \frac{\partial^{2} H}{\partial q \partial p}-\delta \frac{\partial^{2} H}{\partial p \partial q}+\text { terms of order } \delta^{2} \text { or higher } \\ &=1+\text { terms of order } \delta^{2} \text { or higher. } \end{aligned}
+$$
+
+Since $\log (1+x) \approx x$ for x near zero, $log det (B_\delta)$ is zero, except perhaps the terms of order $\delta^2$ or higher.  Now consider $log det (B_s)$ for some time interval s that is not close to zero let $\delta=s/n$ (现在就close to zero了)，then可以把$T_s$分解成作用n次$T_\delta$, so $det (B_s)$ is the n-fold product of $det(B_\delta)$ evaluated at n points along the trajectory. Then we have
+
+$$
+\begin{aligned} \log \operatorname{det}\left(B_{S}\right) &=\sum_{i=1}^{n} \log \operatorname{det}\left(B_{\delta}\right) \\ &=\sum_{i=1}^{n}\left\{\text { terms of order } 1 / n^{2} \text { or smaller }\right\} \\ &=\text { terms of order } 1 / n \text { or smaller. } \end{aligned}
+$$
+所以对于不靠近0的$B_s$, $log det (B_s)$,也有log几乎为0当$n\rightarrow 0$. 当然在sum过程中，$B_\delta$的值可能会依赖于i，也就是轨迹上的点（p,q）的变化会影响$T_s$. Assumeing that trajectories are not singular, the variation in $B_\delta$ must be bounded along any particular trajectory. （这个没懂，轨迹非退化，那么就有界然后就能通过n收敛？）
+所以就preserves volume。
+
+当高于一维的情况，Jacobian matrix有如下形式
+
+$$
+B_{8}=\left[ \begin{array}{c}{I+\delta\left[\frac{\partial^{2} H}{\partial q_{j} \partial p_{i}}\right]} & {\delta\left[\frac{\partial^{2} H}{\partial p_{j} \partial p_{i}}\right]} \\ {-\delta\left[\frac{\partial^{2} H}{\partial q_{j} \partial q_{i}}\right]} & {I-\delta\left[\frac{\partial^{2} H}{\partial p_{j} \partial q_{i}}\right]}\end{array}\right]+\text { terms of order } \delta^{2} \text { or higher. }
+$$
+类似的形式，但是变成了分块矩阵。
+
+高阶的项消掉了，剩下的项处理方式和1维的时候类似。
+
+### Symplecticness (辛？)
+
+volume不变性是Hamiltonian dynamics being symplectic的一个结论。假设$z=(q,p)$, 定义J是$J=\left[ \begin{array}{cc}{0_{d \times d}} & {I_{d \times d}} \\ {-I_{d \times d}} & {0_{d \times d}}\end{array}\right]$, symplecticness condition is that the Jacobian matrix, $B_s$, of the mapping $T_s$ satisfies
+
+$$
+B_{s}^{T} J^{-1} B_{s}=J^{-1}
+$$
+This implies volume conservation, since $det(B^T_s)det(J^{-1})det(B_s)=det(J^{-1})$ 则有$det(B_s)^2=1$. 当d>1的时候，symplecticness condition is stronger than volume preservation. Hamiltonian dynamics and the symplecticness condition can be generalized to where J is any matrix for which $J^T=-J$ and $det(j)\neq 0$.
+
+在实践中，Reversibility, preservation of volume, and symplecticness可以确实被保证，也必须被保证。
+
+
+## Discretizing Hamilton's Equations: The leapfrog method.
+
+为了在计算机中实现，Hamilton's equations must be approximated by discretizing time, using some small stepsize,$\epsilon$. 在0时刻开始，iteratively compute (approximately) the state at times $\epsilon,2\epsilon,3\epsilon,etc.$
+
+In discussing how to do this, assume that the Hamiltonian has the form $H(q,p)=U(q)+K(p)$. 虽然这个方法对各种定义的动能都适用，但是为了简化操作还是假设$K(p)=p^{T} M^{-1} p / 2$. 然后M是diagonal的，对角元是$m_1,...,m_d$, so that
+$$
+K(p)=\sum_{i=1}^{d} \frac{p_{i}^{2}}{2 m_{i}}
+$$
+相当于动能是每个方向上的动能之和。
+
+#### Euler's Method
+
+最广为人知的逼近微分方程系统的解的方法是Euler's method. 对于Hamilton's equations, this method performs the following step, for each component of position and momentum, indexed by i=1,...,d:
+
+$$
+\begin{array}{l}{p_{i}(t+\varepsilon)=p_{i}(t)+\varepsilon \frac{d p_{i}}{d t}(t)=p_{i}(t)-\varepsilon \frac{\partial U}{\partial q_{i}}(q(t))} \\ {q_{i}(t+\varepsilon)=q_{i}(t)+\varepsilon \frac{d q_{i}}{d t}(t)=q_{i}(t)+\varepsilon \frac{p_{i}(t)}{m_{i}}}\end{array}
+$$
+对于每个分量的速度和（势能？），下一时刻的等于这一时刻加上一阶导乘以步长，然后通过Hamilton’s equation就等于另外一个量的变化量。（减去的势能等于增加的动能？增加的动能等于减去的势能？）
+
+对于一般的Euler法解的轨迹并不好,轨迹叉到无穷上了，但是真实的轨道是一个圆、
+
+### Modification of Euler's Method
+
+$$
+\begin{array}{c}{p_{i}(t+\varepsilon)=p_{i}(t)-\varepsilon \frac{\partial U}{\partial q_{i}}(q(t))} \\ {q_{i}(t+\varepsilon)=q_{i}(t)+\varepsilon \frac{p_{i}(t+\varepsilon)}{m_{i}}}\end{array}
+$$
+
+变化是使用$p_i(t+\epsilon)$代替了$p_i(t)$。也就是说使用了“目前的动量”代替了前一时刻的动量。
+
+图中显示了虽然不完美，但是这个方法更靠近了真实的轨迹。 事实上，修改后的方法确实确保了volue，这样帮助避免发散到无穷或者螺旋地回到起点，这样就让volume expand to infinity or contracting to zero.
+
+要探究modification of Euler's method preserve volume, exactly despite the finite discretization of time, 注意这两个变换 from $(q(t), p(t))$ to $q(t),p(t+\epsilon)$ 通过modification的第一个等式，然后第二个等式是从$(q(t), p(t+\varepsilon))$变换到$(q(t+\varepsilon), p(t+\varepsilon))$. 这是"shear" transformation (剪羊毛？没懂)，每次变换只变换一个值，要么$p_i$，要么$q_i$,这样只会变一个参数，而any shear transformation will preserve volume,因为这样变换的Jacobian只有一个元，并且值为1.
+
+### The leapfrog Method
+Leapfrog method可以得到更好的结果
+
+$$
+\begin{aligned} p_{i}(t+\varepsilon / 2) &=p_{i}(t)-(\varepsilon / 2) \frac{\partial U}{\partial q_{i}}(q(t)) \\ q_{i}(t+\varepsilon) &=q_{i}(t)+\varepsilon \frac{p_{i}(t+\varepsilon / 2)}{m_{i}} \\ p_{i}(t+\varepsilon) &=p_{i}(t+\varepsilon / 2)-(\varepsilon / 2) \frac{\partial U}{\partial q_{i}}(q(t+\varepsilon)) \end{aligned}
+$$
+从冲量对应的变量开始半步更新，然后再重新开始做全更新。使用新的（更新了半步的）冲量变量，然后最后做剩下半步冲量变量的更新，使用新的位置变量（$q_i(t+\epsilon)$）。一个类似的方法可以对任何的动能函数成立，只需要用$\partial K / \partial p_{i}$替换$p_{i} / m_{i}$.
+这个方法当然也保证了volume，因为第一个更新通过第三个更新是shear transformation，由于对称性，这也是reversible的by simply negating p again.
+
+### Local and Global Error of discretization Methods.
+
+How the error from discretizing the dynamics behaves in the limit as the stepsize, $\epsilon$, goes to zero; Leimkuhler and Reich(2004) provide a much more detail discussion.
+
+The error goes to zero as $\epsilon$ goes to zero, so that any upper limit on the error will apply (apart from a usually unknown constant factor) to any differentiable function of state. For example, if the error for (q,p) is no more than order $\epsilon^2$, the error for $H(q,p)$ will also be no more than order $\epsilon^2$.
+
+The *local error* is the error after one step, that moves from time t to time $t+\epsilon$. The global error is the error after simulating for some fixed time interval, s, which will require $s/epsilon$ steps. 
+
+If the local error is order $\epsilon^p$, the global error will be order $\epsilon^{p-1}$. If we instead fix $\epsilon$ and consider increasing the time, s, for which the trajectory is simulated, the error can in general increase exponentially with s. Interesting. However, this is often not what happens when simulating Hamiltonian dynamics with a symplectic method, as can be seen in Figure. The Euler method and its modification above have order $\epsilon^2$ local error and order $\epsilon$ global error. Leapfrog method has order $\epsilon^3$ local error and order $\epsilon^2$ global error. As shown by Leimkuhler and Reich, this difference is a consequence of leapfrog being reversible, since any reversible method must have global error that is of even order in $\epsilon.$
+
+
+## MCMC from Hamiltonian Dynamics.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/19-imputation.Rmd b/19-imputation.Rmd
@@ -0,0 +1,6 @@
+# Imputation
+
+处理缺失值问题。
+Multiple imputation method. 
+
+