diff --git a/docs/index.md b/docs/index.md
index 30b0173..2b59e10 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -21,11 +21,11 @@ parts:
## Lecture Notes
-{% assign sorted = site.pages | sort: 'date' %}
+{% assign sorted = site.pages | sort: 'order' %}
{% for page in sorted %}
{% if page.tags == 'note' %}
-
- {{ page.title }} ({{ page.date | date: '%B %d, %Y' }})
+ {{ page.title }}
{% endif %}
{% endfor %}
diff --git a/docs/notes/energy.inc b/docs/notes/energy.inc
index 0bc2b9e..af236a1 100644
--- a/docs/notes/energy.inc
+++ b/docs/notes/energy.inc
@@ -3,15 +3,17 @@ Probabilities & Energy
Given a density $p(x)$, we call
-$$\begin{aligned}
-E(x) = -\log p(x) + c ~,\end{aligned}$$
+$$\begin{align}
+E(x) = -\log p(x) + c ~,
+\end{align}$$
(for any choice of offset $c\in{\mathbb{R}}$) an energy function.
Conversely, given an energy function $E(x)$, the corresponding density
(called Boltzmann distribution) is
-$$\begin{aligned}
-p(x) = \textstyle\frac 1Z \exp\lbrace-E(x)\rbrace ~,\end{aligned}$$
+$$\begin{align}
+p(x) = \textstyle\frac 1Z \exp\lbrace-E(x)\rbrace ~,
+\end{align}$$
where $Z$ is the normalization constant to ensure $\int_x p(x) = 1$, and
$c=-\log Z$. From the perspective of physics, one can motivate why
@@ -22,14 +24,16 @@ minimalistically as follows:
Probabilities are *multiplicative* (axiomatically): E.g., the likelihood
of i.i.d. data $D = \lbrace x_i \rbrace_{i=1}^n$ is the product
-$$\begin{aligned}
-P(D) = \prod_{i=1}^n p(x_i) ~.\end{aligned}$$
+$$\begin{align}
+P(D) = \prod_{i=1}^n p(x_i) ~.
+\end{align}$$
We often want to rewrite this with a log to have an *additive*
expression
-$$\begin{aligned}
-E(D) = -\log P(D) = \sum_{i=1}^n E(x_i) ~,\quad E(x_i) = -\log p(x_i) ~.\end{aligned}$$
+$$\begin{align}
+E(D) = -\log P(D) = \sum_{i=1}^n E(x_i) ~,\quad E(x_i) = -\log p(x_i) ~.
+\end{align}$$
The minus is a convention so that we can call the quantity $E(D)$ a
*loss* or *error* – something we want to minimize instead of
@@ -45,13 +49,14 @@ P(D_2)$, (2) $E(D)$ is additive $E(D_1\cup D_2) = E(D_1) + E(D_2)$, and
(3) there is a mapping $P(D) = \textstyle\frac 1Z f(E(D))$ between both.
Then it follows that
-$$\begin{aligned}
+$$\begin{align}
P(D_1\cup D_2)
&= P(D_1)~ P(D_2) = \textstyle\frac 1{Z_1} f(E(D_1))~ \textstyle\frac 1{Z_2} f(E(D_2)) \\
P(D_1\cup D_2)
&= \textstyle\frac 1{Z_0} f(E(D_1\cup D_2)) = \textstyle\frac 1{Z_0} f(E(D_1) + E(D_2)) \\
\Rightarrow\quad \textstyle\frac 1{Z_1} f(E_1) \textstyle\frac 1{Z_2} f(E_2)
-&= \textstyle\frac 1{Z_0} f(E_1+E_2) ~, \label{exp}\end{aligned}$$
+&= \textstyle\frac 1{Z_0} f(E_1+E_2) ~, \label{exp}
+\end{align}$$
where we defined $E_i=E(D_i)$. The only function to fulfill the last
equation for any $E_1,E_2\in{\mathbb{R}}$ is the exponential function
@@ -59,8 +64,9 @@ $f(E) = \exp\lbrace -\beta E \rbrace$ with arbitrary coefficient $\beta$
(and minus sign being a convention, $Z_0 = Z_1
Z_2$). Boiling this down to an individual element $x\in D$, we have
-$$\begin{aligned}
-p(x) = \textstyle\frac 1Z \exp\lbrace-\beta E(x)\rbrace ~,\quad\beta E(x) = -\log p(x) - \log Z ~.\end{aligned}$$
+$$\begin{align}
+p(x) = \textstyle\frac 1Z \exp\lbrace-\beta E(x)\rbrace ~,\quad\beta E(x) = -\log p(x) - \log Z ~.
+\end{align}$$
Partition Function
------------------
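
For a quick numerical sanity check of the energy↔Boltzmann relations in the hunk above, here is a minimal numpy sketch (illustrative only, not part of the patch; the example energies are made up). It confirms that an energy offset $c$ is absorbed into $Z$, and that $E(x) = -\log p(x) - \log Z$:

```python
import numpy as np

# Made-up energies of a small discrete system.
E = np.array([0.5, 1.0, 2.0, 4.0])

def boltzmann(E, beta=1.0):
    """Boltzmann distribution p(x) = exp(-beta E(x)) / Z."""
    w = np.exp(-beta * E)
    return w / w.sum()              # Z = sum of unnormalized weights

p = boltzmann(E)
# An energy offset c is absorbed into Z: the distribution is unchanged.
assert np.allclose(p, boltzmann(E + 7.3))
# E(x) = -log p(x) + c with c = -log Z, as stated in the note.
Z = np.exp(-E).sum()
assert np.allclose(E, -np.log(p) - np.log(Z))
```
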
diff --git a/docs/notes/energy.md b/docs/notes/energy.md
index 015bc22..2f75ffb 100644
--- a/docs/notes/energy.md
+++ b/docs/notes/energy.md
@@ -2,11 +2,12 @@
layout: home
title: "Probabilities, Energy, Boltzmann, and Partition Function"
date: 2024-08-19
+order: 2
tags: note
---
*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
-Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*
+Intelligent Systems Lab, TU Berlin,* {{ page.date | date: '%B, %Y' }}
[[pdf version](../pdfs/energy.pdf)]
diff --git a/docs/notes/entropy.inc b/docs/notes/entropy.inc
index cb9e544..ca2fae5 100644
--- a/docs/notes/entropy.inc
+++ b/docs/notes/entropy.inc
@@ -26,8 +26,9 @@ sample $B$ surprises you more and gives you 2 bits of information.
The entropy
-$$\begin{aligned}
-H(p) &= - \sum_x p(x) \log p(x) = \mathbb{E}_{p(x)}\!\left\{-\log p(x)\right\}\end{aligned}$$
+$$\begin{align}
+H(p) &= - \sum_x p(x) \log p(x) = \mathbb{E}_{p(x)}\!\left\{-\log p(x)\right\}
+\end{align}$$
is the expected neg-log-likelihood. It is a measure of the distribution
$p$ itself, not of a specific sample. It measures how much *on average*
@@ -48,10 +49,10 @@ by a uniform discrete random variable of cardinality
$P\mkern-1pt{}P(p)$".
Given a gaussian distribution
-$p(x) \propto \exp\{-{\frac{1}{2}}(x-\mu)^2/\sigma^2\}$, the
+$p(x) \propto \exp\{-{\textstyle\frac{1}{2}}(x-\mu)^2/\sigma^2\}$, the
neg-log-likelihood of a specific sample $x$ is
-$-\log p(x) = -{\frac{1}{2}}(x-\mu)^2/\sigma^2 + \textit{const}$. This
-can be thought of as the *square error* of $x$ from $\mu$, and its
+$-\log p(x) = {\textstyle\frac{1}{2}}(x-\mu)^2/\sigma^2 + \textit{const}$.
+This can be thought of as the *square error* of $x$ from $\mu$, and its
expectation (entropy) is the mean square error. Generally, the
neg-log-likelihood $-\log p(x)$ often relates to an error or loss
function.
@@ -61,8 +62,9 @@ Cross-Entropy
The cross-entropy
-$$\begin{aligned}
-H(p,q) = - \sum_x p(x) \log q(x) = \mathbb{E}_{p(x)}\!\left\{-\log q(x)\right\}\end{aligned}$$
+$$\begin{align}
+H(p,q) = - \sum_x p(x) \log q(x) = \mathbb{E}_{p(x)}\!\left\{-\log q(x)\right\}
+\end{align}$$
is also an expected neg-log-likelihood, but expectation is
w.r.t. $p$, while the nll is w.r.t. $q$. This corresponds to
@@ -81,8 +83,9 @@ $p_{\bar y}(\cdot) = [0,..,0,1,0,..,0]$ with $1$ for $y=\bar y$. The
cross-entropy is then nothing but the neg-log-likelihood of the true
class label under the learned model:
-$$\begin{aligned}
-H(p_{\bar y}, q_\theta(\cdot|x)) = - \log q_\theta(\bar y|x) ~.\end{aligned}$$
+$$\begin{align}
+H(p_{\bar y}, q_\theta(\cdot|x)) = - \log q_\theta(\bar y|x) ~.
+\end{align}$$
Note that we could equally cast a square error loss as a cross-entropy:
If $y$ is continuous, $q_\theta(y|x)$ Gaussian around a mean prediction
@@ -96,16 +99,18 @@ Relative Entropy (KL-divergence)
The Kullback-Leibler divergence, also called relative entropy, is
defined as
-$$\begin{aligned}
+$$\begin{align}
D\big(p\,\big\Vert\,q\big)
&= \sum_x p(x) \log\frac{p(x)}{q(x)}
- = \mathbb{E}_{p(x)}\!\left\{\log\frac{p(x)}{q(x)}\right\} ~.\end{aligned}$$
+ = \mathbb{E}_{p(x)}\!\left\{\log\frac{p(x)}{q(x)}\right\} ~.
+\end{align}$$
Given our definitions above, we can rewrite it as
-$$\begin{aligned}
+$$\begin{align}
D\big(p\,\big\Vert\,q\big)
-&= H(p, q) - H(p) ~.\end{aligned}$$
+&= H(p, q) - H(p) ~.
+\end{align}$$
Note that we described $H(p)$ as the expected code length when encoding
$p$-samples using a $p$-model; and $H(p, q)$ as the expected code length
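
The identity $D(p\Vert q) = H(p,q) - H(p)$ from the hunk above is easy to verify numerically; a minimal numpy sketch (illustrative, not part of the patch, with two random discrete distributions):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(5); p /= p.sum()      # two arbitrary discrete distributions
q = rng.random(5); q /= q.sum()

H_p  = -(p * np.log(p)).sum()        # entropy H(p)
H_pq = -(p * np.log(q)).sum()        # cross-entropy H(p, q)
kl   =  (p * np.log(p / q)).sum()    # relative entropy D(p || q)

assert np.allclose(kl, H_pq - H_p)   # D(p||q) = H(p,q) - H(p)
assert kl >= 0                       # Gibbs' inequality
```
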
diff --git a/docs/notes/entropy.md b/docs/notes/entropy.md
index da12ed7..ecf117f 100644
--- a/docs/notes/entropy.md
+++ b/docs/notes/entropy.md
@@ -2,11 +2,12 @@
layout: home
title: "Entropy, Information, Cross-Entropy, and ML as Minimal Description Length"
date: 2024-08-15
+order: 1
tags: note
---
*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
-Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*
+Intelligent Systems Lab, TU Berlin,* {{ page.date | date: '%B, %Y' }}
[[pdf version](../pdfs/entropy.pdf)]
diff --git a/docs/notes/gaussians.inc b/docs/notes/gaussians.inc
new file mode 100644
index 0000000..5436445
--- /dev/null
+++ b/docs/notes/gaussians.inc
@@ -0,0 +1,257 @@
+Definitions
+-----------
+
+A Gaussian over $x\in{\mathbb{R}}^n$ with mean $a\in{\mathbb{R}}^n$ and
+sym.pos.def. covariance matrix $A\in{\mathbb{R}}^{n\times n}$ is
+defined as:
+
+$$\begin{align}
+{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A) &= \frac{1}{|2\pi A|^{1/2}}~ \exp\{-{\textstyle\frac{1}{2}}(x-a)^{\mkern-1pt \top \mkern-1pt}
+A^\text{-1} (x-a)\} ~.
+\end{align}$$
+
+We also define a notation for its so-called *canonical form*, with
+sym.pos.def. precision matrix $A\in{\mathbb{R}}^{n\times n}$, as
+
+$$\begin{align}
+{\cal N}[x \mkern-1pt \mid \mkern-1pt a,A]
+ = \frac{\exp\{-{\textstyle\frac{1}{2}}a^{\mkern-1pt \top \mkern-1pt}A^\text{-1}a\}}{|2\pi A^\text{-1}|^{1/2}}~
+ \exp\{-{\textstyle\frac{1}{2}}x^{\mkern-1pt \top \mkern-1pt}A x + x^{\mkern-1pt \top \mkern-1pt}a\} ~.
+\end{align}$$
+
+It holds
+
+$$\begin{align}
+& {\cal N}[x \mkern-1pt \mid \mkern-1pt a,A] = {\cal N}(x \mkern-1pt \mid \mkern-1pt A^\text{-1} a, A^\text{-1}) ~,\quad
+ {\cal N}(x \mkern-1pt \mid \mkern-1pt a,A) = {\cal N}[x \mkern-1pt \mid \mkern-1pt A^\text{-1} a, A^\text{-1}] ~.
+\end{align}$$
+
+Matrix Identities
+-----------------
+
+As a background, here are matrix identities (based on the
+[matrix-cookbook](https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf))
+which are useful when working with Gaussians:
+
+$$\begin{align}
+(A^\text{-1} + B^\text{-1})^\text{-1} &= A~ (A\!+\!B)^\text{-1}~ B = B~ (A\!+\!B)^\text{-1}~ A \\
+(A^\text{-1} - B^\text{-1})^\text{-1} &= A~ (B\!-\!A)^\text{-1}~ B \\
+\partial_x |A_x| &= |A_x|~ {\rm tr}(A_x^\text{-1}~ \partial_x A_x) \\
+\partial_x A_x^\text{-1} &= - A_x^\text{-1}~ (\partial_x A_x)~ A_x^\text{-1} \\
+(A+UBV)^\text{-1} &= A^\text{-1} - A^\text{-1} U (B^\text{-1} + VA^\text{-1}U)^\text{-1} V A^\text{-1} \label{wood}\\
+(A^\text{-1}+B^\text{-1})^\text{-1} &= A - A (B + A)^\text{-1} A \\
+(A + J^{\mkern-1pt \top \mkern-1pt}B J)^\text{-1} J^{\mkern-1pt \top \mkern-1pt}B
+&= A^\text{-1} J^{\mkern-1pt \top \mkern-1pt}(B^\text{-1} + J A^\text{-1} J^{\mkern-1pt \top \mkern-1pt})^\text{-1} \label{wood2}\\
+(A + J^{\mkern-1pt \top \mkern-1pt}B J)^\text{-1} A
+&= {\rm\bf I}- (A + J^{\mkern-1pt \top \mkern-1pt}B J)^\text{-1} J^{\mkern-1pt \top \mkern-1pt}B J \label{null}
+\end{align}$$
+
+Eq. $\eqref{wood}$ is the Woodbury identity; $\eqref{wood2}$ and $\eqref{null}$
+hold for pos.def. $A$ and $B$.
+
+Derivatives
+-----------
+
+$$\begin{align}
+\partial_x {\cal N}(x|a,A) &= {\cal N}(x|a,A)~ (-h^{\mkern-1pt \top \mkern-1pt}) ~,\quad h:= A^\text{-1}(x-a)\\
+\partial_\theta{\cal N}(x|a,A)
+&= {\cal N}(x|a,A)~ \Big[- h^{\mkern-1pt \top \mkern-1pt}(\partial_\theta x)
+ + h^{\mkern-1pt \top \mkern-1pt}(\partial_\theta a)
+ - {\textstyle\frac{1}{2}}{\rm tr}(A^\text{-1}~ \partial_\theta A)
+ + {\textstyle\frac{1}{2}}h^{\mkern-1pt \top \mkern-1pt}(\partial_\theta A) h \Big] \\
+\partial_\theta{\cal N}[x|a,A]
+& = {\cal N}[x|a,A]~ \Big[ -{\textstyle\frac{1}{2}}x^{\mkern-1pt \top \mkern-1pt}\partial_\theta A x + {\textstyle\frac{1}{2}}a^{\mkern-1pt \top \mkern-1pt}A^\text{-1} \partial_\theta A A^\text{-1} a
++ x^{\mkern-1pt \top \mkern-1pt}\partial_\theta a - a^{\mkern-1pt \top \mkern-1pt}A^\text{-1} \partial_\theta a + {\textstyle\frac{1}{2}}{\rm tr}(\partial_\theta A A^\text{-1}) \Big]
+\end{align}$$
+
+Product
+-------
+
+The product of two Gaussians can be expressed as
+
+$$\begin{align}
+{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A)~ {\cal N}(x \mkern-1pt \mid \mkern-1pt b,B)
+ &= {\cal N}[x \mkern-1pt \mid \mkern-1pt A^\text{-1} a+B^\text{-1} b, A^\text{-1} + B^\text{-1}]~ {\cal N}(a \mkern-1pt \mid \mkern-1pt b,A+B) ~, \label{prodNat}\\
+ &= {\cal N}(x \mkern-1pt \mid \mkern-1pt B(A\!+\!B)^\text{-1}a + A(A\!+\!B)^\text{-1}b ,A(A\!+\!B)^\text{-1}B)~ {\cal N}(a \mkern-1pt \mid \mkern-1pt b,A+B) ~,\\
+{\cal N}[x \mkern-1pt \mid \mkern-1pt a,A]~ {\cal N}[x \mkern-1pt \mid \mkern-1pt b,B]
+ &= {\cal N}[x \mkern-1pt \mid \mkern-1pt a+b,A+B]~ {\cal N}(A^\text{-1} a \mkern-1pt \mid \mkern-1pt B^\text{-1} b, A^\text{-1}+B^\text{-1}) \\
+ &= {\cal N}[x \mkern-1pt \mid \mkern-1pt a+b,A+B]~ {\cal N}[ A^\text{-1} a \mkern-1pt \mid \mkern-1pt A(A\!+\!B)^\text{-1} b, A(A\!+\!B)^\text{-1} B]\\
+ &= {\cal N}[x \mkern-1pt \mid \mkern-1pt a+b,A+B]~ {\cal N}[ A^\text{-1} a \mkern-1pt \mid \mkern-1pt (1\!-\!B(A\!+\!B)^\text{-1})~ b,~ (1\!-\!B(A\!+\!B)^\text{-1})~ B] ~,\\
+{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A)~ {\cal N}[x \mkern-1pt \mid \mkern-1pt b,B]
+ &= {\cal N}[x \mkern-1pt \mid \mkern-1pt A^\text{-1} a+ b, A^\text{-1} + B]~ {\cal N}(a \mkern-1pt \mid \mkern-1pt B^\text{-1} b,A+B^\text{-1}) \\
+ &= {\cal N}[x \mkern-1pt \mid \mkern-1pt A^\text{-1} a+ b, A^\text{-1} + B]~ {\cal N}[a \mkern-1pt \mid \mkern-1pt (1\!-\!B(A^\text{-1}\!+\!B)^\text{-1})~ b,~ (1\!-\!B(A^\text{-1}\!+\!B)^\text{-1})~
+ B] \label{prodNatCan}
+\end{align}$$
+
+Convolution
+-----------
+
+$$\begin{align}
+\textstyle\int_x {\cal N}(x \mkern-1pt \mid \mkern-1pt a,A)~ {\cal N}(y-x \mkern-1pt \mid \mkern-1pt b,B)~ dx
+ &= {\cal N}(y \mkern-1pt \mid \mkern-1pt a+b, A+B)
+\end{align}$$
+
+Division
+--------
+
+$$\begin{align}
+{\cal N}(x|&a,A) ~\big/~ {\cal N}(x|b,B) = {\cal N}(x|c,C) ~\big/~ {\cal N}(c| b, C+B) ~,\quad C^\text{-1}c = A^\text{-1}a - B^\text{-1}b,~ C^\text{-1} = A^\text{-1} - B^\text{-1} \\
+{\cal N}[x|&a,A] ~\big/~ {\cal N}[x|b,B] \propto {\cal N}[x|a-b,A-B]
+\end{align}$$
+
+Expectations
+------------
+
+For $x\sim{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A)$, we have:
+
+$$\begin{align}
+&\mathbb{E}_{x}\!\left\{g(x)\right\} := \textstyle\int_x {\cal N}(x \mkern-1pt \mid \mkern-1pt a,A)~ g(x)~ dx \\
+%&\Exp[x]{g(f+Fx)} =
+&\mathbb{E}_{x}\!\left\{x\right\} = a ~,\quad\mathbb{E}_{x}\!\left\{x x^{\mkern-1pt \top \mkern-1pt}\right\} = A + a a^{\mkern-1pt \top \mkern-1pt}\\
+&\mathbb{E}_{x}\!\left\{f+Fx\right\} = f+Fa \\
+&\mathbb{E}_{x}\!\left\{x^{\mkern-1pt \top \mkern-1pt}x\right\} = a^{\mkern-1pt \top \mkern-1pt}a + {\rm tr}(A)\\
+&\mathbb{E}_{x}\!\left\{(x-m)^{\mkern-1pt \top \mkern-1pt}R(x-m)\right\} = (a-m)^{\mkern-1pt \top \mkern-1pt}R(a-m) + {\rm tr}(RA)
+\end{align}$$
+
+Linear Transformation
+---------------------
+
+For any $f\in{\mathbb{R}}^n$ and full-rank
+$F\in{\mathbb{R}}^{n\times n}$, the following identities hold:
+
+$$\begin{align}
+{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A) &= {\cal N}(x+f \mkern-1pt \mid \mkern-1pt a+f,~A) \\
+{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A) &= |F|~ {\cal N}(Fx \mkern-1pt \mid \mkern-1pt Fa,~FAF^{\mkern-1pt \top \mkern-1pt}) \\
+{\cal N}(F x + f \mkern-1pt \mid \mkern-1pt a,A)
+&= \frac{1}{|F|}~ {\cal N}(x \mkern-1pt \mid \mkern-1pt ~ F^\text{-1} (a-f),~ F^\text{-1} AF^{\text{-}\!\top})
+ = \frac{1}{|F|}~ {\cal N}[x \mkern-1pt \mid \mkern-1pt ~ F^{\mkern-1pt \top \mkern-1pt}A^\text{-1} (a-f),~ F^{\mkern-1pt \top \mkern-1pt}A^\text{-1} F] ~, \\
+{\cal N}[F x + f \mkern-1pt \mid \mkern-1pt a,A]
+&= \frac{1}{|F|}~ {\cal N}[x \mkern-1pt \mid \mkern-1pt ~ F^{\mkern-1pt \top \mkern-1pt}(a-Af),~ F^{\mkern-1pt \top \mkern-1pt}A F] ~.
+\end{align}$$
+
+"Propagation"
+-------------
+
+Propagating a message along a linear coupling (e.g. forward model),
+using eqs $\eqref{prodNat}$ and $\eqref{prodNatCan}$, respectively, we
+have:
+
+$$\begin{align}
+& \textstyle\int_y {\cal N}(x \mkern-1pt \mid \mkern-1pt a + Fy, A)~ {\cal N}(y \mkern-1pt \mid \mkern-1pt b, B)~ dy
+ = {\cal N}(x \mkern-1pt \mid \mkern-1pt a + Fb, A+FBF^{\mkern-1pt \top \mkern-1pt}) \\
+& \textstyle\int_y {\cal N}(x \mkern-1pt \mid \mkern-1pt a + Fy, A)~ {\cal N}[y \mkern-1pt \mid \mkern-1pt b, B]~ dy
+ = {\cal N}[x \mkern-1pt \mid \mkern-1pt (F^{\text{-}\!\top}\!-\!K)(b+BF^\text{-1}a),~ (F^{\text{-}\!\top}\!-\!K)BF^\text{-1}] ~,
+\end{align}$$
+
+where
+$K=F^{\text{-}\!\top}B(F^{\text{-}\!\top}A^\text{-1} F^\text{-1}\!+\!B)^\text{-1}$.
+
+Marginal & Conditional
+----------------------
+
+$$\begin{align}
+{\cal N}(x \mkern-1pt \mid \mkern-1pt a,A)~ {\cal N}(y \mkern-1pt \mid \mkern-1pt b+Fx,B)
+ &= {\cal N}\bigg( \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}a\\ b+Fa\end{array} ,~
+ \begin{array}{cc}A & A^{\mkern-1pt \top \mkern-1pt}F^{\mkern-1pt \top \mkern-1pt}\\ F A & B\!+\!F A^{\mkern-1pt \top \mkern-1pt}F^{\mkern-1pt \top \mkern-1pt}\end{array} \bigg) \\
+%
+{\cal N}\bigg( \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}a\\ b\end{array} ,~
+ \begin{array}{cc}A & C\\ C^{\mkern-1pt \top \mkern-1pt}& B\end{array} \bigg)
+&= {\cal N}(x \mkern-1pt \mid \mkern-1pt a,A) \cdot {\cal N}(y \mkern-1pt \mid \mkern-1pt b+C^{\mkern-1pt \top \mkern-1pt}A^\text{-1}(x-a),~ B - C^{\mkern-1pt \top \mkern-1pt}A^\text{-1} C) \\
+%
+{\cal N}[ x \mkern-1pt \mid \mkern-1pt a,A ]~ {\cal N}(y \mkern-1pt \mid \mkern-1pt b+Fx,B )
+ &= {\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}a+F^{\mkern-1pt \top \mkern-1pt}B^\text{-1} b \\ B^\text{-1} b\end{array} ,~
+ \begin{array}{cc}A+F^{\mkern-1pt \top \mkern-1pt}B^\text{-1} F & -F^{\mkern-1pt \top \mkern-1pt}B^\text{-1} \\ -B^\text{-1} F & B^\text{-1}\end{array} \bigg] \\
+%
+{\cal N}[x \mkern-1pt \mid \mkern-1pt a,A ]~ {\cal N}[y \mkern-1pt \mid \mkern-1pt b+Fx,B ]
+ &= {\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}a+F^{\mkern-1pt \top \mkern-1pt}B^\text{-1} b \\ b\end{array} ,~
+ \begin{array}{cc}A+F^{\mkern-1pt \top \mkern-1pt}B^\text{-1} F & -F^{\mkern-1pt \top \mkern-1pt}\\ -F & B\end{array} \bigg] \\
+%
+{\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}a\\ b\end{array} ,~
+ \begin{array}{cc}A & C\\ C^{\mkern-1pt \top \mkern-1pt}& B\end{array} \bigg]
+&= {\cal N}[x \mkern-1pt \mid \mkern-1pt a - C B^\text{-1} b,~ A - C B^\text{-1} C^{\mkern-1pt \top \mkern-1pt}] \cdot {\cal N}[y \mkern-1pt \mid \mkern-1pt b-C^{\mkern-1pt \top \mkern-1pt}x,B] \\
+\left| \begin{array}{cc}A&C\\D&B\end{array} \right|
+ &= |A|~ |\widehat B| = |\widehat A|~ |B| ~,
+ \text{where } \begin{array}{l} \widehat A = A - C B^\text{-1} D \\ \widehat B = B - D A^\text{-1} C \end{array} \\
+\left[ \begin{array}{cc}A&C\\D&B\end{array} \right]^\text{-1}
+ &= \left[ \begin{array}{cc}\widehat A^\text{-1}&-A^\text{-1} C \widehat B^\text{-1}\\-\widehat B^\text{-1} D A^\text{-1}&\widehat B^\text{-1}\end{array} \right]
+ = \left[ \begin{array}{cc}\widehat A^\text{-1}&-\widehat A^\text{-1} C B^\text{-1}\\-B^\text{-1} D \widehat A^\text{-1}&\widehat B^\text{-1}\end{array} \right]
+\end{align}$$
+
+Pair-wise Belief
+----------------
+
+We have a message
+$\alpha(x)={\cal N}[x \mkern-1pt \mid \mkern-1pt s,S]$, a transition
+$P(y|x) = {\cal N}(y \mkern-1pt \mid \mkern-1pt A x+a,Q)$, and a
+message $\beta(y)={\cal N}[y \mkern-1pt \mid \mkern-1pt v,V]$; what is
+the belief $b(y,x)=\alpha(x)P(y|x)\beta(y)$?
+
+$$\begin{align}
+b(y,x)
+ &= {\cal N}[x|s,S]~ {\cal N}(y|A x+a,Q)~ {\cal N}[y|v,V] \\
+&=
+{\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}s \\ 0\end{array} ,~
+ \begin{array}{cc}S & 0 \\ 0 & 0\end{array} \bigg]~~~
+{\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}A^{\mkern-1pt \top \mkern-1pt}Q^\text{-1} a \\ Q^\text{-1} a\end{array} ,~
+ \begin{array}{cc}A^{\mkern-1pt \top \mkern-1pt}Q^\text{-1} A & -A^{\mkern-1pt \top \mkern-1pt}Q^\text{-1} \\ -Q^\text{-1} A & Q^\text{-1}\end{array} \bigg]~~~
+{\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}0 \\ v\end{array} ,~
+ \begin{array}{cc}0 & 0 \\ 0 & V\end{array} \bigg] \\
+&\propto
+{\cal N}\bigg[ \begin{array}{c}x\\ y\end{array} \bigg| \begin{array}{c}s + A^{\mkern-1pt \top \mkern-1pt}Q^\text{-1} a\\ v + Q^\text{-1} a\end{array} ,~
+ \begin{array}{cc}S + A^{\mkern-1pt \top \mkern-1pt}Q^\text{-1} A & -A^{\mkern-1pt \top \mkern-1pt}Q^\text{-1} \\ -Q^\text{-1} A & V+Q^\text{-1}\end{array} \bigg]
+\end{align}$$
+
+Entropy
+-------
+
+$$\begin{align}
+H({\cal N}(a,A)) &= {\textstyle\frac{1}{2}}\log |2\pi e A|
+\end{align}$$
+
+Kullback-Leibler divergence
+---------------------------
+
+For $p={\cal N}(x|a,A),~ q={\cal N}(x|b,B), n = \text{dim}(x)$ and
+definition
+$D\big(p\,\big\Vert\,q\big) = \sum_x p(x) \log\frac{p(x)}{q(x)}$, we
+have:
+
+$$\begin{align}
+2~ D\big(p\,\big\Vert\,q\big)
+&= \log\frac{|B|}{|A|} + {\rm tr}(B^\text{-1}A) + (b-a)^{\mkern-1pt \top \mkern-1pt}B^\text{-1} (b-a) - n \\
+4~ D_\text{sym}\big(p \,\big\Vert\, q\big)
+&= {\rm tr}(B^\text{-1}A) + {\rm tr}(A^\text{-1}B) + (b-a)^{\mkern-1pt \top \mkern-1pt}(A^\text{-1}+B^\text{-1}) (b-a) - 2n
+\end{align}$$
+
+$\lambda$-divergence:
+
+$$\begin{align}
+2~ D_\lambda\big(p \,\big\Vert\, q\big)
+&= \lambda~ D\big(p\,\big\Vert\,\lambda p+(1\!-\!\lambda)q\big) ~+~ (1\!-\!\lambda)~ D\big(q\,\big\Vert\,(1\!-\!\lambda) p + \lambda q\big)
+\end{align}$$
+
+For $\lambda=.5$: Jensen-Shannon divergence.
+
+Log-likelihoods
+---------------
+
+$$\begin{align}
+\log {\cal N}(x|a,A)
+ &= - {\textstyle\frac{1}{2}}\Big[ \log|2\pi A| + (x-a)^{\mkern-1pt \top \mkern-1pt}A^\text{-1} (x-a) \Big] \\
+\log {\cal N}[x|a,A]
+ &= - {\textstyle\frac{1}{2}}\Big[ \log|2\pi A^\text{-1}| + a^{\mkern-1pt \top \mkern-1pt}A^\text{-1} a + x^{\mkern-1pt \top \mkern-1pt}A x - 2 x^{\mkern-1pt \top \mkern-1pt}a \Big] \\
+\sum_x {\cal N}(x|b,B) \log {\cal N}(x|a,A)
+ &= -D\big({\cal N}(b,B)\,\big\Vert\,{\cal N}(a,A)\big) - H({\cal N}(b,B))
+\end{align}$$
+
+Mixture of Gaussians
+--------------------
+
+Collapsing a MoG into a single Gaussian:
+
+$$\begin{align}
+&\text{argmin}_{b,B} D\big(\sum_i p_i~ {\cal N}(a_i,A_i)\,\big\Vert\,{\cal N}(b,B)\big)
+\quad=\quad\Big(
+b=\sum_i p_i a_i ~,~
+B=\sum_i p_i (A_i + a_i a_i^{\mkern-1pt \top \mkern-1pt}- b\, b^{\mkern-1pt \top \mkern-1pt})\Big)
+\end{align}$$
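
The product identity $\eqref{prodNat}$ in this new file can be checked pointwise; a small sketch using numpy and scipy.stats (illustrative, not part of the patch; the test matrices and points are random):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n = 3

def randspd(n):
    """A random symmetric positive definite matrix."""
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

def N(x, a, A):
    """Gaussian density N(x | a, A), mean a and covariance A."""
    return multivariate_normal.pdf(x, mean=a, cov=A)

def Ncan(x, a, A):
    """Canonical form N[x | a, A] = N(x | A^-1 a, A^-1), precision A."""
    Ai = np.linalg.inv(A)
    return N(x, Ai @ a, Ai)

a, b, x = rng.normal(size=(3, n))
A, B = randspd(n), randspd(n)
Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)

# Product identity (prodNat): pointwise equality of both sides.
lhs = N(x, a, A) * N(x, b, B)
rhs = Ncan(x, Ai @ a + Bi @ b, Ai + Bi) * N(a, b, A + B)
assert np.allclose(lhs, rhs)
```
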
diff --git a/docs/notes/gaussians.md b/docs/notes/gaussians.md
new file mode 100644
index 0000000..ff22348
--- /dev/null
+++ b/docs/notes/gaussians.md
@@ -0,0 +1,16 @@
+---
+layout: home
+title: "Gaussian Identities"
+date: 2011-01-25
+order: 8
+tags: note
+---
+
+*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
+Intelligent Systems Lab, TU Berlin,* {{ page.date | date: '%B, %Y' }}
+
+[[pdf version](../pdfs/gaussians.pdf)]
+
+{% include_relative gaussians.inc %}
+
+{% include note-footer.md %}
diff --git a/docs/notes/latex-macros.inc b/docs/notes/latex-macros.inc
index 33b5b1c..575b801 100644
--- a/docs/notes/latex-macros.inc
+++ b/docs/notes/latex-macros.inc
@@ -147,9 +147,9 @@
\newcommand{\tsum}{\textstyle\sum}
\newcommand{\st}{~~\text{s.t.}~~}
- \newcommand{\half}{{\frac{1}{2}}}
- \newcommand{\third}{{\frac{1}{3}}}
- \newcommand{\fourth}{{\frac{1}{4}}}
+ \newcommand{\half}{{\textstyle\frac{1}{2}}}
+ \newcommand{\third}{{\textstyle\frac{1}{3}}}
+ \newcommand{\fourth}{{\textstyle\frac{1}{4}}}
\newcommand{\ubar}{\underline}
%\renewcommand{\vec}{\underline}
@@ -257,11 +257,11 @@
\newcommand{\THi}{T^\sharp_H}
\newcommand{\Jci}{J^\natural_C}
\newcommand{\hJi}{{\bar J}^\sharp}
-\renewcommand{\|}{\,|\,}
+\renewcommand{\|}{ \mkern-1pt \mid \mkern-1pt }
\renewcommand{\=}{\!=\!}
-\newcommand{\myminus}{ \textrm{-} \mkern-1pt }
+\newcommand{\myminus}{ - }
\newcommand{\myplus}{ \textrm{+} \mkern-1pt }
-\newcommand{\1}{{\myminus1}}
+\newcommand{\1}{\text{-1}}
\newcommand{\2}{{\myminus2}}
\newcommand{\3}{{\myminus3}}
\newcommand{\mT}{{\text{\rm -}\hspace*{-1pt}\top}}
@@ -502,7 +502,7 @@
\newcommand{\doclink}[2]{#1 [<#2>]}
\newcommand{\codetip}[1]{\begin{shaded} Code tutorials: #1 \end{shaded}}
\renewcommand{\url}[1]{<#1>}
-\renewcommand{\href}[2]{#2 <#1>}
+\renewcommand{\href}[2]{HREF-START#2HREF-MID\verb^#1^HREF-END}
\newcommand{\argmin}{\text{argmin}}
\newcommand{\tutorial}[2]{#1 }
diff --git a/docs/notes/make.sh b/docs/notes/make.sh
index 97a48a1..3c8c532 100755
--- a/docs/notes/make.sh
+++ b/docs/notes/make.sh
@@ -3,11 +3,17 @@ do
filename="${input##*/}"
echo "=============== ${input} ${filename}"
cat latex-macros.inc ${input} > z.tex
+ sed -i \
+ -e 's/^\\begin{align}/\n$$\\begin{align}/' \
+ -e 's/^\\end{align}$/\\end{align}$$\n/' \
+ -e 's/\\eqref{\([^}]*\)}/$\\eqref{\1}$/g' \
+ z.tex
pandoc z.tex --ascii -o z.md
sed -i \
-e 's/SPAN-END/<\/span>/g' \
-e 's/SPAN-CENTER//g' \
-e 's/SPAN-SMALL//g' \
+ -e 's/HREF-START\(.*\)HREF-MID`\(.*\)`HREF-END/[\1](\2)/' \
z.md
cat z.md > ${filename%.*}.inc
rm z.md
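
The third sed expression added above wraps every `\eqref{...}` in inline math delimiters so that kramdown/MathJax renders the reference; the same substitution sketched with Python's `re` for illustration (not part of the build script):

```python
import re

line = r"using eqs \eqref{prodNat} and \eqref{prodNatCan}, respectively"
# Same rewrite as the sed expression: wrap \eqref{...} in $...$
out = re.sub(r"\\eqref\{([^}]*)\}", r"$\\eqref{\1}$", line)
print(out)   # using eqs $\eqref{prodNat}$ and $\eqref{prodNatCan}$, respectively
```
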
diff --git a/docs/notes/quaternions.inc b/docs/notes/quaternions.inc
index ef9d7d6..56093a3 100644
--- a/docs/notes/quaternions.inc
+++ b/docs/notes/quaternions.inc
@@ -210,7 +210,7 @@ directly read out $w$ by multiplying $q^{-1}$ from the right,
$$\begin{align}
\dot q \circ q^{-1}
-&= {\frac{1}{2}}(0,w) ~,\quad w = 2~ [\dot q \circ q^{-1}]_{1:3} ~.
+&= {\textstyle\frac{1}{2}}(0,w) ~,\quad w = 2~ [\dot q \circ q^{-1}]_{1:3} ~.
\end{align}$$
However, in the case where $\dot q$ is non-tangential, i.e.,
diff --git a/docs/notes/quaternions.md b/docs/notes/quaternions.md
index a5fedf5..7672ac9 100644
--- a/docs/notes/quaternions.md
+++ b/docs/notes/quaternions.md
@@ -1,12 +1,13 @@
---
layout: home
title: "Quaternions, Exponential Map, and Quaternion Jacobians"
-date: 2024-08-25
+date: 2024-03-01
+order: 6
tags: note
---
*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
-Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*
+Intelligent Systems Lab, TU Berlin,* {{ page.date | date: '%B, %Y' }}
[[pdf version](../pdfs/quaternions.pdf)]
diff --git a/docs/notes/robotKin.inc b/docs/notes/robotKin.inc
index b0796fe..d97d3f4 100644
--- a/docs/notes/robotKin.inc
+++ b/docs/notes/robotKin.inc
@@ -356,8 +356,8 @@ spline interpolation, or a basic but nice motion profile).
$\frac{d}{dt} \frac{\partial L}{\partial\dot q} - \frac{\partial L}{\partial q} = u$,
where $L(q,\dot q) = T(q,\dot q) - U(q)$ with the system kinetic
energy
- $$T(q,\dot q) = \sum_i {\frac{1}{2}}m_i v_i^2 + {\frac{1}{2}}w_i^{\mkern-1pt \top \mkern-1pt}\bar I_i w_i
- = \sum_i {\frac{1}{2}}\dot q^{\mkern-1pt \top \mkern-1pt}J_i^{\mkern-1pt \top \mkern-1pt}M_i J_i \dot q,~
+ $$T(q,\dot q) = \sum_i {\textstyle\frac{1}{2}}m_i v_i^2 + {\textstyle\frac{1}{2}}w_i^{\mkern-1pt \top \mkern-1pt}\bar I_i w_i
+ = \sum_i {\textstyle\frac{1}{2}}\dot q^{\mkern-1pt \top \mkern-1pt}J_i^{\mkern-1pt \top \mkern-1pt}M_i J_i \dot q,~
M_i = {\rm diag}(m_i{\rm\bf I}_3, \bar I_i),$$ and the system
potential energy $U(q) = \sum_i g m_i x_i^\text{z}$. When computing
the partial derivatives analytically we get something of the form
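
The kinetic-energy expression in this hunk, $T = \sum_i \frac{1}{2}\dot q^\top J_i^\top M_i J_i \dot q$ with $M_i = {\rm diag}(m_i {\rm\bf I}_3, \bar I_i)$, translates directly to code; a minimal numpy sketch with placeholder Jacobians (hypothetical values only, not part of the patch — a real implementation would compute the $J_i$ from the kinematic chain):

```python
import numpy as np

rng = np.random.default_rng(0)
ndof, nlinks = 7, 3

# Placeholder 6 x ndof link Jacobians (stacked linear and angular parts).
Js = [rng.normal(size=(6, ndof)) for _ in range(nlinks)]

def link_mass_matrix(m, Ibar):
    """M_i = diag(m_i I_3, Ibar_i) as in the note."""
    M = np.zeros((6, 6))
    M[:3, :3] = m * np.eye(3)
    M[3:, 3:] = Ibar
    return M

Mis = [link_mass_matrix(1.0, np.eye(3)) for _ in range(nlinks)]
M = sum(J.T @ Mi @ J for J, Mi in zip(Js, Mis))   # system inertia M(q)
qdot = rng.normal(size=ndof)
T = 0.5 * qdot @ M @ qdot                          # T = 1/2 qdot^T M qdot
```
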
diff --git a/docs/notes/robotKin.md b/docs/notes/robotKin.md
index bb0e399..5da5909 100644
--- a/docs/notes/robotKin.md
+++ b/docs/notes/robotKin.md
@@ -1,12 +1,13 @@
---
layout: home
title: "Robot Kinematics and Dynamics Essentials"
-date: 2024-08-25
+date: 2024-05-01
+order: 5
tags: note
---
*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
-Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*
+Intelligent Systems Lab, TU Berlin,* {{ page.date | date: '%B, %Y' }}
[[pdf version](../pdfs/robotKin.pdf)]
diff --git a/docs/notes/splines.inc b/docs/notes/splines.inc
index 7e7427a..8efc8a1 100644
--- a/docs/notes/splines.inc
+++ b/docs/notes/splines.inc
@@ -85,7 +85,7 @@ $$\begin{align}
&= \frac{12}{\tau^3}~[(x_1 - x_0)-\frac{\tau}{2}(v_0+v_1)]^2+\frac{1}{\tau}(v_1-v_0)^2 \label{eqLeap}\\
&= \frac{12}{\tau^3} D^{\mkern-1pt \top \mkern-1pt}D + \frac{1}{\tau} V^{\mkern-1pt \top \mkern-1pt}V ~,\quad D := (x_1 - x_0)-\frac{\tau}{2}(v_0+v_1),~ V:=v_1-v_0,~ \\
&= \tilde D^{\mkern-1pt \top \mkern-1pt}\tilde D + \tilde V^{\mkern-1pt \top \mkern-1pt}\tilde V ~,\quad
-\tilde D := \sqrt{12}~ \tau^{-\frac{3}{2}}~ D,~ \tilde V := \tau^{-{\frac{1}{2}}}~ V ~,
+\tilde D := \sqrt{12}~ \tau^{-\frac{3}{2}}~ D,~ \tilde V := \tau^{-{\textstyle\frac{1}{2}}}~ V ~,
\label{eqLeapSOS}
\end{align}$$
@@ -157,8 +157,8 @@ knots, $t_0,..,t_m \in [0,T]$, $t_0=0, t_m=T$, and the waypoints $x_i$
*and velocities* $v_i$ at each time knot. There are no double knots, so
the interval $[0,T]$ is split in $m$ cubic pieces, where the $i$th piece
is determined by the boundary conditions
-$(x_{i{ \textrm{-} \mkern-1pt 1}},v_{i{ \textrm{-} \mkern-1pt 1}}, x_i, v_i)$
-and $\tau_i = t_i - t_{i{ \textrm{-} \mkern-1pt 1}}$.
+$(x_{i\text{-1}},v_{i\text{-1}}, x_i, v_i)$ and
+$\tau_i = t_i - t_{i\text{-1}}$.
Specifying the timings (i.e., knots) and velocities of all waypoints is
often not easy in terms of a user interface. Therefore the question is
@@ -172,13 +172,13 @@ and the right), and therefore a freely specified Hermite cubic spline is
discontinuous in acceleration (has infinite jerk). Conversely,
requiring a path in ${\cal C}^2$ implies continuity constraints in
acceleration at each knot. Over the full path, these are
-$(m{ \textrm{-} \mkern-1pt 1}) \cdot n$ constraints (in
-${\mathbb{R}}^n$), which "kill" the degrees-of-freedom of all but the
-start and end velocity. Therefore, requiring continuous accelleration,
-the kots, waypoints and start/end velocity alone are sufficient to
-specify the spline – but in practise the resulting waypoint
-velocities might not be quite desired, as they might "go crazy" when
-chaining forward the continuous acceleration constraint.
+$(m\text{-1}) \cdot n$ constraints (in ${\mathbb{R}}^n$), which "kill"
+the degrees-of-freedom of all but the start and end velocity. Therefore,
+requiring continuous acceleration, the knots, waypoints and start/end
+velocity alone are sufficient to specify the spline – but in
+practice the resulting waypoint velocities might not be quite desired,
+as they might "go crazy" when chaining forward the continuous
+acceleration constraint.
However, optimizing both timing and waypoint velocities under our
optimal control objective is rather efficient and effective. Note that
@@ -192,9 +192,8 @@ the least-squares formulation of $\psi$ we can use the Gauss-Newton
approximate Hessian.
As a consequence, it is fairly efficient to solve for $\tau_{1:m}$,
-$v_{1:m{ \textrm{-} \mkern-1pt 1}}$ given $v_0, v_m, x_{0:m}$ under
-continuous acceleration constraints subject to total time and control
-costs.
+$v_{1:m\text{-1}}$ given $v_0, v_m, x_{0:m}$ under continuous
+acceleration constraints subject to total time and control costs.
As a final note, in Hermite quintic splines we need positions $x_i$,
velocities $v_i$ and accelerations $a_i$ at each knot, which describe
@@ -309,10 +308,9 @@ transitioning through these waypoints at the desired times?
The answer is again the matrix equation. Consider the cubic spline case
and that the start and end points and times are fixed. Therefore
-$z_{0:1}$ and $z_{K{ \textrm{-} \mkern-1pt 1}:K}$, as well as knots
-$t_{0:3}$ and $t_{m-3:m}$ are fixed. The user wants waypoints
-$x_1,..,x_S$ at times $\widehat t_1,..,\widehat t_S$ *between* start and
-end.
+$z_{0:1}$ and $z_{K\text{-1}:K}$, as well as knots $t_{0:3}$ and
+$t_{m-3:m}$ are fixed. The user wants waypoints $x_1,..,x_S$ at times
+$\widehat t_1,..,\widehat t_S$ *between* start and end.
We can distribute $S$ knots $t_{4:3+S}$ uniformly between start and end
knots (or also at $\widehat t_1,..,\widehat t_S$), from which it follows
@@ -322,8 +320,8 @@ points are still free, and matrix inversion gives them from the desired
waypoints,
$$\begin{align}
- z_{2:S+1} = B^{ \textrm{-} \mkern-1pt 1} x_{1:S} ~,\quad\text{with } B \in {\mathbb{R}}^{S \times S},~ B_{si} = B_{i+1,3}(\widehat t_s),~ s,i=1,..,S ~.
- \end{align}$$
+ z_{2:S+1} = B^\text{-1} x_{1:S} ~,\quad\text{with } B \in {\mathbb{R}}^{S \times S},~ B_{si} = B_{i+1,3}(\widehat t_s),~ s,i=1,..,S ~.
+\end{align}$$
### Ensuring boundary velocities
@@ -333,11 +331,11 @@ B-spline representation we have to construct a spline that starts with
current state as starting boundary.
For degrees 2 and 3 this is simple to achieve: In both cases we usually
-have $z_0=z_1$ and $z_{K{ \textrm{-} \mkern-1pt 1}}=z_K$ to ensure zero
-start and end velocities. Modifying $z_1$ directly leads to the start
-velocity $\dot x(0) = \dot B_{0,p}(0) z_0 + \dot B_{1,p}(0) z_1$. But
-because of normalization we have $\dot B_{0,p}(0) = - \dot B_{1,p}(0)$,
-and therefore
+have $z_0=z_1$ and $z_{K\text{-1}}=z_K$ to ensure zero start and end
+velocities. Modifying $z_1$ directly leads to the start velocity
+$\dot x(0) = \dot B_{0,p}(0) z_0 + \dot B_{1,p}(0) z_1$. But because of
+normalization we have $\dot B_{0,p}(0) = - \dot B_{1,p}(0)$, and
+therefore
$$\begin{align}
\dot x(0) &= \dot B_{0,p}(0) (z_0 - z_1) \\
@@ -356,10 +354,10 @@ B_{i,p}(t)
+ \frac{t_{i+p+1}-t}{t_{i+p+1}-t_{i+1}} B_{i+1,p-1}(t) \\
&=: v~ B_{i,p-1} + w~ B_{i+1,p-1} \\
\dot B_{i,p}(t)
- &= \frac{1}{t_{i \textrm{+} \mkern-1pt {}p}-t_i}~ B_{i,p{ \textrm{-} \mkern-1pt 1}}(t)
- + v~ \dot B_{i,p{ \textrm{-} \mkern-1pt 1}}(t)
- - \frac{1}{t_{i \textrm{+} \mkern-1pt {}p{ \textrm{+} \mkern-1pt 1}}-t_{i{ \textrm{+} \mkern-1pt 1}}}~ B_{i{ \textrm{+} \mkern-1pt 1},p{ \textrm{-} \mkern-1pt 1}}(t)
- + w~ \dot B_{i{ \textrm{+} \mkern-1pt 1},p{ \textrm{-} \mkern-1pt 1}}(t) \\
+ &= \frac{1}{t_{i \textrm{+} \mkern-1pt {}p}-t_i}~ B_{i,p\text{-1}}(t)
+ + v~ \dot B_{i,p\text{-1}}(t)
+ - \frac{1}{t_{i \textrm{+} \mkern-1pt {}p{ \textrm{+} \mkern-1pt 1}}-t_{i{ \textrm{+} \mkern-1pt 1}}}~ B_{i{ \textrm{+} \mkern-1pt 1},p\text{-1}}(t)
+ + w~ \dot B_{i{ \textrm{+} \mkern-1pt 1},p\text{-1}}(t) \\
\partial_{t_i} B_{i,p}
&= \Big[\frac{-1}{t_{i+p}-t_i} + \frac{t-t_i}{(t_{i+p}-t_i)^2}\Big]~ B_{i,p-1}
+ v~ \partial_{t_i} B_{i,p-1} + w~ \partial_{t_i} B_{i+1,p-1} \\
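
The waypoint-to-control-point relation $z = B^{-1} x$ from these hunks can be reproduced with a standard B-spline library; a simplified sketch that solves the full square collocation system (rather than fixing boundary control points as in the note), assuming scipy >= 1.8 for `BSpline.design_matrix`, with invented knots and waypoints, not part of the patch:

```python
import numpy as np
from scipy.interpolate import BSpline

p, T, K = 3, 1.0, 7                        # cubic spline, K control points
# clamped knot vector: (p+1)-fold knots at both ends
t = np.r_[np.zeros(p), np.linspace(0, T, K - p + 1), np.full(p, T)]

# Greville abscissae: one sample time per control point (a common choice)
ts = np.array([t[i+1:i+p+1].mean() for i in range(K)])

# Collocation matrix C[s,i] = B_{i,p}(ts[s]); waypoints -> control points
C = BSpline.design_matrix(ts, t, p).toarray()
x = np.sin(2 * np.pi * ts)                 # invented waypoints at times ts
z = np.linalg.solve(C, x)                  # the "z = B^-1 x" step

assert np.allclose(BSpline(t, z, p)(ts), x)
```
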
diff --git a/docs/notes/splines.md b/docs/notes/splines.md
index 007ba3e..6404170 100644
--- a/docs/notes/splines.md
+++ b/docs/notes/splines.md
@@ -2,11 +2,12 @@
layout: home
title: "Splines: Cubic, Hermite, Timing-Optimal, B-Splines, Derivatives"
date: 2024-08-27
+order: 5
tags: note
---
*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
-Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*
+Intelligent Systems Lab, TU Berlin,* {{ page.date | date: '%B, %Y' }}
[[pdf version](../pdfs/splines.pdf)]
diff --git a/docs/notes/svd.inc b/docs/notes/svd.inc
index 54dc10b..cadb128 100644
--- a/docs/notes/svd.inc
+++ b/docs/notes/svd.inc
@@ -34,10 +34,11 @@ orthonormal vectors $v_1,..,v_k \in {\mathbb{R}}^n$, orthonormal vectors
$u_1,..,u_k\in{\mathbb{R}}^m$, and scalars $\sigma_1,..,\sigma_k>0$, such
that
-$$\begin{aligned}
+$$\begin{align}
A = \sum_{i=1}^k u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}= U S V^{\mkern-1pt \top \mkern-1pt}
~,\quad\text{where}\quad S = {\rm diag}(\sigma_{1:k}),~ U=u_{1:k}\in{\mathbb{R}}^{m\times k},~ V=
-v_{1:k} \in {\mathbb{R}}^{n\times k} ~.\end{aligned}$$
+v_{1:k} \in {\mathbb{R}}^{n\times k} ~.
+\end{align}$$
In this form, we see that $V^{\mkern-1pt \top \mkern-1pt}$ spans the
input space with orthonormal rows $v_i^{\mkern-1pt \top \mkern-1pt}$,
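
The rank-$k$ sum form of the SVD in the hunk above, $A = \sum_{i=1}^k u_i \sigma_i v_i^\top = U S V^\top$, checked numerically with numpy (illustrative only, not part of the patch):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)    # thin SVD, k = 3
# A as a sum of rank-1 terms u_i sigma_i v_i^T
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_sum)
assert np.allclose(A, U @ np.diag(s) @ Vt)          # A = U S V^T
```
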
diff --git a/docs/notes/svd.md b/docs/notes/svd.md
index 7fbe55a..029cf61 100644
--- a/docs/notes/svd.md
+++ b/docs/notes/svd.md
@@ -1,12 +1,13 @@
---
layout: home
title: "Singular Value Decomposition"
-date: 2024-08-23
+#date: 2024-08-23
+order: 7
tags: note
---
*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
-Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*
+Intelligent Systems Lab, TU Berlin*
[[pdf version](../pdfs/svd.pdf)]
diff --git a/notes/energy.tex b/notes/energy.tex
index 6beeab2..ed2c850 100644
--- a/notes/energy.tex
+++ b/notes/energy.tex
@@ -15,17 +15,13 @@
\subsection{Probabilities \& Energy}
Given a density $p(x)$, we call
-
\begin{align}
E(x) = -\log p(x) + c ~,
\end{align}
-
(for any choice of offset $c\in\RRR$) an energy function. Conversely, given an energy function $E(x)$, the corresponding density (called Boltzmann distribution) is
-
\begin{align}
p(x) = \Frac 1Z \exp\lbrace-E(x)\rbrace ~,
\end{align}
-
where $Z$ is the normalization constant to ensure $\int_x p(x) = 1$,
and $c=-\log Z$. From the perspective of physics, one can motivate why
$E(x)$ is called ``energy'' and derive these relations from other
@@ -33,17 +29,13 @@ \subsection{Probabilities \& Energy}
relation more minimalistically as follows:
Probabilities are \emph{multiplicative} (axiomatically): E.g., the likelihood of i.i.d.\ data $D = \lbrace x_i \rbrace_{i=1}^n$ is the product
-
\begin{align}
P(D) = \prod_{i=1}^n p(x_i) ~.
\end{align}
-
We often want to rewrite this with a log to have an \emph{additive} expression
-
\begin{align}
E(D) = -\log P(D) = \sum_{i=1}^n E(x_i) \comma E(x_i) = -\log p(x_i) ~.
\end{align}
-
The minus is a convention so that we can call the quantity $E(D)$
a \emph{loss} or \emph{error} -- something we want to minimize instead
of maximize. We can show that whenever we want to define a
@@ -59,7 +51,6 @@ \subsection{Probabilities \& Energy}
properties (1) $P(D)$ is multiplicative, $P(D_1 \cup D_2) = P(D_1)
P(D_2)$, (2) $E(D)$ is additive $E(D_1\cup D_2) = E(D_1) + E(D_2)$,
and (3) there is a mapping $P(D) = \Frac 1Z f(E(D))$ between both. Then it follows that
-
\begin{align}
P(D_1\cup D_2)
&= P(D_1)~ P(D_2) = \Frac 1{Z_1} f(E(D_1))~ \Frac 1{Z_2} f(E(D_2)) \\
@@ -68,16 +59,13 @@ \subsection{Probabilities \& Energy}
\To\quad \Frac 1{Z_1} f(E_1) \Frac 1{Z_2} f(E_2)
&= \Frac 1{Z_0} f(E_1+E_2) ~, \label{exp}
\end{align}
-
where we defined $E_i=E(D_i)$. The only function to fulfill
the last equation for any $E_1,E_2\in\RRR$ is the exponential function $f(E) = \exp\lbrace -\b E \rbrace$ with arbitrary
coefficient $\b$ (and minus sign being a convention, $Z_0 = Z_1
Z_2$). Boiling this down to an individual element $x\in D$, we have
-
\begin{align}
p(x) = \Frac 1Z \exp\lbrace-\b E(x)\rbrace \comma \b E(x) = -\log p(x) - \log Z ~.
\end{align}
-
\end{proof}
}
diff --git a/notes/gaussians.tex b/notes/gaussians.tex
new file mode 100644
index 0000000..eb5a796
--- /dev/null
+++ b/notes/gaussians.tex
@@ -0,0 +1,349 @@
+\input{../latex/shared}
+\note[9pt]
+
+\title{Lecture Note:\\ Gaussian Identities}
+\author{Marc Toussaint\\\small Learning \& Intelligent Systems Lab, TU Berlin}
+\date{January 25, 2011}
+
+\makeatletter
+\renewcommand{\@seccntformat}[1]{}
+\makeatother
+
+\renewcommand{\mT}{{\text{-}\!\top}}
+\renewcommand{\-}{\!-\!}
+\newcommand{\+}{\!+\!}
+
+\notetitle
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\subsection{Definitions}
+
+A Gaussian over $x\in\RRR^n$ with mean $a\in\RRR^n$
+and sym.pos.def.\ covariance matrix $A\in\RRR^{n\times n}$ is defined as:
+
+\begin{align}
+\NN(x \| a,A) &= \frac{1}{|2\pi A|^{1/2}}~ \exp\{-\half (x-a)^\T
+A^\1 (x-a)\} ~.
+\end{align}
+
+We also define a notation for its so-called \emph{canonical form},
+with sym.pos.def.\ precision matrix $A\in\RRR^{n\times n}$, as
+\begin{align}
+\NN[x \| a,A]
+ = \frac{\exp\{-\half a^\T A^\1a\}}{|2\pi A^\1|^{1/2}}~
+ \exp\{-\half x^\T A x + x^\T a\} ~.
+\end{align}
+It holds
+\begin{align}
+& \NN[x \| a,A] = \NN(x \| A^\1 a, A^\1) \comma
+ \NN(x \| a,A) = \NN[x \| A^\1 a, A^\1] ~.
+\end{align}
+
+%% Non-normalized Gaussian
+%% \begin{align}
+%% \oNN(x,a,A)
+%% &= |2\pi A|^{1/2}~ \NN(x|a,A) \\
+%% &= \exp \{-\half (x-a)^\T A^\1 (x-a)\}
+%% \end{align}
+
+
+\subsection{Matrix Identities}
+
+As a background, here are matrix identities (based on
+the \href{https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf}{matrix-cookbook})
+which are useful when working with Gaussians:
+
+\newcommand{\ve}[2]{\left[\arr{c}{#1\\#2}\right]}
+\newcommand{\ma}[4]{\left[\arr{cc}{#1\\#3}\right]}
+\renewcommand{\de}[4]{\left|\arr{cc}{#1\\#3}\right|}
+\renewcommand{\bar}{\widehat}
+
+\begin{align}
+(A^\1 + B^\1)^\1 &= A~ (A\+B)^\1~ B = B~ (A\+B)^\1~ A \\
+(A^\1 - B^\1)^\1 &= A~ (B\-A)^\1~ B \\
+\del_x |A_x| &= |A_x|~ \tr(A_x^\1~ \del_x A_x) \\
+\del_x A_x^\1 &= - A_x^\1~ (\del_x A_x)~ A_x^\1 \\
+(A+UBV)^\1 &= A^\1 - A^\1 U (B^\1 + VA^\1U)^\1 V A^\1 \label{wood}\\
+(A^\1+B^\1)^\1 &= A - A (B + A)^\1 A \\
+(A + J^\T B J)^\1 J^\T B
+&= A^\1 J^\T (B^\1 + J A^\1 J^\T)^\1 \label{wood2}\\
+(A + J^\T B J)^\1 A
+&= \Id - (A + J^\T B J)^\1 J^\T B J \label{null}
+\end{align}
+
+Eq.~\eqref{wood} is the Woodbury identity; \eqref{wood2} and \eqref{null} hold for pos.def.\ $A$ and $B$.
+
+\subsection{Derivatives}
+
+\begin{align}
+\del_x \NN(x|a,A) &= \NN(x|a,A)~ (-h^\T) \comma h:= A^\1(x-a)\\
+\del_\t \NN(x|a,A)
+&= \NN(x|a,A)~ \[- h^\T(\del_\t x)
+ + h^\T (\del_\t a)
+ - \half \tr(A^\1~ \del_\t A)
+ + \half h^\T (\del_\t A) h \] \\
+\del_\t \NN[x|a,A]
+& = \NN[x|a,A]~ \[ -\half x^\T \del_\t A x + \half a^\T A^\1 \del_\t A A^\1 a
++ x^\T \del_\t a - a^\T A^\1 \del_\t a + \half\tr(\del_\t A A^\1) \]
+\end{align}
+%% \\
+%% & \del_\t \oNN_x(a,A) = \oNN_x(a,A) ~\cdot \feed
+%% & \[ h^\T(\del_\t x)
+%% + h^\T (\del_\t a)
+%% + \half h^\T (\del_\t A) h \]
+
+\subsection{Product}
+
+The product of two Gaussians can be expressed as
+
+\begin{align}
+\NN(x \| a,A)~ \NN(x \| b,B)
+ &= \NN[x \| A^\1 a+B^\1 b, A^\1 + B^\1]~ \NN(a\|b,A+B) ~, \label{prodNat}\\
+ &= \NN(x \| B(A\+B)^\1a + A(A\+B)^\1b ,A(A\+B)^\1B)~ \NN(a\|b,A+B) ~,\\
+\NN[x \| a,A]~ \NN[x \| b,B]
+ &= \NN[x \| a+b,A+B]~ \NN(A^\1 a \| B^\1 b, A^\1+B^\1) \\
+ &= \NN[x \| a+b,A+B]~ \NN[ A^\1 a \| A(A\+B)^\1 b, A(A\+B)^\1 B]\\
+ &= \NN[x \| a+b,A+B]~ \NN[ A^\1 a \| (1\-B(A\+B)^\1)~ b,~ (1\-B(A\+B)^\1)~ B] ~,\\
+\NN(x \| a,A)~ \NN[x \| b,B]
+ &= \NN[x \| A^\1 a+ b, A^\1 + B]~ \NN(a\|B^\1 b,A+B^\1) \\
+ &= \NN[x \| A^\1 a+ b, A^\1 + B]~ \NN[a\|(1\-B(A^\1\+B)^\1)~ b,~ (1\-B(A^\1\+B)^\1)~
+ B] \label{prodNatCan}
+\end{align}
+
+\subsection{Convolution}
+\begin{align}
+\textstyle\int_x \NN(x \|a,A)~ \NN(y-x \| b,B)~ dx
+ &= \NN(y \| a+b, A+B)
+\end{align}
+
+\subsection{Division}
+
+\begin{align}
+\NN(x|&a,A) ~\big/~ \NN(x|b,B) = \NN(x|c,C) ~\big/~ \NN(c| b, C+B) \comma C^\1c = A^\1a - B^\1b,~ C^\1 = A^\1 - B^\1 \\
+\NN[x|&a,A] ~\big/~ \NN[x|b,B] \propto \NN[x|a-b,A-B]
+\end{align}
+
+\subsection{Expectations}
+
+For $x\sim\NN(x \| a,A)$, we have:
+\begin{align}
+&\Exp[x]{g(x)} := \textstyle\int_x \NN(x \| a,A)~ g(x)~ dx \\
+%&\Exp[x]{g(f+Fx)} =
+&\Exp[x]{x} = a \comma \Exp[x]{x x^\T} = A + a a^\T\\
+&\Exp[x]{f+Fx} = f+Fa \\
+&\Exp[x]{x^\T x} = a^\T a + \tr(A)\\
+&\Exp[x]{(x-m)^\T R(x-m)} = (a-m)^\T R(a-m) + \tr(RA)
+\end{align}
+
+\subsection{Linear Transformation}
+
+For any $f\in\RRR^n$ and full-rank $F\in\RRR^{n\times n}$, the
+following identities hold:
+\begin{align}
+\NN(x\|a,A) &= \NN(x+f\|a+f,~A) \\
+\NN(x\|a,A) &= |F|~ \NN(Fx \| Fa,~FAF^\T) \\
+\NN(F x + f \| a,A)
+&= \frac{1}{|F|}~ \NN(x \| ~ F^\1 (a-f),~ F^\1 AF^\mT)
+ = \frac{1}{|F|}~ \NN[x \| ~ F^\T A^\1 (a-f),~ F^\T A^\1 F] ~, \\
+\NN[F x + f \| a,A]
+&= \frac{1}{|F|}~ \NN[x \| ~ F^\T(a-Af),~ F^\T A F] ~.
+\end{align}
+
+\subsection{``Propagation''}
+
+Propagating a message along a linear coupling (e.g.\ forward model), using
+ eqs \eqref{prodNat} and \eqref{prodNatCan}, respectively, we have:
+\begin{align}
+& \textstyle\int_y \NN(x \| a + Fy, A)~ \NN(y \| b, B)~ dy
+ = \NN(x \| a + Fb, A+FBF^\T) \\
+& \textstyle\int_y \NN(x \| a + Fy, A)~ \NN[y \| b, B]~ dy
+ = \NN[x \| (F^\mT\-K)(b+BF^\1a),~ (F^\mT\-K)BF^\1] ~,
+\end{align}
+where $K=F^\mT B(F^\mT A^\1 F^\1\+B)^\1$.
+
+%\begin{align}
+%& x' = F x + f \\
+%& \NN(x|a,A) = |F|~ \NN(Fx+f|~ Fa+f,~ FAF^\T) \\
+%& \NN(F x + f|a,A) = \frac{1}{|F|}~ \NN(x|~ F^\1(a-f),~ F^\1AF^{-1T}) \\
+%& \NN[F x + f|a,A] = \frac{1}{|F|}~ \NN[x|~ F^\T(a-Af),~ F^\T A F] \\
+%& P(x) = |F|~ P(x'=F x + f) \comma P(x') = \frac{1}{|F|}~ P(x=F^\1(x' - f))
+%\end{align}
+%If a forward dependency $P(y|x)$ is given as a linear noise transition
+%$(f,F,Q)$ and if evidence $y^*$ is given, this induces a potential on
+%$x$:
+%\begin{align}
+%\NN(y| Fx+f,Q) = \NN(Fx+f| y,Q)
+% = U(x) \propto \NN(x|F^\1(y-f),~ F^\1 Q F^{-1T})
+%\end{align}
+
+\subsection{Marginal \& Conditional}
+\begin{align}
+\NN(x \| a,A)~ \NN(y \| b+Fx,B)
+ &= \NN\bigg( \arr{c}{x\\ y} \bigg| \arr{c}{a\\ b+Fa} ,~
+ \arr{cc}{A & A^\T F^\T\\ F A & B\!+\!F A^\T F^\T} \bigg) \\
+%
+\NN\bigg( \arr{c}{x\\ y} \bigg| \arr{c}{a\\ b} ,~
+ \arr{cc}{A & C\\ C^\T & B} \bigg)
+&= \NN(x \| a,A) \cdot \NN(y \| b+C^\T A^\1(x-a),~ B - C^\T A^\1 C) \\
+%
+\NN[ x \| a,A ]~ \NN(y \| b+Fx,B )
+ &= \NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{a+F^\T B^\1 b \\ B^\1 b} ,~
+ \arr{cc}{A+F^\T B^\1 F & -F^\T B^\1 \\ -B^\1 F & B^\1} \bigg] \\
+%
+\NN[x \| a,A ]~ \NN[y \| b+Fx,B ]
+ &= \NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{a+F^\T B^\1 b \\ b} ,~
+ \arr{cc}{A+F^\T B^\1 F & -F^\T \\ -F & B} \bigg] \\
+%
+\NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{a\\ b} ,~
+ \arr{cc}{A & C\\ C^\T & B} \bigg]
+&= \NN[x \| a - C B^\1 b,~ A - C B^\1 C^\T] \cdot \NN[y \| b-C^\T x,B] \\
+\de{A}{C}{D}{B}
+ &= |A|~ |\bar B| = |\bar A|~ |B| ~,
+ \text{where } \arr{l}{ \bar A = A - C B^\1 D \\ \bar B = B - D A^\1 C }\\
+\ma{A}{C}{D}{B}^\1
+ &= \ma{\bar A^\1}{-A^\1 C \bar B^\1}{-\bar B^\1 D A^\1}{\bar B^\1}
+ = \ma{\bar A^\1}{-\bar A^\1 C B^\1}{-B^\1 D \bar A^\1}{\bar B^\1}
+\end{align}
+
+\subsection{Pair-wise Belief}
+
+We have a message $\a(x)=\NN[x \| s,S]$,
+ a transition $P(y|x) = \NN(y \| A x+a,Q)$, and a message
+ $\b(y)=\NN[y \| v,V]$; what is the belief $b(y,x)=\a(x)P(y|x)\b(y)$?
+\begin{align}
+b(y,x)
+ &= \NN[x|s,S]~ \NN(y|A x+a,Q)~ \NN[y|v,V] \\
+&=
+\NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{s \\ 0} ,~
+ \arr{cc}{S & 0 \\ 0 & 0} \bigg]~~~
+\NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{A^\T Q^\1 a \\ Q^\1 a} ,~
+ \arr{cc}{A^\T Q^\1 A & -A^\T Q^\1 \\ -Q^\1 A & Q^\1} \bigg]~~~
+\NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{0 \\ v} ,~
+ \arr{cc}{0 & 0 \\ 0 & V} \bigg] \\
+&\propto
+\NN\bigg[ \arr{c}{x\\ y} \bigg| \arr{c}{s + A^\T Q^\1 a\\ v + Q^\1 a} ,~
+ \arr{cc}{S + A^\T Q^\1 A & -A^\T Q^\1 \\ -Q^\1 A & V+Q^\1} \bigg]
+\end{align}
+
+\subsection{Entropy}
+\begin{align}
+H(\NN(a,A)) &= \half \log |2\pi e A|
+\end{align}
+
+\subsection{Kullback-Leibler divergence}
+
+For $p=\NN(x|a,A),~ q=\NN(x|b,B), n = \text{dim}(x)$ and definition
+$\kld{p}{q} = \sum_x p(x) \log\frac{p(x)}{q(x)}$, we have:
+\begin{align}
+2~ \kld{p}{q}
+&= \log\frac{|B|}{|A|} + \tr(B^\1A) + (b-a)^\T B^\1 (b-a) - n \\
+4~ D_\text{sym}\big(p \,\big\Vert\, q\big)
+&= \tr(B^\1A) + \tr(A^\1B) + (b-a)^\T (A^\1+B^\1) (b-a) - 2n
+\end{align}
+
+$\l$-divergence:
+\begin{align}
+2~ D_\l\big(p \,\big\Vert\, q\big)
+&= \l~ \kld{p}{\l p+(1\!-\!\l)q} ~+~ (1\!-\!\l)~ \kld{q}{(1\!-\!\l) p + \l q}
+\end{align}
+
+For $\l=.5$: Jensen-Shannon divergence.
+
+\subsection{Log-likelihoods}
+\begin{align}
+\log \NN(x|a,A)
+ &= - \half \[ \log|2\pi A| + (x-a)^\T A^\1 (x-a) \] \\
+\log \NN[x|a,A]
+ &= - \half \[ \log|2\pi A^\1| + a^\T A^\1 a + x^\T A x - 2 x^\T a \] \\
+\sum_x \NN(x|b,B) \log \NN(x|a,A)
+ &= -\kld{\NN(b,B)}{\NN(a,A)} - H(\NN(b,B))
+\end{align}
+
+\subsection{Mixture of Gaussians}~
+Collapsing a MoG into a single Gaussian:
+\begin{align}
+&\argmin_{b,B} \kld{\sum_i p_i~ \NN(a_i,A_i)}{\NN(b,B)}
+\quad=\quad\(
+b=\sum_i p_i a_i ~,~
+B=\sum_i p_i (A_i + a_i a_i^\T - b\, b^\T)\)
+\end{align}
+
+%% THAT STUFF IS WRONG!!
+%% Marginal of a MOG
+%% \begin{align}
+%% &P(x,y) = \sum_i p_i~ \NN(\ve{x}{y}| \ve{a_i}{b_i},\ma{A_i}{C_i}{C_i^\T}{B_i})
+%% \feed
+%% &P(y|x) = \sum_i p_i~ \NN(y| b_i + C_i^\T A_i^\1(x-a_i),~ B_i - C_i^\T
+%% A_i^\1 C_i^\T) \\
+%% &\approx \NN(y|e,E) \comma e=\sum_i p_i (b_i + C_i^\T A_i^\1(x-a_i))
+%% \comma \feed
+%% &E = \sum_i p_i \[ B_i - C_i^\T
+%% A_i^\1 C_i^\T + b_i b_i^\T + C_i^\T A_i^\1(x-a_i)(x-a_i)^\T
+%% A_i^\1{}^\T C_i + 2 C_i^\T A_i^\1(x-a_i) b^\T - e e^\T\] \\
+%% & F = - \sum_i p_i~ C_i^\T A_i^\1 \comma
+%% f = \sum_i p_i~ (b_i - C_i^\T A_i^\1 a_i) \comma
+%% Q = ? \\
+%% %
+%% &P(x,y) = \sum_i p_i~ \NN[\ve{x}{y}| \ve{a_i}{b_i},\ma{A_i}{C_i}{C_i^\T}{B_i}]
+%% \feed
+%% &P(y|x) = \sum_i p_i~ \NN[y| b_i - C_i^\T x,~ B_i ] \\
+%% &\approx \NN(y|e,E) \comma
+%% E=\sum_i p_i (B_i^\1 + B^\1 (b_i - C_i^\T x)(b_i - C_i^\T x)^\T
+%% B^\1{}^\T - e\, e^\T) \comma \feed
+%% &e = \sum_i p_i~ \[ B_i^\1 (b_i - C_i^\T x) \] \\
+%% & F = - \sum_i p_i~ B_i^\1 C_i^\T \comma
+%% f = \sum_i p_i~ B_i^\1 b_i \comma
+%% Q = ?
+%% \end{align}
+
+%\subsection{Kalman filter (fwd) equations}
+%\begin{align}
+%& x'|x \sim \{F~x+f,~ Q\} \comma y|x \sim \{C~x,~ R\} \\
+%& x'|y',x \sim \{F~x+f, (Q^\1 + C^\T R^\1 C)^\1 C^\T R^\1
+%(y'-C \hat x)\}
+%\end{align}
+
+%\subsection{linear fwd-bwd equations without observations}
+%\begin{align}
+%& \a_t(x) = \NN(a_t,A_t) = P(x | \text{start})
+% \comma \b(x) = \oNN(b_t,B_t) = P(\text{goal}|x) \\
+%& a_{t+1} = F~ a_t + f
+% \comma A_{t+1} = F A_t F^\T + Q \\
+%& b_{\tau+1} = F^\1(b_t-f)
+% \comma B_{\tau+1} = F^\1 (B_\tau + Q) F^{-1T} \\
+%&\text{( truely: $\b_{\tau+1} = \frac{1}{|F|}~ \NN(b_\tau,B_\tau)$ )}
+%\end{align}
+
+%\subsection{non-linear fwd-bwd equations without observations}
+%\begin{align}
+%&P(x'|x) = \NN(x'| \phi(x),Q) \\
+%&\a_t(x) = \NN(x|a_t,A_t) \comma (a_t,A_t) = UT_\phi(a_{t-1},A_{T-1}) + (0,Q) \\
+%&\b_t(x) = \oNN_x(b_t,B_t) \comma (b_t,B_t) =
+% UT_{\phi-1}(b_{t-1},B_{T-1} + Q) \\
+%&Z_{t,\tau} = |2\pi B_\tau|^{1/2}~ \NN(a_t|b_\tau,A_t + B_\tau) \cdot
+%\NN(\cdots?\cdots)\\
+%&\g_{t,\tau}(x) = \NN(x|c_t,C_t)
+% \comma C_{t,\tau} = A_t~ (A_t + B_\tau)^\1~ B_\tau
+% \comma c_{t,\tau} = C_{t,\tau}~ (A_t^\1~ a_t + B_\tau^\1~ b_\tau) ~.
+%\end{align}
+
+%\subsection{action selection - simple control}
+%\begin{align}
+%& P(y|x,u) = \NN(x+u,Q) \\
+%& P(r \| u,x) = P(r\|u) + (1-P(r\|u))~ P(r\|x) ??\\
+%& q_\tau(x,u) = \textstyle\int_y \NN(y|x+u,Q)~ \oNN_y(b,B)
+% = |2\pi B|^{1/2}~ \NN(u| b-x, B + Q) \\
+%& \argmax{u} q_\tau(x,u) = b-x
+%\end{align}
+
+%\subsection{action selection - noisy control}
+%\begin{align}
+%& P(y|x,u) = \NN(x+u,u^2 Q) \\
+%& q_\tau(x,u) = \textstyle\int_y \NN(y|x+u,Q)~ \NN(y|b,B) = \NN(u| b-x, B + u^2 Q) \\
+%%& \argmax{u} q_\tau(x,u) = b-x
+%\end{align}
+
+%[[todo: unscented transform]
+
+\end{document}
diff --git a/notes/quaternions.tex b/notes/quaternions.tex
index d691a40..fa27487 100644
--- a/notes/quaternions.tex
+++ b/notes/quaternions.tex
@@ -51,7 +51,7 @@ \section{Reference}
entries by $\bar q\equiv q_{1:3} \in\RRR^3$. Let's first summarize
some basic properties for a normalized quaternion:
-$$\begin{align}
+\begin{align}
\text{rotation angle $\t\in\RRR$:} \quad \label{eqAngle}
& \t = 2 \acos(q_0) = 2\, \text{asin}(|\bar q|) \\
\text{normalized axis $\ul w\in S^2$:} \quad \label{eqAxis}
@@ -80,12 +80,12 @@ \section{Reference}
q_0 \bar q' + q'_0 \bar q + \bar q \times \bar q') \\
\text{vector application, $x\in\RRR^3$:} \quad \label{eqApp}
& q \cdot x = (q \circ (0,x) \circ q^{-1})_{1:3} \quad\text{(or via matrix)}
-\end{align}$$
+\end{align}
-Eqs.~$\eqref{eqLog}$ and $\eqref{eqExp}$ follow from $\eqref{eqAngle}$
-and $\eqref{eqAxis}$, but why they are defined as 'log' and 'exp' will become clear below. Concerning the application of a rotation quaternion $q$ on a vector
-$x$, equation $\eqref{eqApp}$ is elegant, but computationally less
-efficient than first converting to matrix using $\eqref{eqMatrix}$ and
+Eqs.~\eqref{eqLog} and \eqref{eqExp} follow from \eqref{eqAngle}
+and \eqref{eqAxis}, but why they are defined as 'log' and 'exp' will become clear below. Concerning the application of a rotation quaternion $q$ on a vector
+$x$, equation \eqref{eqApp} is elegant, but computationally less
+efficient than first converting to matrix using \eqref{eqMatrix} and
then multiplying to the vector. Note that concatenating quaternions as
well as applying a quaternion to a vector never requires to compute
a $\sin$ or $\cos$, as opposed, e.g. to the Rodriguez' equation
@@ -95,38 +95,38 @@ \section{Continuously moving from $I$ to $q$ -- exponential and log mappings}
Given a quaternion $q$, how can we continuously move from the $\Id=(1,\bar
0)$ to $q$? The answer is simply given by continuously increasing the
-rotation angle $\eqref{eqAngle}$ from $0$ to $\t$. Let $t\in[0,\t]$ be
+rotation angle \eqref{eqAngle} from $0$ to $\t$. Let $t\in[0,\t]$ be
the interpolation coefficient, we have
-$$\begin{align}
+\begin{align}
q(t)
&= (\cos(t/2),~ \sin(t/2)~ \ul w) ~. \label{eqExp0}
-\end{align}$$
+\end{align}
This ``motion'' from $\Id$ to $q$ is a geodesic (=shortest path) on
$S^3$ (w.r.t. to the natural embedded Euclidean metric of $\RRR^4$),
and it has constant absolute velocity:
-$$\begin{align}\label{eqVel}
+\begin{align}\label{eqVel}
\dot q(t)
&= (-\sin(t/2),~ \cos(t/2)~ \ul w)\comma
|\dot q(t)| = \Frac12 ~.
-\end{align}$$
+\end{align}
At the origin (for $t=0$), the velocity is
$\dot q(0) = \Frac12 (0, \ul w)$, which shows
how the rotation axis $\ul w$ provides the tangent vector at
-$\Id$. For $t>0$ the velocity $\eqref{eqVel}$
+$\Id$. For $t>0$ the velocity \eqref{eqVel}
``rotates along the tangent arc'' on $S^3$. We can show that $\dot q(t)$
can also be written as (using that $\ul w$ and $\bar q(t)$ are colinear):
-$$\begin{align}
+\begin{align}
\Frac12 (0, \ul w) \circ q(t)
&= \Frac12 (0 q_0(t) - \ul w^\T \bar q(t),~ 0 \bar q(t) + q_0(t) \ul w + \ul w \times \bar q(t)) \\
&= \Frac12 (0 - \sin(t/2),~ \cos(t/2)~ \ul w + 0) ~=~ \dot q(t) ~. \label{eqDotq}
-\end{align}$$
+\end{align}
-Eq.~$\eqref{eqDotq}$ is a differential equation of the form $\dot q(t)
+Eq.~\eqref{eqDotq} is a differential equation of the form $\dot q(t)
= \a q(t)$, but with the multiplication being the group operation $\circ$. The solution
$q(t)$ to this differential equation is defined as the exponential map.\footnote{When describing $\SO(3)$ as a Lie groups,
analogous equations appear when tangent motion is integrated to
@@ -134,19 +134,19 @@ \section{Continuously moving from $I$ to $q$ -- exponential and log mappings}
$\exp(\cdot)$ of a tangent vector.} Namely, for a vector $w=t\ul w \in
\RRR^3$ we define
-$$\begin{align}
+\begin{align}
\exp(t \ul w)
&= (\cos(t/2),~ \sin(t/2)~ \ul w) \comma
\To~ \Del t \exp(t \ul w)
= \Frac12 (0, \ul w) \circ \exp(t \ul w) ~,
-\end{align}$$
+\end{align}
-fully consistent to $\eqref{eqExp0}$ and $\eqref{eqDotq}$. Conversely, given a quaternion, we define the log as
+fully consistent to \eqref{eqExp0} and \eqref{eqDotq}. Conversely, given a quaternion, we define the log as
-$$\begin{align}
+\begin{align}
\log(q)
&= 2 \acos(q_0)~ \Frac{\bar q}{|\bar q|} ~.
-\end{align}$$
+\end{align}
\section{Continuously moving from $q_A$ to $q_B$ -- interpolation in quaternion space}
@@ -161,9 +161,9 @@ \section{Continuously moving from $q_A$ to $q_B$ -- interpolation in quaternion
$q_B$. Let $t\in[0,1]$. The \emph{proper} interpolation uses the exponential
map:
-$$\begin{align}
+\begin{align}
q(t) = \exp(t\log(q_B/q_A)) \circ q_A ~. \label{eqInter}
-\end{align}$$
+\end{align}
This first computes the relative transform $q_{BA} = q_B/q_A$,
computes its tangent vector representation $\log(q_{BA})$, and interpolates
@@ -171,18 +171,18 @@ \section{Continuously moving from $q_A$ to $q_B$ -- interpolation in quaternion
$q_A$ from the right, making all this relative to orientation
$q_A$. For completeness, the velocity is
-$$\begin{align}
+\begin{align}
\dot q(t)
&= \[\Del t \exp(t\log(q_B/q_A))\] \circ q_A \\
&= \[\Frac12 (0, w_{BA}) \circ \exp(t\log(q_B/q_A))\] \circ q_A \\
&= \Frac12 (0, w_{BA}) \circ q(t) ~. \label{eqInterVel}
-\end{align}$$
+\end{align}
As an alternative, we can define the \emph{embedded} interpolation as
-$$\begin{align}
+\begin{align}
q(t) = \nor((1-t)~ q_A + t~ q_B) ~, \label{eqLinInter}
-\end{align}$$
+\end{align}
where $\nor(q) = q / |q|$. This assumes that the two
quaternions are \emph{sign aligned}, i.e. $q_A^\T q_B \ge 0$. This
@@ -201,8 +201,8 @@ \section{Continuously moving from $q_A$ to $q_B$ -- interpolation in quaternion
The point here is that interpolation of rotations is very
intuitive: ``straight'' interpolation on the sphere $S^3$ which can
-equally described using $\eqref{eqLinInter}$ (with the exponential
-map) or the naive embedded interpolation $\eqref{eqLinInter}$. But only the proper
+equally be described using \eqref{eqInter} (with the exponential
+map) or the naive embedded interpolation \eqref{eqLinInter}. But only the proper
interpolation with the exponential map has constant absolute velocity.
\section{Angular Jacobian w.r.t. Quaternion Parameters}
@@ -211,13 +211,13 @@ \section{Angular Jacobian w.r.t. Quaternion Parameters}
optimization problem and need a Jacobian w.r.t. $q$. More concretely,
we want to know the angular
velocity $w = J \dot q$ that is induced by a velocity (or infinitesimal
-variation) of $q$. To relate to Eq.~$\eqref{eqInterVel}$ we think of
+variation) of $q$. To relate to Eq.~\eqref{eqInterVel} we think of
$q=q_A$ as the base orientation, and $\dot q$ as moving from $q_A$ to
-$q_B$. Eq.~$\eqref{eqInterVel}$ for $t=0$ becomes
+$q_B$. Eq.~\eqref{eqInterVel} for $t=0$ becomes
-$$\begin{align}
+\begin{align}
\dot q &= \Frac12 (0, w) \circ q ~.
-\end{align}$$
+\end{align}
This equation holds if $\dot q$ is tangential to $S^3$, and translates
$\dot q$ to an angular vector $w\in\RRR^3$ \emph{relative} to the
@@ -225,10 +225,10 @@ \section{Angular Jacobian w.r.t. Quaternion Parameters}
the base orientation $q_A$). Under these tangential assumptions, we
can directly read out $w$ by multiplying $q^{-1}$ from the right,
-$$\begin{align}
+\begin{align}
\dot q \circ q^{-1}
&= \half (0,w) \comma w = 2~ [\dot q \circ q^{-1}]_{1:3} ~.
-\end{align}$$
+\end{align}
However, in the case where $\dot q$ is non-tangential, i.e., $\dot q^\T q \not=0$,
the change in length of the quaternion does not represent any angular
@@ -237,37 +237,37 @@ \section{Angular Jacobian w.r.t. Quaternion Parameters}
However, when plugging in the tangentialized
$\dot{\ul q}$ we get
-$$\begin{align}
+\begin{align}
w
&= 2~ [(\dot q - (q^\T \dot q) q) \circ q^{-1}]_{1:3} \\
&= 2~ [\dot q \circ q^{-1}]_{1:3} - 2~ (q^\T \dot q) [q \circ q^{-1}]_{1:3} ~,
-\end{align}$$
+\end{align}
and the latter term is identically zero. So, our ``tangentialization term'' drops out anyway.
To construct the Jacobian, let's assume $\dot q=e_i$ is one of the unit vectors
$e_0,..,e_3\in\RRR^4$. Then we have
-$$\begin{align}
+\begin{align}
w_i
&= 2~ [e_i\circ q^{-1}]_{1:3} ~.
-\end{align}$$
+\end{align}
Further, if $q$ itself is not normalized, we want the angular Jacobian
of $\ul q = \Frac1{|q|} q$ w.r.t. $q$. As the above relation between $\dot q$
and $w$ is linear, we have
-$$\begin{align}
+\begin{align}
w_i
&= \Frac2{|q|}~ [e_i \circ \ul q^{-1}]_{1:3} ~.
-\end{align}$$
+\end{align}
Based on this we construct the \textbf{angular quaternion Jacobian}
with 4 columns
-$$\begin{align}
+\begin{align}
J_{\cdot i}(q) = \Frac2{|q|}~ [e_i \circ \ul q^{-1}]_{1:3} ~,
-\end{align}$$
+\end{align}
so that we have $w = J(q)~ \dot q$ for any (also non-normalized, non-tangential) $q$
and $\dot q$.
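As an illustration (my own sketch, not the notes' code), the Jacobian
can be assembled column-wise exactly as defined, using the same
scalar-first quaternion convention as above:
\begin{code}
import numpy as np

def qmult(a, b):   # Hamilton product, scalar-first (w, x, y, z)
    w1, v1, w2, v2 = a[0], a[1:], b[0], b[1:]
    return np.concatenate(([w1*w2 - v1 @ v2],
                           w1*v2 + w2*v1 + np.cross(v1, v2)))

def angular_jacobian(q):
    """3x4 matrix J(q) such that w = J(q) @ q_dot, for any q."""
    n = np.linalg.norm(q)
    q_bar_inv = np.concatenate(([q[0]], -q[1:])) / n   # inverse of q/|q|
    J = np.empty((3, 4))
    for i in range(4):
        e_i = np.zeros(4); e_i[i] = 1.0
        J[:, i] = (2.0/n) * qmult(e_i, q_bar_inv)[1:]  # [e_i o q_bar^-1]_{1:3}
    return J
\end{code}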
@@ -292,15 +292,15 @@ \section{Random rotations \& ``Gaussians''}
have two options (akin to the interpolation options): The tangent
space sampling:
-$$\begin{align}
+\begin{align}
q' = \exp(w) \circ q\comma w = \NN(0,\s^2) \in \RRR^3 ~,
-\end{align}$$
+\end{align}
or
-$$\begin{align}
+\begin{align}
q' = \text{normalize}(q + \d/2)\comma \d = \NN(0,\s^2) \in \RRR^4 ~.
-\end{align}$$
+\end{align}
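Both options are a few lines of code; a sketch (own illustration, with
qmult and qexp as in the interpolation sketch above, and note the
$\d/2$ in the embedded variant):
\begin{code}
import numpy as np

def qmult(a, b):
    w1, v1, w2, v2 = a[0], a[1:], b[0], b[1:]
    return np.concatenate(([w1*w2 - v1 @ v2],
                           w1*v2 + w2*v1 + np.cross(v1, v2)))

def qexp(w):
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.array([1., 0., 0., 0.])
    return np.concatenate(([np.cos(t/2)], np.sin(t/2) * w/t))

rng = np.random.default_rng(0)

def sample_tangent(q, sigma):    # tangent-space option
    return qmult(qexp(sigma * rng.standard_normal(3)), q)

def sample_embedded(q, sigma):   # embedded option, note the delta/2
    qn = q + sigma * rng.standard_normal(4) / 2.0
    return qn / np.linalg.norm(qn)
\end{code}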
In the limit of small $\s^2$ these become the same, as the Gaussian
projected to the tangent space is the same as the Gaussian in the
@@ -316,31 +316,31 @@ \section{Similarly: Rotation vector and angular velocity}
The application of a rotation described by $w\in\RRR^3$ on a vector
$x\in\RRR^3$ is given as (Rodrigues' formula)
-$$\begin{align}
+\begin{align}
w \cdot x
&= \cos(\t)~ x
+ \sin(\t)~ (\ul w\times x)
+ (1-\cos(\t))~ \ul w(\ul w^\T x) ~.
-\end{align}$$
+\end{align}
This also directly implies the conversion to a rotation matrix, as the
columns of the corresponding rotation matrix are simply $w \cdot e_i$ for
the unit vectors $e_{1:3}\in\RRR^3$. To simplify the notation, we
define the skew matrix of a vector $w\in\RRR^3$ as
-$$\begin{align}
+\begin{align}
\skew(w) = \mat{ccc}{0 & -w_3 & w_2 \\ w_3 & 0 & -w_1 \\ -w_2 & w_1 & 0} ~.
-\end{align}$$
+\end{align}
This allows us to express the cross
product as matrix multiplication, $w \times v = \skew(w)~ v$. The rotation matrix $R(w)$
corresponding to a given rotation vector $w$ then is:
-$$\begin{align}\label{eqRodriguez}
+\begin{align}\label{eqRodriguez}
R(w)
&= \exp(\skew(w)) \\
&= \cos(\t)~ \Id + \sin(\t)/\t~ \skew(w) + (1-\cos(\t))/\t^2~ w w^\T
-\end{align}$$
+\end{align}
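In code, \eqref{eqRodriguez} is a one-liner up to a small-angle guard
(a sketch, with $\t = |w|$); one can check numerically that it agrees
with scipy.linalg.expm(skew(w)):
\begin{code}
import numpy as np

def skew(w):
    return np.array([[    0., -w[2],  w[1]],
                     [ w[2],     0., -w[0]],
                     [-w[1],  w[0],     0.]])

def rotation_matrix(w):
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.eye(3) + skew(w)   # first-order limit for small angles
    return (np.cos(t)*np.eye(3) + np.sin(t)/t * skew(w)
            + (1 - np.cos(t))/t**2 * np.outer(w, w))
\end{code}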
The $\exp$ function is called the exponential map (generating a group
element (=rotation matrix) via an element of the Lie algebra (=skew
@@ -356,9 +356,9 @@ \section{Similarly: Rotation vector and angular velocity}
$t$ is described by a rotation matrix $R(t)$ and the body's angular
velocity is $w$, then
-$$\begin{align}\label{eqDotR}
+\begin{align}\label{eqDotR}
\dot R(t) = \skew(w)~ R(t)~.
-\end{align}$$
+\end{align}
(That's intuitive to see for a rotation about the $x$-axis with
velocity 1.) Some insights from this relation: Since $R(t)$ must
@@ -374,7 +374,7 @@ \section{Similarly: Rotation vector and angular velocity}
$R(t)=\exp(t~ \skew(w))$, where the exponential function notation
is used to denote a more general so-called exponential map, as used in
the context of Lie groups. It also follows that $R(w)$ from
-$\eqref{eqRodriguez}$ is the rotation matrix you get when you rotate for
+\eqref{eqRodriguez} is the rotation matrix you get when you rotate for
1 second with angular velocity described by $w$.
\end{document}
diff --git a/notes/robotKin.tex b/notes/robotKin.tex
index 5435fc4..5f39008 100644
--- a/notes/robotKin.tex
+++ b/notes/robotKin.tex
@@ -375,11 +375,11 @@ \subsection{Inverse Kinematics}
y^*$. A proper approach is to formulate this as an NLP (non-linear
mathematical program)
-$$\begin{align}
+\begin{align}
&\min_{q\in\RRR^n} \norm{q-q_0}^2 \st \phi(q) = y^* \label{eqIK}\\
\text{or}\quad&\min_{q\in\RRR^n} \norm{q-q_0}^2 + \m \norm{\phi(q) -
y^*}^2 \quad\text{for large $\m$}
-\end{align}$$
+\end{align}
and use an efficient NLP solver (e.g.\ Augmented Lagrangian, or SQP,
exploiting potential sparseness of $\Del q \phi$). However, typical
@@ -401,7 +401,7 @@ \subsection{Inverse Kinematics}
a non-linear optimization problem with small-scaled Newton steps. This
is not proper! Proper IK should really first compute the solution
$q_T$ to
-$\eqref{eqIK}$, and then think about how the robot can actually move
+\eqref{eqIK}, and then think about how the robot can actually move
to $q_T$ (e.g.\ using proper optimal control, or reactive spline interpolation, or
a basic but nice motion profile).
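For illustration, here is a minimal sketch of the soft-constraint
variant of \eqref{eqIK} with a generic NLP solver; the forward
kinematics $\phi$ is a toy planar 2-link arm, a placeholder rather
than any real robot model:
\begin{code}
import numpy as np
from scipy.optimize import minimize

def phi(q):   # toy 2-link planar arm with unit link lengths
    return np.array([np.cos(q[0]) + np.cos(q[0] + q[1]),
                     np.sin(q[0]) + np.sin(q[0] + q[1])])

def ik(q0, y_star, mu=1e4):   # soft-constraint version of Eq. (eqIK)
    cost = lambda q: (q - q0) @ (q - q0) + mu * np.sum((phi(q) - y_star)**2)
    return minimize(cost, q0, method="BFGS").x

q_T = ik(q0=np.array([0.1, 0.1]), y_star=np.array([1.0, 1.0]))
\end{code}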
diff --git a/notes/splines.tex b/notes/splines.tex
index 1bb5c62..2bd0542 100644
--- a/notes/splines.tex
+++ b/notes/splines.tex
@@ -51,20 +51,20 @@ \subsection{Single cubic spline for timing-optimal control to a target}
Consider a cubic polynomial $x(t) = a t^3 + b t^2 + c t + d$. Given four boundary conditions $x(0)=x_0, \dot x(0) = v_0, x(\t) = x_\t, \dot x(\t) = v_\t$, the four coefficients are
-$$\begin{align}
+\begin{align}
d &= x_0 ~, \\
c &= \dot x_0 ~, \\
b &= \frac{1}{\t^2}\[ 3(x_\t-x_0) - \t(\dot x_\t + 2 \dot x_0) \] ~, \\
a &= \frac{1}{\t^3}\[ - 2(x_\t-x_0) + \t(\dot x_\t + \dot x_0) \] ~.
-\end{align}$$
+\end{align}
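As a direct transcription (a sketch; it works elementwise for
vector-valued $x$ as well):
\begin{code}
import numpy as np

def cubic_coeffs(x0, v0, x1, v1, tau):
    """Coefficients of x(t) = a t^3 + b t^2 + c t + d on [0, tau]."""
    d = x0
    c = v0
    b = (3*(x1 - x0) - tau*(v1 + 2*v0)) / tau**2
    a = (2*(x0 - x1) + tau*(v1 + v0)) / tau**3
    return a, b, c, d

a, b, c, d = cubic_coeffs(0., 0., 1., 0., 1.)   # the smoothstep 3t^2 - 2t^3
\end{code}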
This cubic spline is in fact the solution to an optimization problem: it is the path that minimizes squared accelerations between these boundary conditions, and it can therefore be viewed as the solution to optimal control with acceleration costs:
-$$\begin{align}
+\begin{align}
\min_x~ \int_0^\tau \ddot x(t)^2~ dt
\quad\st \mat{c}{x(0)\\\dot x(0)}=\mat{c}{x_0\\v_0},~
\mat{c}{x(\tau)\\\dot x(\tau)}=\mat{c}{x_1\\v_1} ~.
-\end{align}$$
+\end{align}
%% Check with Maxima:
%% \begin{code}
@@ -77,7 +77,7 @@ \subsection{Single cubic spline for timing-optimal control to a target}
%% \end{code}
The minimal costs can be given analytically as
-$$\begin{align}
+\begin{align}
\int_0^T \ddot x(t)^2 dt
%% &= \int_0^T (6 a t + 2 b)^2 ~ dt \\
%% &= \int_0^T (36 a^2 t^2 + 4 b^2 + 24 abt) ~ dt \\
@@ -88,11 +88,11 @@ \subsection{Single cubic spline for timing-optimal control to a target}
&= \tilde D^\T \tilde D + \tilde V^\T \tilde V \comma
\tilde D := \sqrt{12}~ \tau^{-\frac{3}{2}}~ D,~ \tilde V := \tau^{-\half}~ V ~,
\label{eqLeapSOS}
-\end{align}$$
+\end{align}
where we used some help from computer algebra to get this right.
-Eq. $\eqref{eqLeap}$ explicitly gives the optimal cost in terms of
+Eq. \eqref{eqLeap} explicitly gives the optimal cost in terms of
boundary conditions $(x_0,v_0,x_1,v_1)$ and time $\tau$. This is a very powerful means to optimize boundary conditions and $\tau$. The following is a simple application that realizes reactive control.
\subsubsection{Single-piece optimal timing control}
@@ -103,17 +103,17 @@ \subsubsection{Single-piece optimal timing control}
Instead of imposing a desired PD behavior, we can impose a desired cubic spline behavior, which leads to succinct convergence in a finite expected time-to-target, as well as moderate gains when far from the target. The approach is simply to choose an optimal $\tau$ (time-to-target) that minimizes
-$$\begin{align}
+\begin{align}
\min_{\tau, x}~ \a \tau + \int_0^\tau \ddot x(t)^2~ dt
-\end{align}$$
+\end{align}
-under our boundary conditions, assuming a cubic spline $x(t), t\in[0,\tau]$. Using $\eqref{eqLeap}$, we know the optimal $x$ and optimal control costs for given $\tau$. When $\d = x_\Ref - x$ and $v$ are co-linear (i.e., the system moves towards the target), computer algebra can tell us the optimal $\tau$:
+under our boundary conditions, assuming a cubic spline $x(t), t\in[0,\tau]$. Using \eqref{eqLeap}, we know the optimal $x$ and the optimal control costs for a given $\tau$. When $\d = x_\Ref - x$ and $v$ are collinear (i.e., the system moves towards the target), computer algebra can tell us the optimal $\tau$:
-$$\begin{align}\label{eqTimingControl}
+\begin{align}\label{eqTimingControl}
\tau^* = \frac{1}{\a}\[ \sqrt{6 |\d| \a + v^2} - |v| \] ~.
-\end{align}$$
+\end{align}
-If the system has a lateral movement, the analytical solution seems overly complex, but a numerical solution to the least-squares form $\eqref{eqLeapSOS}$ very efficient. However, in practise, using $\eqref{eqTimingControl}$ with scalar $v \gets (\d^\T v)/|\d|$ for easy timing control of convergence to a target is highly efficient and versatile.
+If the system has lateral movement, the analytical solution seems overly complex, but a numerical solution to the least-squares form \eqref{eqLeapSOS} is very efficient. However, in practice, using \eqref{eqTimingControl} with the scalar $v \gets (\d^\T v)/|\d|$ for easy timing control of convergence to a target is highly efficient and versatile.
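A sketch of this timing controller (own illustration):
\begin{code}
import numpy as np

def optimal_tau(d, v, alpha):
    """Eq. (eqTimingControl) with the scalar projection v <- (d^T v)/|d|."""
    dist = np.linalg.norm(d)
    if dist < 1e-12:
        return 0.0
    v_s = (d @ v) / dist    # velocity component towards the target
    return (np.sqrt(6*dist*alpha + v_s**2) - abs(v_s)) / alpha
\end{code}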
To make this a reactive control scheme, in each control cycle
$\tau^*$ is reevaluated and the corresponding cubic spline reference sent to
@@ -139,7 +139,7 @@ \subsubsection{Single-piece optimal timing control}
%% - \frac{36}{\tau^4}~ \d^2
%% + \frac{24}{\tau^3}~ \d^\T v
%% - \frac{4}{\tau^2}~ v^2 \quad \overset{!}= 0\\
-%% \end{align}$$
+%% \end{align}
\subsection{Hermite Cubic Splines}
@@ -159,7 +159,7 @@ \subsection{Hermite Cubic Splines}
However, optimizing both timing and waypoint velocities under our
optimal control objective is rather efficient and effective. Note that
the optimal control cost over the full spline is just the sum of
-single piece costs $\eqref{eqLeapSOS}$. This represents costs as a
+single piece costs \eqref{eqLeapSOS}. This represents costs as a
least-squares of differentiable features, where $D$ can be interpreted
as distance to be covered by accelerations, and $V$ as necessary total
acceleration, and the Jacobians of $\tilde D$ and $\tilde V$
@@ -175,30 +175,30 @@ \subsection{B-Splines}
In B-splines, the path $x: [0,T] \to \RRR^n$ is expressed as a linear combination of control points $z_0,.., z_K \in \RRR^n$,
-$$\begin{align}\label{bspline}
+\begin{align}\label{bspline}
x(t) = \sum_{i=0}^K B_{i,p}(t)~ z_i ~,
-\end{align}$$
+\end{align}
where $B_{i,p}: \RRR \to \RRR$ maps the time $t$ to the weighting of the $i$th control point -- it blends in and out the $i$th control point. For any $t$ it holds that $\sum_{i=0}^K B_{i,p}(t) = 1$, i.e., all the weights $B_{i,p}(t)$ sum to one (as with a probability distribution over $i$), and the path point $x(t)$ is therefore always in the convex hull of control points.
-Concerning terminology, actually the functions $B_{i,p}(t)$ are called \defn{B-splines}, not the resulting path $x(t)$. (But in everyday robotics language, one often calls the path a B-spline.) As the linear (scalar) product in $\eqref{bspline}$ is trivial, the maths (and complexity of code) is all about the B-splines $B_{i,p}(t)$, not the path $x(t)$.
+Concerning terminology, it is actually the functions $B_{i,p}(t)$ that are called \defn{B-splines}, not the resulting path $x(t)$. (But in everyday robotics language, one often calls the path a B-spline.) As the linear (scalar) product in \eqref{bspline} is trivial, the maths (and complexity of code) is all about the B-splines $B_{i,p}(t)$, not the path $x(t)$.
The B-spline functions $B_{i,p}(t)$ are fully specified by a non-decreasing series of time knots $t_0,..,t_m \in [0,T]$ and the integer degree $p\in\{0,1,..\}$. Namely, the recursive definition is
-$$\begin{align}
+\begin{align}
B_{i,0}(t) &= [t_i \le t < t_{i\po}] \comma \text{for $0\le i \le m-1$} ~,\\
B_{i,p}(t)
&= \frac{t-t_i}{t_{i+p}-t_i}~ B_{i,p-1}(t)
+ \frac{t_{i+p+1}-t}{t_{i+p+1}-t_{i+1}}~ B_{i+1,p-1}(t) \comma \text{for $0\le i \le m-p-1$} ~.
-\end{align}$$
+\end{align}
The zero-degree B-spline functions $B_{i,0}$ are indicators of $t_i \le t < t_{i\po}$, and $i$ ranges from $i=0,..,m-1$. The 1st-degree B-spline functions $B_{i,1}$ have support in $t_i \le t < t_{i+2}$ and $i$ only ranges in
$i=0,..,m-2$ -- because one can show that the normalization $\sum_{i=0}^{m-2} B_{i,1}(t) = 1$ holds (and for $i>m-2$, the recursion would also not be clearly defined). In general, degree $p$ B-spline functions $B_{i,p}$ have support in $t_i \le t < t_{i+p+1}$ and $i$ ranges from $i=0,..,m-p-1$, which is why we need $K+1$ control points $z_{0:K}$ with
-$$\begin{align}
+\begin{align}
K = m-p-1 ~,
-\end{align}$$
+\end{align}
which ensures the normalization property $\sum_{i=0}^K B_{i,p}(t) = 1$ for every degree.
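The recursion translates directly into code; the following naive
(non-vectorized) sketch treats the $0/0$ cases as zero, as is
standard:
\begin{code}
import numpy as np

def B(i, p, t, knots):
    """Cox-de Boor recursion for the B-spline functions B_{i,p}(t)."""
    if p == 0:
        return 1.0 if knots[i] <= t < knots[i+1] else 0.0
    out = 0.0
    d1, d2 = knots[i+p] - knots[i], knots[i+p+1] - knots[i+1]
    if d1 > 0:
        out += (t - knots[i]) / d1 * B(i, p-1, t, knots)
    if d2 > 0:
        out += (knots[i+p+1] - t) / d2 * B(i+1, p-1, t, knots)
    return out

def spline_point(t, z, p, knots):   # x(t), with K+1 = m-p control points z_i
    return sum(B(i, p, t, knots) * z[i] for i in range(len(knots) - p - 1))
\end{code}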
@@ -229,9 +229,9 @@ \subsubsection{B-spline Matrix for Time Discretized Paths}
Splines describe a continuous path $x(t)$, but often we want to evaluate this path only at a finite number of time slices $t\in \{\hat t_1,..,\hat t_S\} \subset [0,T]$. E.g., this could be a grid of $S=100$ time slices over which we want to optimize using KOMO, and for which we have to compute collision features. Let $x \in \RRR^{S \times n}$ be the time discretized path, and $z \in\RRR^{K\po\times n}$ be the stack of control points. Then the B-spline representation becomes
-$$\begin{align}
+\begin{align}
x = B_p z \comma \text{with } B_p\in\RRR^{S\times K\po},~ B_{p,si} = B_{i,p}(\hat t_s) ~,
-\end{align}$$
+\end{align}
where $B_p$ is the B-spline matrix of degree $p$ for this particular time grid $\{\hat t_1,..,\hat t_S\}$.
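Recent scipy versions provide exactly this matrix (a sketch; the
hand-rolled recursion above gives the same result):
\begin{code}
import numpy as np
from scipy.interpolate import BSpline

p = 3
knots = np.array([0, 0, 0, 0, .25, .5, .75, 1, 1, 1, 1])   # m+1 = 11 knots
t_grid = np.linspace(0.0, 1.0 - 1e-9, 100)                 # S = 100 slices
B_p = BSpline.design_matrix(t_grid, knots, p).toarray()    # S x (K+1), K+1 = 7

rng = np.random.default_rng(0)
z = rng.standard_normal((7, 2))   # K+1 = 7 control points in R^2
x = B_p @ z                       # time-discretized path, S x 2
\end{code}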
@@ -245,9 +245,9 @@ \subsubsection{Ensuring B-splines pass through waypoints}
We can distribute $S$ knots $t_{4:3+S}$ uniformly between the start and end knots (or also at $\hat t_1,..,\hat t_S$), from which it follows that $m = S+7$ and $K=m-p-1=S+3$, i.e., $K+1=S+4$ control points in total, of which $4$ are already fixed. So the $S$ middle control points are still free, and matrix inversion gives them from the desired waypoints,
-$$\begin{align}
+\begin{align}
z_{2:S+1} = B^\1 x_{1:S} \comma \text{with } B \in \RRR^{S \times S},~ B_{si} = B_{i+1,3}(\hat t_s),~ s,i=1,..,S ~.
- \end{align}$$
+\end{align}
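scipy's make_interp_spline performs essentially this construction and
inversion internally; a sketch (own illustration) for $S$ waypoints
with clamped (zero-velocity) boundary conditions:
\begin{code}
import numpy as np
from scipy.interpolate import make_interp_spline

t_hat = np.linspace(0.0, 1.0, 10)                    # S = 10 waypoint times
x_way = np.stack([np.sin(2*np.pi*t_hat),
                  np.cos(2*np.pi*t_hat)], axis=1)    # waypoints in R^2
spl = make_interp_spline(t_hat, x_way, k=3, bc_type="clamped")
z = spl.c                                            # control points
assert np.allclose(spl(t_hat), x_way)                # passes the waypoints
\end{code}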
@@ -257,24 +257,24 @@ \subsubsection{Ensuring boundary velocities}
For degrees 2 and 3 this is simple to achieve: In both cases we usually have $z_0=z_1$ and $z_{K\1}=z_K$ to ensure zero start and end velocities. Modifying $z_1$ directly leads to the start velocity $\dot x(0) = \dot B_{0,p}(0) z_0 + \dot B_{1,p}(0) z_1$. But because of normalization we have $\dot B_{0,p}(0) = - \dot B_{1,p}(0)$, and therefore
-$$\begin{align}
+\begin{align}
\dot x(0) &= \dot B_{0,p}(0) (z_0 - z_1) \\
z_1 &= z_0 - \frac{\dot x(0)}{\dot B_{0,p}(0)} ~.
-\end{align}$$
+\end{align}
%% A NURBS (non-uniform rational B-spline) is a B-spline
%% with weighted control points,
%% \begin{align}
%% f(t) = \sum_{i=0}^K \frac{w_i B_{i,p}(t)~ x_i}{\sum_{j=0}^K w_j B_{j,p}(t)} ~.
-%% \end{align}$$
+%% \end{align}
\subsubsection{Gradients}
The gradients of a B-spline-represented path w.r.t.\ the control points are trivial, but the gradients w.r.t.\ the knots are less so. Here are the basic equations:
-$$\begin{align}
+\begin{align}
B_{i,p}(t)
&= \frac{t-t_i}{t_{i+p}-t_i} B_{i,p-1}(t)
+ \frac{t_{i+p+1}-t}{t_{i+p+1}-t_{i+1}} B_{i+1,p-1}(t) \\
@@ -297,7 +297,7 @@ \subsubsection{Gradients}
\del_{t_{i+p+1}} B_{i,p}
&= \[\frac{1}{t_{i+p+1}-t} - \frac{1}{t_{i+p+1}-t_{i+1}}\]~ w~ B_{i+1,p-1}
+ v~ \del_{t_{i+p+1}}~ B_{i,p-1} + w~ \del_{t_{i+p+1}}~ B_{i+1,p-1}
-\end{align}$$
+\end{align}
\end{document}