Commit

svd added
MarcToussaint committed Aug 23, 2024
1 parent 2f0d593 commit 25ecc53
Showing 4 changed files with 179 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/notes/latex-macros.inc
@@ -269,7 +269,7 @@
\newcommand{\pt}{{\myplus2}}
%\renewcommand{\-}{\myminus}
%\newcommand{\+}{\myplus}
-\newcommand{\T}{{\!\top\!}}
+\newcommand{\T}{{\mkern-1pt \top \mkern-1pt}}
%\newcommand{\T}{{\!\textsf{\scriptsize T}\!}}
\newcommand{\xT}{{\underline x}}
\newcommand{\uT}{{\underline u}}
76 changes: 76 additions & 0 deletions docs/notes/svd.inc
@@ -0,0 +1,76 @@
One can view a matrix $A$ as a collection of rows,
$A= \left(\begin{array}{c}v_1^{\mkern-1pt \top \mkern-1pt}\\\ v_2^{\mkern-1pt \top \mkern-1pt}\\\ v_3^{\mkern-1pt \top \mkern-1pt}\end{array}\right)$.
Applying $A$ to $x$ then means taking the scalar product of every row with $x$, which
outputs $Ax= \left(\begin{array}{c}v_1^{\mkern-1pt \top \mkern-1pt}
x \\\ v_2^{\mkern-1pt \top \mkern-1pt}x \\\ v_3^{\mkern-1pt \top \mkern-1pt}x\end{array}\right)$.
The rows span an input space $I=\mathop{\mathrm{span}}
v_{1:3}$. All $x$ that are orthogonal to $I$ are mapped to zero. The
rows only pick up components of $x$ that lie within $I$.

Or one can view a matrix $A$ as a collection of columns,
$A= \left(\begin{array}{cccc}u_1 & u_2 & u_3 & u_4\end{array}\right)$.
Applying $A$ to $x$ then gives the linear combination
$Ax = x_1 u_1 + .. + x_4 u_4$ of these columns, with $x_i$ being the
linear coefficients. All outputs $y = A x$ will lie in the output space
$O=\mathop{\mathrm{span}}u_{1:4}$.
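
Both views are easy to check numerically. The following sketch is only illustrative: it assumes Python with numpy (which this note itself does not use), and the matrix and vector are made-up examples. It computes $Ax$ once via row-wise scalar products and once as a linear combination of columns.

```python
import numpy as np

# Made-up 3x4 example, matching the 3 rows v_1..v_3 and 4 columns u_1..u_4 above.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 3., 2.],
              [2., 0., 1., 1.]])
x = np.array([1., -1., 2., 0.5])

# Row view: each entry of Ax is the scalar product of one row with x.
row_view = np.array([A[i] @ x for i in range(A.shape[0])])

# Column view: Ax is the linear combination of the columns, with coefficients x_j.
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(row_view, A @ x)
assert np.allclose(col_view, A @ x)
```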

This view of matrices as input space spanning rows, or output space
spanning columns, is useful and clarifies that matrices transport from
some input space to some output space. But given a matrix $A$ in raw
form we don’t really have an explicit understanding of that
transport: Regarding the input space, some rows might be linearly
dependent, so that the input dimension could be less than $n$. And the
rows may not be orthonormal, so we do not have an explicit orthonormal
basis describing the input space. The same two points hold for the
output space (columns being linearly dependent and not orthonormal).

The SVD rewrites a matrix in a form where we really have an orthonormal
basis for the input and output spaces, and a clear understanding of which
input directions are mapped to which output directions. Here is the
theorem:

For any matrix $A\in{\mathbb{R}}^{m\times n}$ there exists a $k\le m,n$,
orthonormal vectors $v_1,..,v_k \in {\mathbb{R}}^n$, orthonormal vectors
$u_1,..,u_k\in{\mathbb{R}}^m$, and scalars $\sigma_1,..,\sigma_k>0$, such
that

$$\begin{aligned}
A = \sum_{i=1}^k u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}= U S V^{\mkern-1pt \top \mkern-1pt}
~,\quad\text{where}\quad S = {\rm diag}(\sigma_{1:k}),~ U=u_{1:k}\in{\mathbb{R}}^{m\times k},~ V=
v_{1:k} \in {\mathbb{R}}^{n\times k} ~.\end{aligned}$$
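
As a minimal numerical sketch of this theorem (again assuming Python with numpy and an arbitrary example matrix): `np.linalg.svd` returns $U$, the singular values, and $V^{\mkern-1pt \top \mkern-1pt}$; truncating to the strictly positive singular values gives exactly the rank-$k$ form above.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 3., 2.],
              [2., 0., 1., 1.]])

# Reduced SVD: U is m x min(m,n), Vh is min(m,n) x n, s holds the singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the k strictly positive singular values.
k = int(np.sum(s > 1e-12))
U, s, Vh = U[:, :k], s[:k], Vh[:k, :]

assert np.allclose(A, U @ np.diag(s) @ Vh)   # A = U S V^T
assert np.allclose(U.T @ U, np.eye(k))       # columns of U are orthonormal
assert np.allclose(Vh @ Vh.T, np.eye(k))     # rows of V^T are orthonormal
```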

In this form, we see that $V^{\mkern-1pt \top \mkern-1pt}$ spans the
input space with orthonormal rows $v_i^{\mkern-1pt \top \mkern-1pt}$,
and $U$ spans the output space with orthonormal columns $u_i$. Further,
we understand what’s happening "in between": Each component
$u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}$ first projects $x$ on
the $i$th input direction $v_i$, then scales this with the factor
$\sigma_i$, then out-projects it to the output direction $u_i$. This is
done "independently" for all $i=1,..,k$, as all $v_i$ and $u_i$ are
orthogonal. In short, what the matrix does is: it transports each input
direction $v_i$ to the output direction $u_i$ and scales by $\sigma_i$
in between. The number $k$ tells how many dimensions are actually
transported (could be less than $m$ and $n$).
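
To make this "transport" picture concrete, here is a small sketch (same assumptions as above: Python with numpy and an arbitrary example matrix) that applies each component $u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}$ to $x$ separately and sums the results.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 3., 2.],
              [2., 0., 1., 1.]])
x = np.array([1., -1., 2., 0.5])

U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > 1e-12))

y = np.zeros(A.shape[0])
for i in range(k):
    coeff = Vh[i] @ x             # project x on the i-th input direction v_i
    y += s[i] * coeff * U[:, i]   # scale by sigma_i, out-project along u_i

assert np.allclose(y, A @ x)      # summing the k independent transports gives A x
```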

$k$ is called the rank of the matrix (note that we required
$\sigma_i>0$) and $\sigma_i$ are called the singular values.
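
In numerical practice the rank is obtained by counting the singular values above a small tolerance; the sketch below (made-up matrix with one row a multiple of another, numpy assumed) compares this count with `np.linalg.matrix_rank`, which is also based on the singular values.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],    # 2 x the first row, hence linearly dependent
              [0., 1., 1.]])

s = np.linalg.svd(A, compute_uv=False)     # singular values only
k = int(np.sum(s > 1e-10 * s.max()))       # count sigma_i > 0 up to tolerance

print(k, np.linalg.matrix_rank(A))         # both report rank 2 here
```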

The matrices $U$ and $V$ are orthonormal and are in some explanations
characterized as rotations, with the equation
$A=U S V^{\mkern-1pt \top \mkern-1pt}$ described as a
rotation-scaling-rotation. That's ok, but this story does not work
well if $m\not= n$ (we have different input and output spaces), or
$k<m,n$ (we don't have full rank). I think the above story is
better.

Matrices of the form $x y^{\mkern-1pt \top \mkern-1pt}$ (also called
the outer product of $x$ and $y$) are of rank 1 (the singular value
is $\sigma_1 = |x||y|$, with $u_1=x/|x|$ and $v_1=y/|y|$). One can think of
rank 1 matrices as minimalistic matrices: they pick up a single input
direction, scale, and out-project to a single output direction. The sum
notation
$A = \sum_{i=1}^k \sigma_i u_i v_i^{\mkern-1pt \top \mkern-1pt}$
describes $A$ as a sum of rank 1 matrices, i.e., every matrix $A$ can be
thought of as a superposition of rank 1 matrices. This clarifies in what
sense rank 1 matrices are minimalistic building blocks of higher rank
matrices.
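
A short sketch of both statements (arbitrary example vectors, numpy assumed): the outer product $x y^{\mkern-1pt \top \mkern-1pt}$ has a single singular value $|x||y|$, and summing the rank 1 components $\sigma_i u_i v_i^{\mkern-1pt \top \mkern-1pt}$ reconstructs $A$ exactly.

```python
import numpy as np

# Rank 1: the outer product x y^T has the single singular value |x| |y|.
x = np.array([1., 2., 2.])                   # |x| = 3
y = np.array([3., 0., 4., 0.])               # |y| = 5
B = np.outer(x, y)
s = np.linalg.svd(B, compute_uv=False)
print(s[0], np.linalg.norm(x) * np.linalg.norm(y))   # both ~15

# Any A is the sum of its rank 1 SVD components sigma_i u_i v_i^T.
A = np.random.default_rng(0).normal(size=(3, 4))
U, sig, Vh = np.linalg.svd(A, full_matrices=False)
A_sum = sum(sig[i] * np.outer(U[:, i], Vh[i]) for i in range(len(sig)))
assert np.allclose(A_sum, A)
```
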
15 changes: 15 additions & 0 deletions docs/notes/svd.md
@@ -0,0 +1,15 @@
---
layout: home
title: "Singular Value Decomposition"
date: 2024-08-23
tags: note
---

*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*

[[pdf version](../pdfs/svd.pdf)]

{% include_relative svd.inc %}

{% include note-footer.md %}
87 changes: 87 additions & 0 deletions notes/svd.tex
@@ -0,0 +1,87 @@
\input{../latex/shared}
\note

\title{Lecture Note:\\ Singular Value Decomposition}
\author{Marc Toussaint\\\small Learning \& Intelligent Systems Lab, TU Berlin}

\makeatletter
\renewcommand{\@seccntformat}[1]{}
\makeatother

\notetitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

One can view a matrix $A$ as a collection of rows,
$A=\mat{c}{v_1^\T \\\ v_2^\T \\\ v_3^\T}$. Applying $A$ to $x$ then
means taking the scalar product of every row with $x$, which outputs
$Ax=\mat{c}{v_1^\T x \\\ v_2^\T x \\\ v_3^\T x}$. The rows span an
input space $I=\Span v_{1:3}$. All $x$ that are orthogonal to $I$ are
mapped to zero. The rows only pick up components of $x$ that lie
within $I$.

Or one can view a matrix $A$ as a collection of columns,
$A=\mat{cccc}{u_1 & u_2 & u_3 & u_4}$. Applying $A$ to $x$ then gives
the linear combination $Ax = x_1 u_1 + .. + x_4 u_4$ of these columns,
with $x_i$ being the linear coefficients. All outputs $y = A x$ will
lie in the output space $O=\Span u_{1:4}$.

This view of matrices as input space spanning rows, or output space
spanning columns, is useful and clarifies that matrices transport
from some input space to some output space. But given a matrix $A$ in raw
form we
don't really have an explicit understanding of that transport:
Regarding the input space, some
rows might be linearly dependent (and there might be fewer than $n$ rows to
begin with), so that the dimension of the input space could be less than
$n$. And the rows may not be orthonormal, so we do not have
an explicit orthonormal basis describing the input space. The same two points
hold for the output space (columns being linearly dependent and not
orthonormal).

The SVD rewrites a matrix in a form where we really have an orthonormal basis
for the input and output spaces, and a clear understanding of which input
directions are mapped to which output directions. Here is the theorem:

For any matrix $A\in\RRR^{m\times n}$ there exists a $k\le m,n$,
orthonormal vectors $v_1,..,v_k \in \RRR^n$, orthonormal vectors
$u_1,..,u_k\in\RRR^m$, and scalars $\s_1,..,\s_k>0$, such that

\begin{align}
A = \sum_{i=1}^k u_i \s_i v_i^\T = U S V^\T
~,\quad\text{where}\quad S = \diag(\s_{1:k}),~ U=u_{1:k}\in\RRR^{m\times k},~ V=
v_{1:k} \in \RRR^{n\times k} ~.
\end{align}

In this form, we see
that $V^\T$ spans the input space with orthonormal rows $v_i^\T$, and
$U$ spans the output space with orthonormal columns $u_i$. Further, we
understand what's happening ``in between'':
Each component $u_i \s_i v_i^\T$ first projects $x$ on the $i$th input
direction $v_i$, then scales this with the factor $\s_i$, then
out-projects it to the output direction $u_i$. This is done
``independently'' for all $i=1,..,k$, as all $v_i$ and $u_i$ are
orthogonal. In short, what the matrix does is: it transports each
input direction $v_i$ to the output direction $u_i$ and scales by
$\s_i$ in between. The number $k$ tells how many dimensions are
actually transported (could be less than $m$ and $n$).

$k$ is called the rank of the matrix (note that we required $\s_i>0$)
and $\s_i$ are called the singular values.

The matrices $U$ and $V$ are orthonormal
and are in some explanations characterized as rotations, with the equation
$A=U S V^\T$ described as a rotation-scaling-rotation. That's ok, but
this story does not work well if $m\not= n$ (we have different input
and output spaces), or $k<m,n$ (we don't have full rank). I think the
above story is better.

Matrices of the form $x y^\T$ (also called the outer product of
$x$ and $y$) are of rank 1 (the singular
value is $\s_1 = |x||y|$, with $u_1=x/|x|$ and $v_1=y/|y|$). One can think of rank 1 matrices as
minimalistic matrices: they pick up a single input direction, scale,
and out-project to a single output direction. The sum notation $A = \sum_{i=1}^k \s_i u_i v_i^\T$ describes $A$ as
a sum of rank 1 matrices, i.e., every matrix $A$ can be thought of as a
superposition of rank 1 matrices. This clarifies in what sense rank
1 matrices are minimalistic building blocks of higher
rank matrices.

\end{document}
