Commit

svd added
MarcToussaint committed Aug 23, 2024
1 parent 2f0d593 commit 25ecc53
Showing 4 changed files with 179 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/notes/latex-macros.inc
@@ -269,7 +269,7 @@
\newcommand{\pt}{{\myplus2}}
%\renewcommand{\-}{\myminus}
%\newcommand{\+}{\myplus}
-\newcommand{\T}{{\!\top\!}}
+\newcommand{\T}{{\mkern-1pt \top \mkern-1pt}}
%\newcommand{\T}{{\!\textsf{\scriptsize T}\!}}
\newcommand{\xT}{{\underline x}}
\newcommand{\uT}{{\underline u}}
76 changes: 76 additions & 0 deletions docs/notes/svd.inc
@@ -0,0 +1,76 @@
One can view a matrix $A$ as a collection of rows,
$A= \left(\begin{array}{c}v_1^{\mkern-1pt \top \mkern-1pt}\\\ v_2^{\mkern-1pt \top \mkern-1pt}\\\ v_3^{\mkern-1pt \top \mkern-1pt}\end{array}\right)$.
Applying $A$ to $x$ then means taking the scalar product of every row with $x$, which
outputs $Ax= \left(\begin{array}{c}v_1^{\mkern-1pt \top \mkern-1pt}
x \\\ v_2^{\mkern-1pt \top \mkern-1pt}x \\\ v_3^{\mkern-1pt \top \mkern-1pt}x\end{array}\right)$.
The rows span an input space $I=\mathop{\mathrm{span}}
v_{1:3}$. All $x$ that are orthogonal to $I$ are mapped to zero. The
rows only pick up components of $x$ that lie within $I$.

Or one can view a matrix $A$ as a collection of columns,
$A= \left(\begin{array}{cccc}u_1 & u_2 & u_3 & u_4\end{array}\right)$.
Applying $A$ to $x$ then gives the linear combination
$Ax = x_1 u_1 + .. + x_4 u_4$ of these columns, with $x_i$ being the
linear coefficients. All outputs $y = A x$ will lie in the output space
$O=\mathop{\mathrm{span}}u_{1:4}$.
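
Both views are easy to check numerically. The following sketch is only illustrative: it assumes Python with numpy (which this note itself does not use), and the matrix and vector are made-up examples. It computes $Ax$ once via row-wise scalar products and once as a linear combination of columns.

```python
import numpy as np

# Made-up 3x4 example, matching the 3 rows v_1..v_3 and 4 columns u_1..u_4 above.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 3., 2.],
              [2., 0., 1., 1.]])
x = np.array([1., -1., 2., 0.5])

# Row view: each entry of Ax is the scalar product of one row with x.
row_view = np.array([A[i] @ x for i in range(A.shape[0])])

# Column view: Ax is the linear combination of the columns, with coefficients x_j.
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(row_view, A @ x)
assert np.allclose(col_view, A @ x)
```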

This view of matrices as input space spanning rows, or output space
spanning columns, is useful and clarifies that matrices transport from
some input space to some output space. But given a matrix $A$ in raw
form we don’t really have an explicit understanding of that
transport: Regarding the input space, some rows might be linearly
dependent, so that the input dimension could be less than $n$. And the
rows may not be orthonormal, so we do not have an explicit orthonormal
basis describing the input space. The same two points hold for the
output space (columns being linearly dependent and not orthonormal).

The SVD rewrites a matrix in a form where we really have an orthonormal
basis for the input and output spaces, and a clear understanding of which
input directions are mapped to which output directions. Here is the
theorem:

For any matrix $A\in{\mathbb{R}}^{m\times n}$ there exists a $k\le m,n$,
orthonormal vectors $v_1,..,v_k \in {\mathbb{R}}^n$, orthonormal vectors
$u_1,..,u_k\in{\mathbb{R}}^m$, and scalars $\sigma_1,..,\sigma_k>0$, such
that

$$\begin{aligned}
A = \sum_{i=1}^k u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}= U S V^{\mkern-1pt \top \mkern-1pt}
~,\quad\text{where}\quad S = {\rm diag}(\sigma_{1:k}),~ U=u_{1:k}\in{\mathbb{R}}^{m\times k},~ V=
v_{1:k} \in {\mathbb{R}}^{n\times k} ~.\end{aligned}$$
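
As a minimal numerical sketch of this theorem (again assuming Python with numpy and an arbitrary example matrix): `np.linalg.svd` returns $U$, the singular values, and $V^{\mkern-1pt \top \mkern-1pt}$; truncating to the strictly positive singular values gives exactly the rank-$k$ form above.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 3., 2.],
              [2., 0., 1., 1.]])

# Reduced SVD: U is m x min(m,n), Vh is min(m,n) x n, s holds the singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Keep only the k strictly positive singular values.
k = int(np.sum(s > 1e-12))
U, s, Vh = U[:, :k], s[:k], Vh[:k, :]

assert np.allclose(A, U @ np.diag(s) @ Vh)   # A = U S V^T
assert np.allclose(U.T @ U, np.eye(k))       # columns of U are orthonormal
assert np.allclose(Vh @ Vh.T, np.eye(k))     # rows of V^T are orthonormal
```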

In this form, we see that $V^{\mkern-1pt \top \mkern-1pt}$ spans the
input space with orthonormal rows $v_i^{\mkern-1pt \top \mkern-1pt}$,
and $U$ spans the output space with orthonormal columns $u_i$. Further,
we understand what’s happening "in between": Each component
$u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}$ first projects $x$ on
the $i$th input direction $v_i$, then scales this with the factor
$\sigma_i$, then out-projects it to the output direction $u_i$. This is
done "independently" for all $i=1,..,k$, as all $v_i$ and $u_i$ are
orthogonal. In short, what the matrix does is: it transports each input
direction $v_i$ to the output direction $u_i$ and scales by $\sigma_i$
in between. The number $k$ tells how many dimensions are actually
transported (could be less than $m$ and $n$).
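
To make this "transport" picture concrete, here is a small sketch (same assumptions as above: Python with numpy and an arbitrary example matrix) that applies each component $u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}$ to $x$ separately and sums the results.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 3., 2.],
              [2., 0., 1., 1.]])
x = np.array([1., -1., 2., 0.5])

U, s, Vh = np.linalg.svd(A, full_matrices=False)
k = int(np.sum(s > 1e-12))

y = np.zeros(A.shape[0])
for i in range(k):
    coeff = Vh[i] @ x             # project x on the i-th input direction v_i
    y += s[i] * coeff * U[:, i]   # scale by sigma_i, out-project along u_i

assert np.allclose(y, A @ x)      # summing the k independent transports gives A x
```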

$k$ is called the rank of the matrix (note that we required
$\sigma_i>0$) and $\sigma_i$ are called the singular values.
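
In numerical practice the rank is obtained by counting the singular values above a small tolerance; the sketch below (made-up matrix with one row a multiple of another, numpy assumed) compares this count with `np.linalg.matrix_rank`, which is also based on the singular values.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],    # 2 x the first row, hence linearly dependent
              [0., 1., 1.]])

s = np.linalg.svd(A, compute_uv=False)     # singular values only
k = int(np.sum(s > 1e-10 * s.max()))       # count sigma_i > 0 up to tolerance

print(k, np.linalg.matrix_rank(A))         # both report rank 2 here
```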

The matrices $U$ and $V$ are orthonormal and are in some explanations
characterized as rotations, with the equation
$A=U S V^{\mkern-1pt \top \mkern-1pt}$ described as a
rotation-scaling-rotation. That's ok, but this story does not work
well if $m\not= n$ (we have different input and output spaces), or
$k<m,n$ (we don't have full rank). I think the above story is
better.

Matrices of the form $x y^{\mkern-1pt \top \mkern-1pt}$ (also called
the outer product of $x$ and $y$) are of rank 1 (the singular value
is $\sigma_1 = |x||y|$, with $u_1=x/|x|$ and $v_1=y/|y|$). One can think of
rank 1 matrices as minimalistic matrices: they pick up a single input
direction, scale, and out-project to a single output direction. The sum
notation
$A = \sum_{i=1}^k \sigma_i u_i v_i^{\mkern-1pt \top \mkern-1pt}$
describes $A$ as a sum of rank 1 matrices, i.e., every matrix $A$ can be
thought of as a superposition of rank 1 matrices. This clarifies in what
sense rank 1 matrices are minimalistic building blocks of higher rank
matrices.
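
A short sketch of both statements (arbitrary example vectors, numpy assumed): the outer product $x y^{\mkern-1pt \top \mkern-1pt}$ has a single singular value $|x||y|$, and summing the rank 1 components $\sigma_i u_i v_i^{\mkern-1pt \top \mkern-1pt}$ reconstructs $A$ exactly.

```python
import numpy as np

# Rank 1: the outer product x y^T has the single singular value |x| |y|.
x = np.array([1., 2., 2.])                   # |x| = 3
y = np.array([3., 0., 4., 0.])               # |y| = 5
B = np.outer(x, y)
s = np.linalg.svd(B, compute_uv=False)
print(s[0], np.linalg.norm(x) * np.linalg.norm(y))   # both ~15

# Any A is the sum of its rank 1 SVD components sigma_i u_i v_i^T.
A = np.random.default_rng(0).normal(size=(3, 4))
U, sig, Vh = np.linalg.svd(A, full_matrices=False)
A_sum = sum(sig[i] * np.outer(U[:, i], Vh[i]) for i in range(len(sig)))
assert np.allclose(A_sum, A)
```
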
15 changes: 15 additions & 0 deletions docs/notes/svd.md
@@ -0,0 +1,15 @@
---
layout: home
title: "Singular Value Decomposition"
date: 2024-08-23
tags: note
---

*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*

[[pdf version](../pdfs/svd.pdf)]

{% include_relative svd.inc %}

{% include note-footer.md %}
87 changes: 87 additions & 0 deletions notes/svd.tex
@@ -0,0 +1,87 @@
\input{../latex/shared}
\note

\title{Lecture Note:\\ Singular Value Decomposition}
\author{Marc Toussaint\\\small Learning \& Intelligent Systems Lab, TU Berlin}

\makeatletter
\renewcommand{\@seccntformat}[1]{}
\makeatother

\notetitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

One can view a matrix $A$ as a collection of rows,
$A=\mat{c}{v_1^\T \\\ v_2^\T \\\ v_3^\T}$. Applying $A$ to $x$ then
means taking the scalar product of every row with $x$, which outputs
$Ax=\mat{c}{v_1^\T x \\\ v_2^\T x \\\ v_3^\T x}$. The rows span an
input space $I=\Span v_{1:3}$. All $x$ that are orthogonal to $I$ are
mapped to zero. The rows only pick up components of $x$ that lie
within $I$.

Or one can view a matrix $A$ as a collection of columns,
$A=\mat{cccc}{u_1 & u_2 & u_3 & u_4}$. Applying $A$ to $x$ then gives
the linear combination $Ax = x_1 u_1 + .. + x_4 u_4$ of these columns,
with $x_i$ being the linear coefficients. All outputs $y = A x$ will
lie in the output space $O=\Span u_{1:4}$.

This view of matrices as input space spanning rows, or output space
spanning columns, is useful and clarifies that matrices transport
from some input space to some output space. But given a matrix $A$ in raw
form we
don't really have an explicit understanding of that transport:
Regarding the input space, some
rows might be linearly dependent (and there might be fewer than $n$ rows to
begin with), so that the dimension of the input space could be less than
$n$. And the rows may not be orthonormal, so we do not have
an explicit orthonormal basis describing the input space. The same two points
hold for the output space (columns being linearly dependent and not
orthonormal).

The SVD rewrites a matrix in a form where we really have an orthonormal basis
for the input and output spaces, and a clear understanding of which input
directions are mapped to which output directions. Here is the theorem:

For any matrix $A\in\RRR^{m\times n}$ there exists a $k\le m,n$,
orthonormal vectors $v_1,..,v_k \in \RRR^n$, orthonormal vectors
$u_1,..,u_k\in\RRR^m$, and scalars $\s_1,..,\s_k>0$, such that

\begin{align}
A = \sum_{i=1}^k u_i \s_i v_i^\T = U S V^\T
~,\quad\text{where}\quad S = \diag(\s_{1:k}),~ U=u_{1:k}\in\RRR^{m\times k},~ V=
v_{1:k} \in \RRR^{n\times k} ~.
\end{align}

In this form, we see
that $V^\T$ spans the input space with orthonormal rows $v_i^\T$, and
$U$ spans the output space with orthonormal columns $u_i$. Further, we
understand what's happening ``in between'':
Each component $u_i \s_i v_i^\T$ first projects $x$ on the $i$th input
direction $v_i$, then scales this with the factor $\s_i$, then
out-projects it to the output direction $u_i$. This is done
``independently'' for all $i=1,..,k$, as all $v_i$ and $u_i$ are
orthogonal. In short, what the matrix does is: it transports each
input direction $v_i$ to the output direction $u_i$ and scales by
$\s_i$ in between. The number $k$ tells how many dimensions are
actually transported (could be less than $m$ and $n$).

$k$ is called the rank of the matrix (note that we required $\s_i>0$)
and $\s_i$ are called the singular values.

The matrices $U$ and $V$ are orthonormal
and are in some explanations characterized as rotations, with the equation
$A=U S V^\T$ described as a rotation-scaling-rotation. That's ok, but
this story does not work well if $m\not= n$ (we have different input
and output spaces), or $k<m,n$ (we don't have full rank). I think the
above story is better.

Matrices of the form $x y^\T$ (also called the outer product of
$x$ and $y$) are of rank 1 (the singular
value is $\s_1 = |x||y|$, with $u_1=x/|x|$ and $v_1=y/|y|$). One can think of rank 1 matrices as
minimalistic matrices: they pick up a single input direction, scale,
and out-project to a single output direction. The sum notation $A = \sum_{i=1}^k \s_i u_i v_i^\T$ describes $A$ as
a sum of rank 1 matrices, i.e., every matrix $A$ can be thought of as a
superposition of rank 1 matrices. This clarifies in what sense rank
1 matrices are minimalistic building blocks of higher
rank matrices.

\end{document}
