Commit 25ecc53 (1 parent: 2f0d593)
Showing 4 changed files with 179 additions and 1 deletion.
@@ -0,0 +1,76 @@
One can view a matrix $A$ as a collection of rows,
$A= \left(\begin{array}{c}v_1^{\mkern-1pt \top \mkern-1pt}\\\ v_2^{\mkern-1pt \top \mkern-1pt}\\\ v_3^{\mkern-1pt \top \mkern-1pt}\end{array}\right)$.
Applying $A$ to $x$ then takes the scalar product of each row with $x$ and
outputs $Ax= \left(\begin{array}{c}v_1^{\mkern-1pt \top \mkern-1pt}
x \\\ v_2^{\mkern-1pt \top \mkern-1pt}x \\\ v_3^{\mkern-1pt \top \mkern-1pt}x\end{array}\right)$.
The rows span an input space $I=\mathop{\mathrm{span}}
v_{1:3}$. All $x$ that are orthogonal to $I$ are mapped to zero. The
rows only pick up components of $x$ that lie within $I$.

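To make the row picture concrete, here is a minimal numpy sketch; the matrix and vectors are made-up example values:

```python
import numpy as np

# three example rows v_1, v_2, v_3 stacked into A
A = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.]])

x = np.array([2., 3., 5., 7.])
print(A @ x)        # [2. 3. 5.] = (v_1.x, v_2.x, v_3.x)

# a vector orthogonal to all rows (here the 4th coordinate axis) maps to zero
x_orth = np.array([0., 0., 0., 1.])
print(A @ x_orth)   # [0. 0. 0.]
```
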
Or one can view a matrix $A$ as a collection of columns,
$A= \left(\begin{array}{cccc}u_1 & u_2 & u_3 & u_4\end{array}\right)$.
Applying $A$ to $x$ then gives the linear combination
$Ax = x_1 u_1 + .. + x_4 u_4$ of these columns, with $x_i$ being the
linear coefficients. All outputs $y = A x$ will lie in the output space
$O=\mathop{\mathrm{span}}u_{1:4}$.

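The column picture in the same spirit, again with made-up numbers:

```python
import numpy as np

u1, u2, u3, u4 = (np.array([1., 0., 0.]), np.array([0., 1., 0.]),
                  np.array([1., 1., 0.]), np.array([0., 0., 1.]))
A = np.column_stack([u1, u2, u3, u4])    # A viewed as columns u_1..u_4

x = np.array([2., 3., 5., 7.])
print(A @ x)                                   # [7. 8. 7.]
print(x[0]*u1 + x[1]*u2 + x[2]*u3 + x[3]*u4)   # the same linear combination
```
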
This view of matrices as input space spanning rows, or output space
spanning columns, is useful and clarifies that matrices transport from
some input space to some output space. But given a matrix $A$ in raw
form we don’t really have an explicit understanding of that
transport: Regarding the input space, some rows might be linearly
dependent, so that the input dimension could be less than $n$. And the
rows may not be orthonormal, so we do not have an explicit orthonormal
basis describing the input space. The same two points hold for the
output space (columns being linearly dependent and not orthonormal).

The SVD rewrites a matrix in a form where we really have an orthonormal
basis for the input and output spaces, and a clear understanding of which
input directions are mapped to which output directions. Here is the
theorem:

For any matrix $A\in{\mathbb{R}}^{m\times n}$ there exist a $k\le m,n$,
orthonormal vectors $v_1,..,v_k \in {\mathbb{R}}^n$, orthonormal vectors
$u_1,..,u_k\in{\mathbb{R}}^m$, and scalars $\sigma_1,..,\sigma_k>0$, such
that

$$\begin{aligned}
A = \sum_{i=1}^k u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}= U S V^{\mkern-1pt \top \mkern-1pt}
~,\quad\text{where}\quad S = {\rm diag}(\sigma_{1:k}),~ U=u_{1:k}\in{\mathbb{R}}^{m\times k},~ V=
v_{1:k} \in {\mathbb{R}}^{n\times k} ~.\end{aligned}$$

In this form, we see that $V^{\mkern-1pt \top \mkern-1pt}$ spans the
input space with orthonormal rows $v_i^{\mkern-1pt \top \mkern-1pt}$,
and $U$ spans the output space with orthonormal columns $u_i$. Further,
we understand what’s happening "in between": Each component
$u_i \sigma_i v_i^{\mkern-1pt \top \mkern-1pt}$ first projects $x$ onto
the $i$th input direction $v_i$, then scales this with the factor
$\sigma_i$, then out-projects it to the output direction $u_i$. This is
done "independently" for all $i=1,..,k$, as all $v_i$ and $u_i$ are
orthogonal. In short, what the matrix does is this: it transports each
input direction $v_i$ to the output direction $u_i$ and scales by $\sigma_i$
in between. The number $k$ tells us how many dimensions are actually
transported (which can be less than both $m$ and $n$).

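As a sanity check of this picture, here is a minimal numpy sketch (the matrix is random example data); `np.linalg.svd` with `full_matrices=False` returns the compact factors used above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                    # some m x n matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
print(np.allclose(A, U @ np.diag(s) @ Vt))         # True: reconstruction
print(np.allclose(U.T @ U, np.eye(3)))             # True: orthonormal columns u_i
print(np.allclose(Vt @ Vt.T, np.eye(3)))           # True: orthonormal rows v_i^T

# each input direction v_i is transported to sigma_i * u_i
i = 0
print(np.allclose(A @ Vt[i], s[i] * U[:, i]))      # True
```
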
$k$ is called the rank of the matrix (note that we required
$\sigma_i>0$) and the $\sigma_i$ are called the singular values.

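For instance, a matrix built as a sum of only two rank-1 terms has (generically) exactly two positive singular values; a sketch with arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
# a 5x4 matrix built from two rank-1 terms: its rank is 2
A = (3.0 * np.outer(rng.standard_normal(5), rng.standard_normal(4))
     + 0.5 * np.outer(rng.standard_normal(5), rng.standard_normal(4)))

s = np.linalg.svd(A, compute_uv=False)
k = int(np.sum(s > 1e-10 * s[0]))        # count strictly positive singular values
print(k, np.linalg.matrix_rank(A))       # 2 2
```
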
The matrices $U$ and $V$ are orthonormal and in some explanations
characterized as rotations, with the equation
$A=U S V^{\mkern-1pt \top \mkern-1pt}$ described as
rotation-scaling-rotation. That’s ok, but this story does not work
well if $m\not= n$ (we have different input and output spaces), or
$k<m,n$ (we don’t have full rank). I think the above story is
better.

Matrices of the form $x y^{\mkern-1pt \top \mkern-1pt}$ (also
called the outer product of $x$ and $y$) are of rank 1 (the singular value
is $\sigma_1 = |x||y|$, with $u=x/|x|$ and $v=y/|y|$). One can think of
rank 1 matrices as minimalistic matrices: they pick up a single input
direction, scale, and out-project to a single output direction. The sum
notation
$A = \sum_{i=1}^k \sigma_i u_i v_i^{\mkern-1pt \top \mkern-1pt}$
describes $A$ as a sum of rank 1 matrices, i.e., every matrix $A$ can be
thought of as composed of rank 1 matrices. This clarifies in what
sense rank 1 matrices are the minimalistic building blocks of higher rank
matrices.

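This too can be verified numerically (the vectors are arbitrary example values chosen so that $|x|=5$ and $|y|=3$):

```python
import numpy as np

x = np.array([3., 0., 4.])                      # |x| = 5
y = np.array([1., 2., 2.])                      # |y| = 3
A = np.outer(x, y)                              # the rank-1 matrix x y^T

s = np.linalg.svd(A, compute_uv=False)
print(s[0])                                     # ~15.0, i.e. |x| |y|
print(np.linalg.norm(x) * np.linalg.norm(y))    # 15.0
print(np.linalg.matrix_rank(A))                 # 1
```
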
@@ -0,0 +1,15 @@
---
layout: home
title: "Singular Value Decomposition"
date: 2024-08-23
tags: note
---

*[Marc Toussaint](https://www.user.tu-berlin.de/mtoussai/), Learning &
Intelligent Systems Lab, TU Berlin, {{ page.date | date: '%B %d, %Y' }}*

[[pdf version](../pdfs/svd.pdf)]

{% include_relative svd.inc %}

{% include note-footer.md %}
@@ -0,0 +1,87 @@
\input{../latex/shared}
\note

\title{Lecture Note:\\ Singular Value Decomposition}
\author{Marc Toussaint\\\small Learning \& Intelligent Systems Lab, TU Berlin}

\makeatletter
\renewcommand{\@seccntformat}[1]{}
\makeatother

\notetitle

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

One can view a matrix $A$ as a collection of rows,
$A=\mat{c}{v_1^\T \\\ v_2^\T \\\ v_3^\T}$. Applying $A$ to $x$ then
takes the scalar product of each row with $x$ and outputs $Ax=\mat{c}{v_1^\T
x \\\ v_2^\T x \\\ v_3^\T x}$. The rows span an input space $I=\Span
v_{1:3}$. All $x$ that are orthogonal to $I$ are mapped to
zero. The rows only pick up components of $x$ that lie within $I$.

Or one can view a matrix $A$ as a collection of columns,
$A=\mat{cccc}{u_1 & u_2 & u_3 & u_4}$. Applying $A$ to $x$ then gives
the linear combination $Ax = x_1 u_1 + .. + x_4 u_4$ of these columns,
with $x_i$ being the linear coefficients. All outputs $y = A x$ will
lie in the output space $O=\Span u_{1:4}$.

This view of matrices as input space spanning rows, or output space
spanning columns, is useful and clarifies that matrices transport
from some input space to some output space. But given a matrix $A$ in raw
form we don't really have an explicit understanding of that transport:
Regarding the input space, some rows might be linearly dependent, so
that the input dimension could be less than $n$. And the rows may not
be orthonormal, so we do not have an explicit orthonormal basis
describing the input space. The same two points hold for the output
space (columns being linearly dependent and not orthonormal).

The SVD rewrites a matrix in a form where we really have an orthonormal basis
for the input and output spaces, and a clear understanding of which input
directions are mapped to which output directions. Here is the theorem:

For any matrix $A\in\RRR^{m\times n}$ there exist a $k\le m,n$,
orthonormal vectors $v_1,..,v_k \in \RRR^n$, orthonormal vectors
$u_1,..,u_k\in\RRR^m$, and scalars $\s_1,..,\s_k>0$, such that

\begin{align}
A = \sum_{i=1}^k u_i \s_i v_i^\T = U S V^\T
~,\quad\text{where}\quad S = \diag(\s_{1:k}),~ U=u_{1:k}\in\RRR^{m\times k},~ V=
v_{1:k} \in \RRR^{n\times k} ~.
\end{align}

In this form, we see
that $V^\T$ spans the input space with orthonormal rows $v_i^\T$, and
$U$ spans the output space with orthonormal columns $u_i$. Further, we
understand what's happening ``in between'':
Each component $u_i \s_i v_i^\T$ first projects $x$ onto the $i$th input
direction $v_i$, then scales this with the factor $\s_i$, then
out-projects it to the output direction $u_i$. This is done
``independently'' for all $i=1,..,k$, as all $v_i$ and $u_i$ are
orthogonal. In short, what the matrix does is this: it transports each
input direction $v_i$ to the output direction $u_i$ and scales by
$\s_i$ in between. The number $k$ tells us how many dimensions are
actually transported (which can be less than both $m$ and $n$).

$k$ is called the rank of the matrix (note that we required $\s_i>0$)
and the $\s_i$ are called the singular values.

The matrices $U$ and $V$ are orthonormal
and in some explanations characterized as rotations, with the equation
$A=U S V^\T$ described as rotation-scaling-rotation. That's ok, but
this story does not work well if $m\not= n$ (we have different input
and output spaces), or $k<m,n$ (we don't have full rank). I think the
above story is better.

Matrices of the form $x y^\T$ (also called the outer product of
$x$ and $y$) are of rank 1 (the singular
value is $\s_1 = |x||y|$, with $u=x/|x|$ and $v=y/|y|$). One can think of rank 1 matrices as
minimalistic matrices: they pick up a single input direction, scale,
and out-project to a single output direction. The sum notation $A = \sum_{i=1}^k \s_i u_i v_i^\T$ describes $A$ as
a sum of rank 1 matrices, i.e., every matrix $A$ can be thought of as
composed of rank 1 matrices. This clarifies in what sense rank
1 matrices are the minimalistic building blocks of higher
rank matrices.

\end{document}