Add missing examples in docs, fix typos
cabralpinto committed Aug 27, 2023
1 parent 62c8461 commit 4da5ae1
Showing 14 changed files with 35 additions and 15 deletions.
Binary file modified docs/public/images/modules/noise-schedule/constant.png
Binary file modified docs/public/images/modules/noise-schedule/cosine.png
Binary file modified docs/public/images/modules/noise-schedule/linear.png
Binary file modified docs/public/images/modules/noise-schedule/sqrt.png
Binary file added docs/public/images/modules/noise-type/absorbing.png
Binary file added docs/public/images/modules/noise-type/gaussian.png
Binary file added docs/public/images/modules/noise-type/uniform.png
8 changes: 4 additions & 4 deletions docs/src/pages/guides/custom-modules.mdx
@@ -14,11 +14,11 @@ When tinkering with Diffusion Models, the time will come when you need to venture...
>
> As with all library code, this tutorial adheres to strict type checking standards. Although we recommend typing your code, you may elect to skip writing type annotations. If you do, however, you will not receive a warning when you try to mix incompatible modules, nor benefit from other useful IntelliSense.
## Data transformation
## Data transform

In many Diffusion Model applications, the diffusion process takes place in the dataset space. If this is your case, the prebuilt `Identity` data transformation module will serve your purposes, leaving your data untouched before applying noise during training. However, a growing number of algorithms, like [Stable Diffusion](https://arxiv.org/abs/2112.10752) and [Diffusion-LM](https://arxiv.org/abs/2205.14217), project data onto a latent space before applying diffusion.
In many Diffusion Model applications, the diffusion process takes place in the dataset space. If this is your case, the prebuilt `Identity` data transform module will serve your purposes, leaving your data untouched before applying noise during training. However, a growing number of algorithms, like [Stable Diffusion](https://arxiv.org/abs/2112.10752) and [Diffusion-LM](https://arxiv.org/abs/2205.14217), project data onto a latent space before applying diffusion.

In the case of Diffusion-LM, the dataset consists of sequences of word IDs, but the diffusion process happens in the word embedding space. This means you need a way of converting sequences of word IDs into sequences of embeddings and training the embeddings along with the Diffusion Model. In Modular Diffusion, this can be achieved by extending the `Data` base class and implementing its `encode` and `decode` methods. The former projects the data into the latent space and the latter maps it back to the dataset space. Let's take a look at how you could implement the aforementioned transformation:
In the case of Diffusion-LM, the dataset consists of sequences of word IDs, but the diffusion process happens in the word embedding space. This means you need a way of converting sequences of word IDs into sequences of embeddings and training the embeddings along with the Diffusion Model. In Modular Diffusion, this can be achieved by extending the `Data` base class and implementing its `encode` and `decode` methods. The former projects the data into the latent space and the latter maps it back to the dataset space. Let's take a look at how you could implement the aforementioned transform:

```python
from diffusion.base import Data
```

@@ -40,7 +40,7 @@ class Embedding(Data):

In the `encode` method, we transform the input tensor `w` into an embedding tensor using the learned embedding layer. The `decode` method reverses this operation by finding, for each vector in `x`, the most similar embedding in the embedding weight matrix.
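The body of the `Embedding` class is collapsed in the diff above. As a point of reference, a minimal sketch of such a transform could look like the following; the constructor signature and the `cdist`-based nearest-neighbor decoding are illustrative assumptions, not the library's exact code:

```python
import torch
from torch import Tensor, nn

from diffusion.base import Data


class Embedding(Data):
    def __init__(self, w: Tensor, count: int, dimension: int, **kwargs) -> None:
        super().__init__(w, **kwargs)  # assumed base-class signature
        # Trainable embedding table, learned jointly with the Diffusion Model.
        self.embedding = nn.Embedding(count, dimension)

    def encode(self, w: Tensor) -> Tensor:
        # Word IDs -> word embeddings (dataset space -> latent space).
        return self.embedding(w)

    def decode(self, x: Tensor) -> Tensor:
        # Each vector in x -> ID of its nearest embedding (latent -> dataset space).
        return torch.cdist(x, self.embedding.weight).argmin(-1)
```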

Data transformations can also be useful in cases where they have no trainable parameters. For example, the `Categorical` noise module operates over one-hot vectors, which are very memory-inefficient. To mitigate this, you may store your data as a list of labels and use the `OneHot` data transformation module to transform it into one-hot vectors on a batch-by-batch basis, saving you a lot of memory. Or your data transformation can just be a frozen variational autoencoder, like in [Stable Diffusion](https://arxiv.org/abs/2112.10752). For further details, check out our [Text Generation](/modular-diffusion/guides/text-generation) and [Image Generation](/modular-diffusion/guides/image-generation) tutorials.
Data transforms can also be useful in cases where they have no trainable parameters. For example, the `Categorical` noise module operates over one-hot vectors, which are very memory-inefficient. To mitigate this, you may store your data as a list of labels and use the `OneHot` data transform module to transform it into one-hot vectors on a batch-by-batch basis, saving you a lot of memory. Or your data transform can just be a frozen variational autoencoder, like in [Stable Diffusion](https://arxiv.org/abs/2112.10752). For further details, check out our [Text Generation](/modular-diffusion/guides/text-generation) and [Image Generation](/modular-diffusion/guides/image-generation) tutorials.
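As a rough illustration of this workflow (the `diffusion.data` import path and the `OneHot` constructor arguments are assumptions based on the surrounding guides):

```python
import torch

from diffusion.data import OneHot

# Keep the dataset as compact integer labels...
w = torch.randint(0, 26, (1000, 64))  # 1000 sequences of 64 letter IDs
# ...and expand each batch to one-hot vectors only when it is needed.
data = OneHot(w, k=26, batch=32, shuffle=True)
```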

## Noise schedule

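The body of this section is collapsed in the diff. For orientation, a custom schedule ultimately just produces the sequence $\alpha_1, \dots, \alpha_T$; here is a sketch under the assumption that schedules extend a `Schedule` base class in `diffusion.base` and implement a `compute` hook:

```python
import torch
from torch import Tensor

from diffusion.base import Schedule


class Constant(Schedule):
    def __init__(self, steps: int, value: float) -> None:
        super().__init__(steps)  # assumed: base class stores `steps`
        self.value = value

    def compute(self) -> Tensor:
        # alpha_t = k for every time step t = 1..T.
        return torch.full((self.steps,), self.value)
```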
4 changes: 2 additions & 2 deletions docs/src/pages/guides/getting-started.mdx
@@ -35,7 +35,7 @@

```python
x, _ = zip(*MNIST("data", download=True, transform=ToTensor()))
x = torch.stack(x) * 2 - 1
```

Let's build our Diffusion Model next. Modular Diffusion provides you with the `diffusion.Model` class, which takes as parameters a **data transformation**, a **noise schedule**, a **noise type**, a **denoiser neural network**, and a **loss function**, along with other optional parameters. You can import prebuilt components for these parameters from the different modules inside Modular Diffusion or build your own. Let's take a look at a simple example which replicates the architecture introduced in [Ho et al. (2020)](https://arxiv.org/abs/2006.11239), using only prebuilt components:
Let's build our Diffusion Model next. Modular Diffusion provides you with the `diffusion.Model` class, which takes as parameters a **data transform**, a **noise schedule**, a **noise type**, a **denoiser neural network**, and a **loss function**, along with other optional parameters. You can import prebuilt components for these parameters from the different modules inside Modular Diffusion or build your own. Let's take a look at a simple example which replicates the architecture introduced in [Ho et al. (2020)](https://arxiv.org/abs/2006.11239), using only prebuilt components:

```python
import diffusion
```

@@ -111,7 +111,7 @@

```python
x, y = zip(*MNIST(str(input), transform=ToTensor(), download=True))
x, y = torch.stack(x) * 2 - 1, torch.tensor(y) + 1
```

Once again, let's assemble our Diffusion Model. This time, we will add the labels `y` to our data transformation object and provide the number of labels to our denoiser network. Let's also add classifier-free guidance to the model, a technique introduced in [Ho et al. (2022)](https://arxiv.org/abs/2207.12598) that improves sample quality in conditional generation at the cost of extra sampling time and reduced sample variety.
Once again, let's assemble our Diffusion Model. This time, we will add the labels `y` to our data transform object and provide the number of labels to our denoiser network. Let's also add classifier-free guidance to the model, a technique introduced in [Ho et al. (2022)](https://arxiv.org/abs/2207.12598) that improves sample quality in conditional generation at the cost of extra sampling time and reduced sample variety.

```python
from diffusion.guidance import ClassifierFree
```
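The remainder of the example is collapsed above. Here is a sketch of how the pieces could fit together, building on the `x` and `y` tensors from the previous snippet; the import paths and argument names not shown in the diff (`Identity`, `Cosine`, `UNet`, `labels`, `dropout`, `strength`) are assumptions for illustration:

```python
import diffusion
from diffusion.data import Identity
from diffusion.guidance import ClassifierFree
from diffusion.loss import Simple
from diffusion.net import UNet
from diffusion.noise import Gaussian
from diffusion.schedule import Cosine

model = diffusion.Model(
    data=Identity(x, y, batch=128, shuffle=True),  # labels ride along with the data
    schedule=Cosine(1000),
    noise=Gaussian(parameter="epsilon", variance="fixed"),
    net=UNet(channels=(1, 64, 128, 256), labels=10),  # assumed label count parameter
    guidance=ClassifierFree(dropout=0.1, strength=2),  # label dropout + guidance strength
    loss=Simple(parameter="epsilon"),
)
```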
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/src/pages/modules/loss-function.mdx
@@ -19,7 +19,7 @@ While not a loss module, the `Batch` object is a fundamental component of Modular Diffusion...
### Properties

- `w` -> Initial data tensor $w$.
- `x` -> Data tensor after transformation $x_0$.
- `x` -> Data tensor after transform $x_0$.
- `y` -> Label tensor $y$.
- `t` -> Time step tensor $t$.
- `epsilon` -> Noise tensor $\epsilon$. May be `None` for certain noise types.
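To make the role of `Batch` concrete, here is a hedged sketch of a loss module reading these fields; the `hat` property holding the network predictions and the `compute` hook are assumptions for illustration:

```python
import torch

from diffusion.base import Batch, Loss


class EpsilonMSE(Loss):
    def compute(self, batch: Batch) -> torch.Tensor:
        # Mean squared error between the true noise and the network's prediction.
        return torch.nn.functional.mse_loss(batch.hat[0], batch.epsilon)
```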
4 changes: 2 additions & 2 deletions docs/src/pages/modules/noise-schedule.mdx
@@ -22,12 +22,12 @@ Constant noise schedule given by $\alpha_t = k$.
```python
from diffusion.schedule import Constant

schedule = Constant(1000, 0.01)
schedule = Constant(1000, 0.995)
```

### Visualization

Applying `Gaussian` noise to an image using the `Constant` schedule with $T=1000$ and $k=0.01$ in equally spaced snapshots:
Applying `Gaussian` noise to an image using the `Constant` schedule with $T=1000$ and $k=0.995$ in equally spaced snapshots:
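With $k = 0.995$, the cumulative signal weight after all $T = 1000$ steps is $\bar{\alpha}_T = 0.995^{1000} \approx 6.7 \times 10^{-3}$, meaning the image is almost, but not quite, fully destroyed by the last step. The previous value of $k = 0.01$ would have wiped out virtually all signal in the very first step.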

![Image of a dog getting noisier at a constant rate.](/modular-diffusion/images/modules/noise-schedule/constant.png)

26 changes: 23 additions & 3 deletions docs/src/pages/modules/noise-type.mdx
@@ -41,9 +41,9 @@ noise = Gaussian(parameter="epsilon", variance="fixed")

### Visualization

Applying `Gaussian` noise to an image using the `Linear` schedule with $T=1000$, $\alpha_0=0.9999$ and $\alpha_T=0.98$ in equally spaced snapshots:
Applying `Gaussian` noise to an image using the `Cosine` schedule with $T=1000$, $s = 8 \times 10^{-3}$ and $e=2$ in equally spaced snapshots:

![Image of a dog getting noisier at a linear rate.](/modular-diffusion/images/modules/noise-schedule/linear.png)
![Image of a dog gradually turning noisy.](/modular-diffusion/images/modules/noise-type/gaussian.png)
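These snapshots follow from the closed-form forward process $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ with $\epsilon \sim \mathcal{N}(0, \text{I})$, which can be reproduced standalone (a sketch independent of the library's API):

```python
import torch


def snapshot(x0: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
    # Jump straight to time step t with the closed-form Gaussian forward process.
    return alpha_bar_t**0.5 * x0 + (1 - alpha_bar_t)**0.5 * torch.randn_like(x0)
```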

## Uniform categorical noise

@@ -60,6 +60,10 @@ where:
- $Q_t = \alpha_t \text{I} + (1 - \alpha_t) \mathbb{1}\mathbb{1}^T / k$
- $\overline{Q}_{t} = \bar{\alpha}_t \text{I} + (1 - \bar{\alpha}_t) \mathbb{1}\mathbb{1}^T / k$

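A quick numeric sanity check of these matrices (illustrative only): each row of $Q_t$ must be a valid probability distribution.

```python
import torch

k, alpha = 4, 0.9
Q = alpha * torch.eye(k) + (1 - alpha) * torch.ones(k, k) / k
print(Q.sum(dim=1))  # tensor([1., 1., 1., 1.]) -- every row sums to one
```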
> One-hot representation
>
> The `Uniform` noise type operates on one-hot vectors. To use it, pair it with the `OneHot` data transform.

### Parameters

- `k` -> Number of categories $k$.
@@ -72,6 +76,12 @@

```python
from diffusion.noise import Uniform

noise = Uniform(k=26)
```

### Visualization

Applying `Uniform` noise to an image with $k=255$ using the `Cosine` schedule with $T=1000$, $s = 8 \times 10^{-3}$ and $e=2$ in equally spaced snapshots:

![Image of a dog gradually turning noisy.](/modular-diffusion/images/modules/noise-type/uniform.png)

## Absorbing categorical noise

Absorbing categorical noise model introduced in [Austin et al. (2021)](https://arxiv.org/abs/2107.03006).
@@ -89,6 +99,10 @@ the absorbing state $m$ and 0 elsewhere.
- $Q_t = \alpha_t \text{I} + (1 - \alpha_t) \mathbb{1}e_m^T$
- $\overline{Q}_{t} = \bar{\alpha}_t \text{I} + (1 - \bar{\alpha}_t) \mathbb{1}e_m^T$

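The same sanity check for the absorbing case (illustrative only): rows sum to one, and the absorbing state $m$ only ever maps to itself.

```python
import torch

k, m, alpha = 4, 2, 0.9
e_m = torch.zeros(k)
e_m[m] = 1
Q = alpha * torch.eye(k) + (1 - alpha) * torch.outer(torch.ones(k), e_m)
print(Q.sum(dim=1))  # tensor([1., 1., 1., 1.])
print(Q[m])          # tensor([0., 0., 1., 0.]) -- state m never leaves
```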
> One-hot representation
>
> The `Absorbing` noise type operates on one-hot vectors. To use it, pair it with the `OneHot` data transform.

### Parameters

- `k` -> Number of categories $k$.
@@ -99,9 +113,15 @@ the absorbing state $m$ and 0 elsewhere.
```python
from diffusion.noise import Absorbing

noise = Absorbing(k=27, m=26)
noise = Absorbing(k=255, m=128)
```

### Visualization

Applying `Absorbing` noise to an image with $k=255$ and $m=128$ using the `Cosine` schedule with $T=1000$, $s = 8 \times 10^{-3}$ and $e=2$ in equally spaced snapshots:

![Image of a dog gradually turning gray.](/modular-diffusion/images/modules/noise-type/absorbing.png)

---

*If you spot any typo or technical imprecision, please submit an issue or pull request to the library's [GitHub repository](https://github.com/cabralpinto/modular-diffusion).*
6 changes: 3 additions & 3 deletions docs/src/pages/modules/probability-distribution.mdx
@@ -40,8 +40,8 @@

```python
from diffusion.distribution import Normal as N

distribution = N(torch.zeros(3), torch.full((3,), 2))
x, epsilon = distribution.sample()
# x = tensor([0.0000, 0.0000, 0.0000])
# epsilon = tensor([0.0000, 0.0000, 0.0000])
# x = tensor([ 1.1053, 1.9027, -0.2554])
# epsilon = tensor([ 0.5527, 0.9514, -0.1277])
```
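Note the relationship between the two returned tensors in the comments above: with a zero mean and a spread of 2, each sampled `x` equals `2 * epsilon`, i.e. the distribution returns both the sample and the base noise that produced it.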

## Categorical distribution
@@ -60,7 +60,7 @@

```python
from diffusion.distribution import Categorical as Cat

distribution = Cat(torch.tensor([[.1, .3, .6], [0, 0, 1]]))
x, _ = distribution.sample()
# x = tensor([1, 2])
# x = tensor([[0., 1., 0.], [0., 0., 1.]])
```

> Noise tensor
