Before proceeding, I recommend reading this note on visualizations and this note on the diffusion restorator to better understand what follows. All visualizations for the diffusion restorator were generated using the most recent model weights from this run.
Just out of curiosity, let's examine and compare the samplers used in the original DDPM and DDIM models. In DDIM terminology, the original DDPM sampler can be described as `K=1, noise projection = false, stochasticity = 1.0, noise stddev = normal`. The original DDIM sampler, on the other hand, corresponds to `K=1, noise projection = false, stochasticity = 0.0, noise stddev = normal`.
In essence, the primary distinction between the original DDPM and DDIM settings lies in the `stochasticity` parameter. Let's look at how this parameter influences the outcomes.
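To make the role of this parameter concrete, here is a minimal sketch of a single reverse step, following the standard DDIM update rule, where `stochasticity` plays the role of the eta coefficient: `stochasticity = 1.0` recovers the DDPM noise level, while `stochasticity = 0.0` makes the step fully deterministic. (The function name and signature are my own illustration, not code from this project.)

```python
import numpy as np

def ddim_step(x_t, eps_pred, a_bar_t, a_bar_prev, stochasticity, rng):
    """One reverse step of the DDIM sampler.

    `stochasticity` is the eta coefficient from the DDIM paper:
    eta=1.0 matches the DDPM sampler's noise level, eta=0.0 is the
    fully deterministic DDIM sampler.
    """
    # Predicted clean image, recovered from the current noisy sample.
    x0_pred = (x_t - np.sqrt(1.0 - a_bar_t) * eps_pred) / np.sqrt(a_bar_t)
    # Noise scale, interpolated by the stochasticity (eta) parameter.
    sigma = stochasticity \
        * np.sqrt((1.0 - a_bar_prev) / (1.0 - a_bar_t)) \
        * np.sqrt(1.0 - a_bar_t / a_bar_prev)
    # Deterministic direction towards x_t, plus fresh Gaussian noise.
    dir_xt = np.sqrt(1.0 - a_bar_prev - sigma ** 2) * eps_pred
    noise = sigma * rng.standard_normal(x_t.shape)
    return np.sqrt(a_bar_prev) * x0_pred + dir_xt + noise
```

With `stochasticity = 0.0` the noise term vanishes entirely, so repeated sampling from the same starting point yields identical trajectories; with `stochasticity = 1.0` each step injects fresh noise, as in DDPM.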
Both setups have captured the overall structure of the images well, but their results contain too much noise. The images generated by DDIM appear cleaner, but are still of poor quality. As I mentioned in this note, the model is not sufficiently well trained, resulting in unsatisfactory outputs. Of course, one could endlessly increase the number of training epochs and the model's size, but let's leave that to other researchers.
The following setups were used for visualization:

- Basic+masking-32 (loss: 0.00639, single-pass restorator)
- Basic+masking-8 (loss: 0.00638, single-pass restorator)
- Basic diffusion restorator:
  - `K=1, noise projection = true, stochasticity = 1.0, noise stddev = normal` (hereafter `A`; loss: 0.00646)
  - `K=8, noise projection = true, stochasticity = 1.0, noise stddev = normal` (hereafter `B`; loss: 0.00637)
As you may notice, both diffusion restorator setups now use `noise projection = true`, which significantly improves the quality of the output. Additionally, setup `B` runs 8 times faster than `A` while achieving a better loss. Many other setups with similar parameters exhibit similar loss values and generate nearly identical results, so only setups `A` and `B` will be used for visualization.
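The 8x speedup of setup `B` presumably comes from the `K` parameter. A minimal sketch, under my own (hypothetical, not confirmed by this note) reading that `K` is the stride between the diffusion timesteps the sampler visits:

```python
def timestep_schedule(total_steps, K):
    """Timesteps visited by a sampler striding backwards with step K.

    Assumption (my reading, not the project's actual code): K=1 visits
    every timestep, while K=8 visits roughly one eighth of them, which
    would explain setup B running ~8x faster than setup A.
    """
    return list(range(total_steps - 1, -1, -K))
```

For example, with 100 total steps, `K=1` yields 100 network evaluations while `K=8` yields only 13, at the cost of coarser denoising per step.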
The differences are minor, yet observable. Visually, the images restored by the diffusion restorator appear slightly crisper and more refined, and their colors look more natural and smooth. This is an intriguing outcome, considering that the single-pass restorator required masking augmentation during training to capture the interplay between different regions, whereas the diffusion restorator was trained without any augmentations.
To highlight the differences, I created a script that searches for regions with the highest statistical variances between images. Here are the results:
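The idea behind such a script can be sketched as follows (this is my own minimal illustration, not the actual script used for the figures): slide a non-overlapping patch grid over the squared difference of two images and rank patches by their mean value.

```python
import numpy as np

def top_variance_regions(img_a, img_b, patch=32, top_k=5):
    """Find the patches where two grayscale images differ the most.

    A sketch of the region search described above: score each
    non-overlapping `patch`-sized region by the mean squared
    difference and return the `top_k` highest-scoring ones.
    """
    diff = (img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2
    h, w = diff.shape[:2]
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            region = diff[y:y + patch, x:x + patch]
            scores.append((region.mean(), (y, x)))
    scores.sort(reverse=True)
    # [(score, (top, left)), ...] sorted by descending score
    return scores[:top_k]
```

The returned top-left coordinates can then be used to crop and display the most divergent regions side by side.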
Unfortunately, the differences between setups `A` and `B` are hardly noticeable. While this is not ideal for visualization purposes, it is good news for performance: setup `B` runs eight times faster while maintaining essentially identical restoration quality.
(To understand the concept of local region restoration, please refer to the following note.)
The differences here are quite significant and need no additional comment. Setups `A` and `B` show almost identical results, but when looking at the original images (sized 1024x1024), one can notice that setup `B` renders more blurred images. This is because setup `B` uses fewer diffusion steps than setup `A`, so it doesn't have enough steps to restore fine details.