diff --git a/demo.html b/demo.html
index 3929904..3fdf947 100644
--- a/demo.html
+++ b/demo.html
@@ -37,6 +37,7 @@

Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

The samples below compare the consistency model without CLAP fine-tuning, the diffusion baseline model, and the ground truth.

The diffusion baseline queries the neural network 400 times per audio clip, while the consistency models query a network of the same size only once.
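The query-count difference can be sketched as follows. This is a minimal illustration, not the paper's implementation: `denoiser` is a hypothetical stand-in for the text-to-audio network, and the update rule is a toy placeholder; only the counts of network evaluations reflect the text above.

```python
import torch

def denoiser(x, t):
    # Hypothetical stand-in for the text-to-audio network;
    # the update rule is a toy placeholder, not a real model.
    return x * (1.0 - 1.0 / (t + 1.0))

def diffusion_sample(x, steps=400):
    # Diffusion baseline: one network query per denoising step,
    # so 400 queries per generated clip.
    queries = 0
    for t in reversed(range(steps)):
        x = denoiser(x, t)
        queries += 1
    return x, queries

def consistency_sample(x):
    # Consistency model: a single query maps noise directly to audio.
    x = denoiser(x, 0)
    return x, 1

noise = torch.randn(1, 16000)  # illustrative clip length
_, q_diffusion = diffusion_sample(noise)
_, q_consistency = consistency_sample(noise)
print(q_diffusion, q_consistency)  # 400 vs 1 network evaluations
```

The 400x reduction in network evaluations is where the speedup comes from; the consistency model is distilled so that its single step approximates the full diffusion trajectory.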

+

Since the models are not trained on speech data, we do not expect them to produce meaningful speech.


Prompt 0

diff --git a/report.pdf b/report.pdf
index df8424a..2284fe4
Binary files a/report.pdf and b/report.pdf differ