diff --git a/demo.html b/demo.html
index 3929904..3fdf947 100644
--- a/demo.html
+++ b/demo.html
@@ -37,6 +37,7 @@
The diffusion baseline queries the neural network 400 times per audio clip, while the consistency models query a same-sized network only one time.
+Since the models are not trained on speech data, we do not expect them to produce meaningful speech.