diff --git a/index.html b/index.html
index 49df88a..ee6f7b6 100644
--- a/index.html
+++ b/index.html
@@ -78,7 +78,8 @@

Description

We use the CLAP loss as an example, confirming that end-to-end fine-tuning further boosts the generation quality.

- Please join us at INTERSPEECH 2024 at Kos Island, Greece!
+ Please check out our poster at
+ INTERSPEECH 2024 at Kos Island, Greece!

@@ -113,40 +114,36 @@

Main Experiment Results

AudioLDM-L (Baseline) | 400 | - | - | - | - | - | 2.08 | 27.12 | 1.86
TANGO (Baseline) | 400 | 168 | 4.136 | 4.064 | 24.10 | 72.85 | 1.631 | 20.11 | 1.362
ConsistencyTTA + CLAP-FT | 1 | 2.3 | 3.830 | 4.064 | 24.69 | 72.54 | 2.406 | 20.97 | 1.358
ConsistencyTTA | 1 | 2.3 | 3.902 | 4.010 | 22.50 | 72.30 | 2.575 | 22.08 | 1.354
Ground Truth | - | - | - | - | 26.71 | 100 | - | - | -
@@ -155,7 +152,90 @@

Main Experiment Results

 This benchmark demonstrates how our single-step models stack up with previous methods,
- most of which mostly require hundreds of generation steps.
+ most of which require hundreds of generation steps.
+


Ablation Studies on Distillation Settings


+ Guidance Method | CFG Weight | Teacher Solver | Noise Schedule | FAD ↓ | FD ↓ | KLD ↓
+ Unguided | 1 | DDIM | Uniform | 13.48 | 45.75 | 2.409
+ External CFG | 3 | DDIM | Uniform | 8.565 | 38.67 | 2.015
+ External CFG | 3 | Heun | Karras | 7.421 | 39.36 | 1.976
+ CFG Distillation with Fixed Weight | 3 | Heun | Karras | 5.702 | 33.18 | 1.494
+ CFG Distillation with Fixed Weight | 3 | Heun | Uniform | 3.859 | 27.79 | 1.421
+ CFG Distillation with Random Weight | 4 | Heun | Uniform | 3.180 | 27.92 | 1.394
+ CFG Distillation with Random Weight | 6 | Heun | Uniform | 2.975 | 28.63 | 1.378
+ Based on these results, we can conclude that:
+

@@ -183,11 +263,11 @@

Human Evaluation

Citing Our Work (BibTeX)

-@article{bai2023accelerating,
+@inproceedings{bai2024accelerating,
   author = {Bai, Yatong and Dang, Trung and Tran, Dung and Koishida, Kazuhito and Sojoudi, Somayeh},
-  title = {Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation},
-  journal={arXiv preprint arXiv:2309.10740},
-  year = {2023}
+  title = {ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation},
+  booktitle = {INTERSPEECH},
+  year = {2024}
 }
diff --git a/poster.pdf b/poster.pdf
index 90c2bf6..3c7f415 100644
Binary files a/poster.pdf and b/poster.pdf differ
diff --git a/styles.css b/styles.css
index 511bdab..ca1d616 100644
--- a/styles.css
+++ b/styles.css
@@ -292,17 +292,31 @@ tr td:last-child {
     padding: 7px 12px;
     border-bottom: 1px solid #e7ebef;
     background-color: #dfe3f241;
-    font-size: 1.2em;
+    font-size: 1.15em;
+}
+.result-data-400 {
+    padding: 7px 12px;
+    border-bottom: 1px solid #e7ebef;
+    background-color: #dfe3f241;
+    font-size: 1.15em;
+    font-weight: 400;
 }
 .result-data-2 {
     padding: 7px 12px;
     border-bottom: 1px solid #e7ebef;
-    font-size: 1.2em;
+    font-size: 1.15em;
+}
+.result-data-2-400 {
+    padding: 7px 12px;
+    border-bottom: 1px solid #e7ebef;
+    font-size: 1.15em;
+    font-weight: 400;
 }
 .result-data-small {
     padding: 7px 12px;
     border-bottom: 1px solid #e7ebef;
     background-color: #dfe3f241;
+    font-weight: 400;
 }
 
 /* Optional: Add transitions for smoother hover effects */