diff --git a/README.md b/README.md index 2553c5b..9b0e2ff 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation This is the [**official website**](https://consistency-tta.github.io) for the paper \ -"Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation" \ +*ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation* \ from Microsoft Applied Science Group and UC Berkeley \ by [Yatong Bai](https://bai-yt.github.io), [Trung Dang](https://www.microsoft.com/applied-sciences/people/trung-dang), @@ -9,10 +9,11 @@ by [Yatong Bai](https://bai-yt.github.io), [Kazuhito Koishida](https://www.microsoft.com/applied-sciences/people/kazuhito-koishida), and [Somayeh Sojoudi](https://people.eecs.berkeley.edu/~sojoudi/). -**[[Preprint Paper](https://arxiv.org/abs/2309.10740)]**      -**[[Project Homepage](https://consistency-tta.github.io)]**      -**[[Code](https://github.com/Bai-YT/ConsistencyTTA)]**      -**[[Model Checkpoints](https://huggingface.co/Bai-YT/ConsistencyTTA)]**      +**[[Live Demo](https://huggingface.co/spaces/Bai-YT/ConsistencyTTA)]**    +**[[Preprint Paper](https://arxiv.org/abs/2309.10740)]**    +**[[Project Homepage](https://consistency-tta.github.io)]**    +**[[Code](https://github.com/Bai-YT/ConsistencyTTA)]**    +**[[Model Checkpoints](https://huggingface.co/Bai-YT/ConsistencyTTA)]**    **[[Generation Examples](https://consistency-tta.github.io/demo.html)]** @@ -35,8 +36,8 @@ single-step models stack up with previous methods, most of which mostly require ### Cite Our Work (BibTeX) ```bibtex -@article{bai2023accelerating, - title={Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation}, +@article{bai2023consistencytta, + title={ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation}, author={Bai, Yatong and Dang, 
Trung and Tran, Dung and Koishida, Kazuhito and Sojoudi, Somayeh}, journal={arXiv preprint arXiv:2309.10740}, year={2023} diff --git a/demo-anony.html b/demo-anony.html deleted file mode 100644 index e54a3c6..0000000 --- a/demo-anony.html +++ /dev/null @@ -1,1741 +0,0 @@ - - - - - - - - ConsistencyTTA Demo Page - - - -
-

Demo Page

-

- ConsistencyTTA: - Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
-

- -
- - - - - -
-
- -
-

This demonstration page presents the generations from 50 randomly selected prompts from the AudioCaps test set.

-

We present four audio sources: the consistency model fine-tuned with CLAP, - the consistency model without CLAP-fine-tuning, the diffusion baseline model, and the ground truth.

-

The diffusion baseline queries the neural network 400 times per audio clip, - while the consistency models query a same-sized network only one time.

-

Since the models are not trained on speech data, we do not expect them to produce meaningful speech.

- -
-

Prompt 0

-

Whistling followed by a child giggling and then Moe whistling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 1

-

Some clanking and banging and a man speaking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 2

-

A man speaking on a microphone as a crowd of people laugh followed by dinner plates clacking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 3

-

Steam hissing followed by a train whistle blowing and a group of people talking in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 4

-

A vehicle revving and accelerating as tires skid and squeak on a road.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 5

-

Steam escapes with a hissing noise.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 6

-

A man speaking continuously.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 7

-

Knocking sounds as race cars pass by.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 8

-

A man talking followed by plastic clacking then a power tool drilling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 9

-

Humming of an engine with a woman speaking over a loudspeaker.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 10

-

A telephone ringing with loud echo.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 11

-

Released air hissing followed by a popping explosion then a metal ding persists as a person is laughing and a man is talking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 12

-

Constant hissing with mean having conversation.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 13

-

A missile launching followed by an explosion and metal screeching as a motor hums in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 14

-

An adult female speaks as a cat meows three times, and an electronic device plays in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 15

-

Food and oil sizzling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 16

-

Some light tapping on a computer keyboard and a baby crying.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 17

-

An electronic beep followed by a man talking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 18

-

Sanding and filing then a man speaks.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 19

-

An aircraft engine humming followed by plastic clanking then an aircraft engine slowing down.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 20

-

Footsteps and scuffing occur, after which a door grinds, squeaks and clicks, an adult male speaks, and the door grinds, squeaks and clicks shut with a thump.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 21

-

A train horn blowing multiple times as a train runs on railroad tracks while a man and a young kid talk in the background alongside birds cooing in the distance.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 22

-

Strong gusts of wind are followed by cheers and shouts from several people plus the chatter of girl.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 23

-

Compressed air and steam releasing with a man faintly talking in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 24

-

A man talking followed by a goat baaing then a metal gate sliding shut as ducks quack and wind blows into a microphone.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 25

-

A cat is meowing.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 26

-

A toilet is flushing followed by a cat meowing.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 27

-

A person speaks with distant humming and nearby clinking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 28

-

A dog whimpering followed by laughing and barking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 29

-

A vehicle driving by with tires briefly skidding and accelerating then slowing down.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 30

-

A horn and then an engine revving.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 31

-

Several people cheer and scream and speak as water flows hard.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 32

-

A person whistles to music.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 33

-

Laughing and speech in a slowed speed.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 34

-

A man speaking as insects are buzzing and wind is blowing into a microphone.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 35

-

Wind followed by splashing of water.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 36

-

A person whistling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 37

-

Wood being scraped along with mechanical sounds.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 38

-

A woman speeches.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 39

-

A cat is meowing in a quiet environment.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 40

-

Wind blowing and a siren rings.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 41

-

Static and beeping.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 42

-

Musical whistling with wind blowing.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 43

-

An idle motorbike engine running.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 44

-

A jackhammer drilling and vibrating continuously.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 45

-

A train is passing by and sound its whistle.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 46

-

A motorboat engine running as water splashes and a man shouts followed by birds chirping in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 47

-

A high frequency motor hums loudly and splashes water.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 48

-

A series of sharp, squeaky snoring noises.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 49

-

A bus horn honking as wind is blowing into a microphone before a bus drives by.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours)
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
- - - - - diff --git a/demo.html b/demo.html index 1ace7dc..14d263e 100644 --- a/demo.html +++ b/demo.html @@ -27,6 +27,9 @@

ConsistencyTTA: + + + diff --git a/diversity-anony.html b/diversity-anony.html deleted file mode 100644 index 0d888bb..0000000 --- a/diversity-anony.html +++ /dev/null @@ -1,1744 +0,0 @@ - - - - - - - - ConsistencyTTA Diversity - - - -
-

Generation Diversity

-

- ConsistencyTTA: - Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
-

- -
- - - - - -
-
- -
-

This demonstration page presents the generation diversity of the proposed consistency TTA model. - The generations correspond to the first 50 AudioCaps test prompts, - and are from our consistency model with four different random seeds.

-

For quantitative evidence, we standardize each generated Mel spectrogram, - calculate the standard deviation across different seeds, - and average the standard deviation across all Mel spectrogram points of the 50 examples. - The averaged number is 0.871, demonstrating non-trivial generation diversity.
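
The diversity statistic described above can be sketched roughly as follows. This is a minimal illustration and not the authors' evaluation script: the function names `standardize` and `seed_diversity`, the flat-list representation of a Mel spectrogram, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import fmean, pstdev


def standardize(spec):
    """Shift and scale one flattened spectrogram to zero mean and unit variance."""
    mu = fmean(spec)
    sigma = pstdev(spec)  # population std is an assumption, not stated in the source
    if sigma == 0:
        raise ValueError("constant spectrogram cannot be standardized")
    return [(v - mu) / sigma for v in spec]


def seed_diversity(examples):
    """Average per-point standard deviation across seeds.

    `examples` is a hypothetical layout: one entry per prompt, each entry a
    list of per-seed spectrograms, each spectrogram a flat list of floats.
    """
    per_point_std = []
    for seeds in examples:
        normed = [standardize(s) for s in seeds]
        if len({len(s) for s in normed}) != 1:
            raise ValueError("all seeds of a prompt must have the same shape")
        # For every spectrogram point, measure the spread across seeds.
        for point_values in zip(*normed):
            per_point_std.append(pstdev(point_values))
    # Average over all points of all examples.
    return fmean(per_point_std)
```

Under these assumptions, identical generations for every seed give a value of 0, so a larger value indicates that the seeds produce more distinct spectrograms.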

-

Please listen to the following audio clips to confirm the generation quality of these seeds. - Since the model is not trained on speech data, we do not expect it to produce meaningful speech.

- -
-

Prompt 0

-

A machine is making clicking sound as people talk in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 1

-

A missile launching followed by an explosion and metal screeching as a motor hums in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 2

-

A toy train running as a young boy talks followed by plastic clanking then a child laughing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 3

-

Clattering of a train is ongoing, a railroad crossing bell rings, and a train horn blows.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 4

-

Food sizzling with some knocking and banging followed by a woman speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 5

-

A man talks while several animals make noises in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 6

-

An emergency siren ringing with car horn honking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 7

-

An infant yelling as a young boy talks while a hard surface is slapped several times.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 8

-

A bus engine running followed by a bus horn honking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 9

-

A man speaking followed by snoring.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 10

-

Rolling thunder with lightning strikes.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 11

-

A woman and a baby are having a conversation.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 12

-

Water trickling with man speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 13

-

Female speech, a toilet flushing and then more speech.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 14

-

Loud high humming and croaking sound.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 15

-

A cuckoo bird coos followed by a train running on railroad tracks as a bell dings in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 16

-

A man talking then meowing and hissing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 17

-

Water flowing through pipes.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 18

-

An infant crying followed by a man laughing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 19

-

A man speaking, followed by a door shutting, and then the man speaks some more.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 20

-

The wind is blowing, and a person is whistling a tune.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 21

-

Motor vehicles are driving with loud engines and a person whistles.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 22

-

Bubbles gurgling and water spraying as a man speaks softly while crowd of people talk in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 23

-

Metal clacking followed by a man talking then a metal bang as footsteps shuffle on dirt and a group of men laugh.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 24

-

Ducks quack and water splashes with some animal screeching in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 25

-

Multiple gun shots woman screaming.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 26

-

An aircraft engine runs and vibrates, metal spinning and grinding occur, and the engine accelerates and fades into the distance.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 27

-

A man is talking as tap water is running.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 28

-

Woman speaking, plastic container opening.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 29

-

A male speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 30

-

A vehicle engine revving followed by tires skidding as a group of people talk in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 31

-

A woman talking followed by a plate rattling as food and oil sizzle.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 32

-

Humming of an idling engine.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 33

-

A train running on railroad tracks as a train horn whistle blows several times while railroad crossing warning signals are ringing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 34

-

Several varying hisses.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 35

-

A motorboat driving by as water splashes followed by wind blowing into a microphone.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 36

-

A bus engine slowing down then accelerating.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 37

-

A woman talks as a baby cries.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 38

-

Kids laughing then talking followed by a young man talking as wind blows into a microphone.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 39

-

A woman delivers a speech.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 40

-

Clicking followed by humming noise.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 41

-

Electronic beeping followed by a cat singing then meowing as paper shuffles and a man talks with music playing in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 42

-

A high frequency motor hums loudly and splashes water.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 43

-

An adult male speaks, followed by another adult male speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 44

-

A horn and then an engine revving.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 45

-

Man speaking while insects buzz around.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 46

-

A motorboat engine running as water splashes and a man shouts followed by birds chirping in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 47

-

A man speaks and a machine runs with a continued speech.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 48

-

Man speaks followed by whistling.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 49

-

Warning bells ring and a train passes with a honking horn.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
- - - - - diff --git a/diversity.html b/diversity.html index a7450d7..393aea6 100644 --- a/diversity.html +++ b/diversity.html @@ -27,6 +27,9 @@

ConsistencyTTA: + + + diff --git a/evaluation-anony.html b/evaluation-anony.html deleted file mode 100644 index 28c76e1..0000000 --- a/evaluation-anony.html +++ /dev/null @@ -1,2762 +0,0 @@ - - - - - ConsistencyTTA Human Eval - - - - - - -
-

Example Human Evaluation Form

- - ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

- -

- -
-

Criteria for overall audio quality

- - The quality of each rating is:

- - 5 - Excellent.
- 4 - Overall slightly synthetic.
- 3 - Clearly synthetic but recognizable.
- 2 - Unclear/unidentifiable sound.
- 1 - Completely unrecognizable.

- - Since the generative models were not trained on speech data, - they are expected to generate unintelligible speech. - Therefore, please DO NOT consider the intelligibility of speech as a part of the criteria - (the voice quality can be taken into consideration).
- -

Criteria for audio-text correspondence

- - The quality of each rating is:

- - 5 - Excellent.
- 4 - Temporal mismatch or other slight mismatches. - E.g., the prompt says one sound after another, but the audio has them simultaneously.
- 3 - One of the sound components missing/redundant/incorrect. - E.g., the prompt requests four sound components, but the audio only has three or vice versa; - the prompt asks for one person speaking but there are two people in the audio.
- 2 - More than one sound component missing/redundant/incorrect.
- 1 - Totally incorrect.
- -

Before starting the rating, clear the browser local storage using the following button.

- - -

After completing the ratings, click the following button to download the data into a CSV.

- There is also a copy of this button at the bottom of the page.
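
As a rough illustration of what the downloaded file might contain, the sketch below serializes one rater's scores into CSV text. The form's actual export logic is not shown in this diff, so the function name `ratings_to_csv`, the column names, and the input layout are hypothetical; only the 1 to 5 rating scale and the two rated criteria come from the form above.

```python
import csv
import io


def ratings_to_csv(ratings):
    """Serialize ratings to CSV text.

    `ratings` is a hypothetical layout: a dict mapping the prompt index to a
    (quality, correspondence) pair, each an integer from 1 to 5.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Column names are assumptions made for this example.
    writer.writerow(["prompt", "quality", "correspondence"])
    for prompt_index in sorted(ratings):
        quality, correspondence = ratings[prompt_index]
        for score in (quality, correspondence):
            if score not in (1, 2, 3, 4, 5):
                raise ValueError(f"rating out of range: {score!r}")
        writer.writerow([prompt_index, quality, correspondence])
    return buffer.getvalue()
```

Sorting by prompt index keeps the rows in the order the prompts appear on the page, regardless of the order in which they were rated.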

-
- -
-

Prompt 0

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 1

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 2

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 3

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 4

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 5

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 6

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 7

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 8

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 9

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 10

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 11

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 12

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 13

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 14

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 15

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 16

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 17

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 18

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 19

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 20

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 21

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 22

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 23

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 24

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 25

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 26

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 27

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 28

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 29

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 30

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 31

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 32

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 33

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 34

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 35

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 36

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 37

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 38

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 39

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 40

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 41

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 42

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 43

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 44

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 45

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 46

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 47

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 48

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 49

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 50

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 51

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 52

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 53

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 54

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 55

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 56

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 57

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 58

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 59

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 60

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 61

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 62

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 63

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 64

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 65

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 66

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 67

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 68

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 69

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 70

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 71

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 72

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 73

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 74

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 75

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 76

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 77

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 78

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 79

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 80

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 81

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 82

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 83

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 84

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 85

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 86

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 87

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 88

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 89

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 90

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 91

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 92

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 93

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 94

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 95

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 96

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 97

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 98

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 99

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

The ratings can be downloaded into a CSV file.

- -
- - - - diff --git a/evaluation.html b/evaluation.html index 45c0cff..bedbb0a 100644 --- a/evaluation.html +++ b/evaluation.html @@ -15,7 +15,7 @@

Example Human Evaluation Form

ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi

-

+


Criteria for overall audio quality

diff --git a/index-anony.html b/index-anony.html deleted file mode 100644 index 246a068..0000000 --- a/index-anony.html +++ /dev/null @@ -1,110 +0,0 @@ - - - - - - - - - ConsistencyTTA - - - - - - -
-

ConsistencyTTA: - Accelerating Diffusion-Based
- Text-to-Audio Generation with Consistency Distillation
-

- -
- - - - - -
-
- -
-

Description

-

- Diffusion models power the vast majority of text-to-audio generation methods. - Unfortunately, diffusion models suffer from slow inference because they iteratively query the - underlying denoising network, making them unsuitable for applications with time or computational constraints. - This work modifies the recently proposed "consistency distillation" framework to train text-to-audio - models that require only a single neural network query, accelerating generation by hundreds of times. -

-

- By incorporating classifier-free guidance into the distillation framework, our models retain - diffusion models' impressive generation quality and diversity. Furthermore, the non-recurrent - differentiable structure resulting from the distillation allows fine-tuning with novel loss functions. - We use the CLAP loss as an example, confirming that end-to-end fine-tuning further boosts the generation quality. -
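
The classifier-free guidance mentioned above can be illustrated with a minimal sketch. This shows only the standard guidance combination of a conditional and an unconditional noise estimate; it is not taken from the ConsistencyTTA code, and the function name, the plain-list representation, and the `weight` convention (where `weight = 1` recovers the conditional estimate) are assumptions made for illustration.

```python
from typing import List, Sequence


def classifier_free_guidance(
    cond: Sequence[float],
    uncond: Sequence[float],
    weight: float,
) -> List[float]:
    """Combine two noise estimates with the standard guidance rule.

    Hypothetical helper: `cond` is the text-conditioned estimate,
    `uncond` is the estimate for an empty prompt, and `weight` is the
    guidance strength. weight = 0 returns `uncond`, weight = 1 returns
    `cond`, and weight > 1 extrapolates past `cond`.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    # Move from the unconditional estimate toward (and past) the conditional one.
    return [u + weight * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], weight=3.0)
    print(guided)
```

In a diffusion sampler this combination is evaluated at every denoising step, which doubles the number of network queries; distilling the guided output into a single-query model is what removes that cost.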

-
- -
-

Main Experiment Results

-
- ConsistencyTTA Results -
-

- Our method reduces the computation of the core step of diffusion-based text-to-audio generation by - a factor of 400, with minimal performance degradation in terms of - Fréchet Audio Distance (FAD), Fréchet Distance (FD), KL Divergence, and CLAP Scores. -

- - - - - - - - - - - - - - - - - - - - - - - - - -
Method | # queries (↓) | CLAP_T (↑) | CLAP_A (↑) | FAD (↓) | FD (↓) | KLD (↓)
Diffusion (Baseline) | 400 | 24.57 | 72.79 | 1.908 | 19.57 | 1.350
Consistency + CLAP FT (Ours) | 1 | 24.69 | 72.54 | 2.406 | 20.97 | 1.358
Consistency (Ours) | 1 | 22.50 | 72.30 | 2.575 | 22.08 | 1.354
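
To make the trade-off in the table above concrete, the following sketch computes the query reduction factor and the relative change of each metric against the diffusion baseline. The numbers are copied from the table; the dictionary layout and the helper name `relative_change` are assumptions introduced only for this illustration.

```python
# Values copied from the results table; key names are illustrative.
baseline = {"queries": 400, "CLAP_T": 24.57, "CLAP_A": 72.79,
            "FAD": 1.908, "FD": 19.57, "KLD": 1.350}
consistency_clap_ft = {"queries": 1, "CLAP_T": 24.69, "CLAP_A": 72.54,
                       "FAD": 2.406, "FD": 20.97, "KLD": 1.358}


def relative_change(new: float, ref: float) -> float:
    """Return (new - ref) / ref, the signed change relative to the reference."""
    return (new - ref) / ref


# How many times fewer network queries the distilled model needs.
query_reduction = baseline["queries"] / consistency_clap_ft["queries"]

if __name__ == "__main__":
    print(f"query reduction: {query_reduction:.0f}x")
    for metric in ("CLAP_T", "CLAP_A", "FAD", "FD", "KLD"):
        change = relative_change(consistency_clap_ft[metric], baseline[metric])
        print(f"{metric}: {change:+.1%}")
```

For CLAP_T and CLAP_A a positive change is an improvement, while for FAD, FD, and KLD a positive change is a degradation, matching the arrows in the table header.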
-
- -
-

Generation Diversity

-

- Consistency models demonstrate non-trivial generation diversity, as do diffusion models. - On this page, we present 50 groups of generations from - four different random seeds to demonstrate this diversity, showing that our method - combines the diversity of diffusion models and the efficiency of single-step models. -

-
- -
-

Human Evaluation

-

- ConsistencyTTA's performance is verified via extensive human evaluation. - Audio clips generated from ConsistencyTTA and baseline methods are mixed and shown to the evaluators, - who are then asked to rate the audio clips based on their quality and correspondence with the textual prompt. - A sample of the evaluation form is shown on this page. -

-
- - - - - - diff --git a/index.html b/index.html index f5157e5..e6ff8aa 100644 --- a/index.html +++ b/index.html @@ -40,10 +40,13 @@

- + + + + diff --git a/styles.css b/styles.css index e9b6abb..e24612c 100644 --- a/styles.css +++ b/styles.css @@ -33,7 +33,7 @@ header h3 { .institution { font-weight: 200; - font-size: 1em; + font-size: 1.1em; font-style: italic; margin-top: -.4em; } @@ -102,24 +102,38 @@ button { } .home-button { - background-color: #597a47; + background-color: #5d5d5d; box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); margin: 0px 13px; - width: 180px; + width: 130px; + height: 70px; + vertical-align: middle; +} + +.home-button-wide { + background-color: #5d5d5d; + box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); + margin: 0px 13px; + width: 200px; + vertical-align: middle; } .demo-button { background-color: #7a5947; box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); margin: 0px 13px; - width: 180px; + width: 130px; + height: 70px; + vertical-align: middle; } .hf-button { background-color: #c1436d; box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); margin: 0px 13px; - width: 200px; + width: 130px; + height: 70px; + vertical-align: middle; } .eval-button { @@ -134,16 +148,31 @@ button { text-shadow: 3px 3px 1.5px rgba(0, 0, 0, 0.15); box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); margin: 0px 13px; - width: 180px; + width: 130px; + height: 70px; + vertical-align: middle; } -.code-button { +.livedemo-button { background-color: #df801d; font-weight: bold; text-shadow: 3px 3px 1.5px rgba(0, 0, 0, 0.15); box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); margin: 0px 13px; - width: 160px; + width: 140px; + height: 70px; + vertical-align: middle; +} + +.code-button { + background-color: #18a74c; + font-weight: bold; + text-shadow: 3px 3px 1.5px rgba(0, 0, 0, 0.15); + box-shadow: 0 0 15px rgba(0, 0, 0, 0.1); + margin: 0px 13px; + width: 130px; + height: 70px; + vertical-align: middle; } a .fab.fa-github {