From e5749be6beda91160b1726422a419f287bfecb63 Mon Sep 17 00:00:00 2001
From: Bai-YT
Date: Sun, 17 Nov 2024 00:56:01 -0800
Subject: [PATCH] Remove anonymous pages

---
 demo-anony.html       | 1740 --------------------------
 demo.html             |    2 +-
 diversity-anony.html  | 1743 --------------------------
 diversity.html        |    2 +-
 evaluation-anony.html | 2762 -----------------------------------------
 index-anony.html      |  109 --
 index.html            |    4 +-
 styles.css            |    6 +
 8 files changed, 10 insertions(+), 6358 deletions(-)
 delete mode 100644 demo-anony.html
 delete mode 100644 diversity-anony.html
 delete mode 100644 evaluation-anony.html
 delete mode 100644 index-anony.html

diff --git a/demo-anony.html b/demo-anony.html
deleted file mode 100644
index 50bd11b..0000000
--- a/demo-anony.html
+++ /dev/null
@@ -1,1740 +0,0 @@
- - - - - - - - ConsistencyTTA Demo Page - - - -
-

Demo Page

-

- ConsistencyTTA: - Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
-

- -
- - - - -
-
- -
-

This demonstration page presents the generations from 50 randomly selected prompts from the AudioCaps test set.

-

We present four audio sources: the consistency model fine-tuned with CLAP, - the consistency model without CLAP-fine-tuning, the diffusion baseline model, and the ground truth.

-

The diffusion baseline queries the neural network 400 times per audio clip, - while the consistency models query a network of the same size only once.

-

Since the models are not trained on speech data, we do not expect them to produce meaningful speech.

- -
-

Prompt 0

-

Whistling followed by a child giggling and then Moe whistling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 1

-

Some clanking and banging and a man speaking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 2

-

A man speaking on a microphone as a crowd of people laugh followed by dinner plates clacking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 3

-

Steam hissing followed by a train whistle blowing and a group of people talking in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 4

-

A vehicle revving and accelerating as tires skid and squeak on a road.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 5

-

Steam escapes with a hissing noise.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 6

-

A man speaking continuously.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 7

-

Knocking sounds as race cars pass by.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 8

-

A man talking followed by plastic clacking then a power tool drilling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 9

-

Humming of an engine with a woman speaking over a loudspeaker.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 10

-

A telephone ringing with loud echo.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 11

-

Released air hissing followed by a popping explosion then a metal ding persists as a person is laughing and a man is talking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 12

-

Constant hissing with mean having conversation.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 13

-

A missile launching followed by an explosion and metal screeching as a motor hums in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 14

-

An adult female speaks as a cat meows three times, and an electronic device plays in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 15

-

Food and oil sizzling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 16

-

Some light tapping on a computer keyboard and a baby crying.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 17

-

An electronic beep followed by a man talking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 18

-

Sanding and filing then a man speaks.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 19

-

An aircraft engine humming followed by plastic clanking then an aircraft engine slowing down.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 20

-

Footsteps and scuffing occur, after which a door grinds, squeaks and clicks, an adult male speaks, and the door grinds, squeaks and clicks shut with a thump.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 21

-

A train horn blowing multiple times as a train runs on railroad tracks while a man and a young kid talk in the background alongside birds cooing in the distance.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 22

-

Strong gusts of wind are followed by cheers and shouts from several people plus the chatter of girl.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 23

-

Compressed air and steam releasing with a man faintly talking in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 24

-

A man talking followed by a goat baaing then a metal gate sliding shut as ducks quack and wind blows into a microphone.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 25

-

A cat is meowing.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 26

-

A toilet is flushing followed by a cat meowing.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 27

-

A person speaks with distant humming and nearby clinking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 28

-

A dog whimpering followed by laughing and barking.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 29

-

A vehicle driving by with tires briefly skidding and accelerating then slowing down.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 30

-

A horn and then an engine revving.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 31

-

Several people cheer and scream and speak as water flows hard.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 32

-

A person whistles to music.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 33

-

Laughing and speech in a slowed speed.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 34

-

A man speaking as insects are buzzing and wind is blowing into a microphone.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 35

-

Wind followed by splashing of water.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 36

-

A person whistling.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 37

-

Wood being scraped along with mechanical sounds.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 38

-

A woman speeches.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 39

-

A cat is meowing in a quiet environment.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 40

-

Wind blowing and a siren rings.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 41

-

Static and beeping.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 42

-

Musical whistling with wind blowing.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 43

-

An idle motorbike engine running.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 44

-

A jackhammer drilling and vibrating continuously.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 45

-

A train is passing by and sound its whistle.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 46

-

A motorboat engine running as water splashes and a man shouts followed by birds chirping in the background.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 47

-

A high frequency motor hums loudly and splashes water.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 48

-

A series of sharp, squeaky snoring noises.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
-

Prompt 49

-

A bus horn honking as wind is blowing into a microphone before a bus drives by.

- - - - - - - - - - - - - - - - - -
ConsistencyTTA (ours);
ConsistencyTTA + CLAP-FT (ours) 
Diffusion baseline (TANGO)
Ground truth
- -
- - - - -
diff --git a/demo.html b/demo.html
index 14d263e..b3d653c 100644
--- a/demo.html
+++ b/demo.html
@@ -25,7 +25,7 @@

ConsistencyTTA:
-
+
diff --git a/diversity-anony.html b/diversity-anony.html
deleted file mode 100644
index 4e73280..0000000
--- a/diversity-anony.html
+++ /dev/null
@@ -1,1743 +0,0 @@
- - - - - - - - ConsistencyTTA Diversity - - - -
-

Generation Diversity

-

- ConsistencyTTA: - Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
-

- -
- - -
- -
-
- -
-

This demonstration page presents the generation diversity of the proposed consistency TTA model. - The generations correspond to the first 50 AudioCaps test prompts, - and are from our consistency model with four different random seeds.

-

For quantitative evidence, we standardize each generated Mel spectrogram, - calculate the standard deviation across different seeds, - and average the standard deviation across all Mel spectrogram points of the 50 examples. - The averaged number is 0.871, demonstrating non-trivial generation diversity.
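The diversity statistic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' evaluation code: the function name `seed_diversity`, the array layout `(prompts, seeds, frames, mel_bins)`, per-spectrogram standardization to zero mean and unit variance, and the use of the population standard deviation are all assumptions made for the example.

```python
import numpy as np


def seed_diversity(mels: np.ndarray) -> float:
    """Average cross-seed standard deviation of standardized Mel spectrograms.

    `mels` is assumed to have shape (num_prompts, num_seeds, num_frames, num_mel_bins).
    """
    # Standardize each spectrogram on its own (assumed: zero mean, unit variance).
    mean = mels.mean(axis=(-2, -1), keepdims=True)
    std = mels.std(axis=(-2, -1), keepdims=True)
    standardized = (mels - mean) / (std + 1e-8)

    # Standard deviation across seeds at every time-frequency point
    # (assumed: population standard deviation, NumPy's default ddof=0).
    per_point_std = standardized.std(axis=1)

    # Average over all spectrogram points of all prompts.
    return float(per_point_std.mean())


if __name__ == "__main__":
    # Hypothetical stand-in data: 50 prompts, 4 seeds, small random spectrograms.
    rng = np.random.default_rng(0)
    fake_mels = rng.normal(size=(50, 4, 32, 16))
    print(seed_diversity(fake_mels))
```

Under these assumptions, a model that ignores the seed would score 0, so a value well above 0 indicates that different seeds produce different spectrograms.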

-

Please listen to the following audio clips to confirm the generation quality of these seeds. - Since the model is not trained on speech data, we do not expect it to produce meaningful speech.

- -
-

Prompt 0

-

A machine is making clicking sound as people talk in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 1

-

A missile launching followed by an explosion and metal screeching as a motor hums in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 2

-

A toy train running as a young boy talks followed by plastic clanking then a child laughing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 3

-

Clattering of a train is ongoing, a railroad crossing bell rings, and a train horn blows.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 4

-

Food sizzling with some knocking and banging followed by a woman speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 5

-

A man talks while several animals make noises in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 6

-

An emergency siren ringing with car horn honking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 7

-

An infant yelling as a young boy talks while a hard surface is slapped several times.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 8

-

A bus engine running followed by a bus horn honking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 9

-

A man speaking followed by snoring.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 10

-

Rolling thunder with lightning strikes.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 11

-

A woman and a baby are having a conversation.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 12

-

Water trickling with man speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 13

-

Female speech, a toilet flushing and then more speech.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 14

-

Loud high humming and croaking sound.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 15

-

A cuckoo bird coos followed by a train running on railroad tracks as a bell dings in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 16

-

A man talking then meowing and hissing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 17

-

Water flowing through pipes.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 18

-

An infant crying followed by a man laughing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 19

-

A man speaking, followed by a door shutting, and then the man speaks some more.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 20

-

The wind is blowing, and a person is whistling a tune.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 21

-

Motor vehicles are driving with loud engines and a person whistles.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 22

-

Bubbles gurgling and water spraying as a man speaks softly while crowd of people talk in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 23

-

Metal clacking followed by a man talking then a metal bang as footsteps shuffle on dirt and a group of men laugh.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 24

-

Ducks quack and water splashes with some animal screeching in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 25

-

Multiple gun shots woman screaming.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 26

-

An aircraft engine runs and vibrates, metal spinning and grinding occur, and the engine accelerates and fades into the distance.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 27

-

A man is talking as tap water is running.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 28

-

Woman speaking, plastic container opening.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 29

-

A male speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 30

-

A vehicle engine revving followed by tires skidding as a group of people talk in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 31

-

A woman talking followed by a plate rattling as food and oil sizzle.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 32

-

Humming of an idling engine.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 33

-

A train running on railroad tracks as a train horn whistle blows several times while railroad crossing warning signals are ringing.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 34

-

Several varying hisses.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 35

-

A motorboat driving by as water splashes followed by wind blowing into a microphone.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 36

-

A bus engine slowing down then accelerating.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 37

-

A woman talks as a baby cries.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 38

-

Kids laughing then talking followed by a young man talking as wind blows into a microphone.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 39

-

A woman delivers a speech.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 40

-

Clicking followed by humming noise.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 41

-

Electronic beeping followed by a cat singing then meowing as paper shuffles and a man talks with music playing in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 42

-

A high frequency motor hums loudly and splashes water.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 43

-

An adult male speaks, followed by another adult male speaking.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 44

-

A horn and then an engine revving.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 45

-

Man speaking while insects buzz around.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 46

-

A motorboat engine running as water splashes and a man shouts followed by birds chirping in the background.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 47

-

A man speaks and a machine runs with a continued speech.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 48

-

Man speaks followed by whistling.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
-

Prompt 49

-

Warning bells ring and a train passes with a honking horn.

- - - - - - - - - - - - - - - - - -
Seed I
Seed II
Seed III
Seed IV
- -
- -
- - -
diff --git a/diversity.html b/diversity.html
index 393aea6..0187ef3 100644
--- a/diversity.html
+++ b/diversity.html
@@ -25,7 +25,7 @@

ConsistencyTTA:
-
+
diff --git a/evaluation-anony.html b/evaluation-anony.html
deleted file mode 100644
index 28c76e1..0000000
--- a/evaluation-anony.html
+++ /dev/null
@@ -1,2762 +0,0 @@
- - - - - ConsistencyTTA Human Eval - - - - - - -
-

Example Human Evaluation Form

- - ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

- -


- -
-

Criteria for overall audio quality

- - The meaning of each rating is:

- - 5 - Excellent.
- 4 - Overall slightly synthetic.
- 3 - Clearly synthetic but recognizable.
- 2 - Unclear/unidentifiable sound.
- 1 - Completely unrecognizable.

- - Since the generative models were not trained on speech data, - they are expected to generate unintelligible speech. - Therefore, please DO NOT consider the intelligibility of speech as a part of the criteria - (the voice quality can be taken into consideration).
- -

Criteria for audio-text correspondence

- - The meaning of each rating is:

- - 5 - Excellent.
- 4 - Temporal mismatch or other slight mismatches. - E.g., the prompt says one sound after another, but the audio has them simultaneously.
- 3 - One of the sound components missing/redundant/incorrect. - E.g., the prompt requests four sound components, but the audio only has three or vice versa; - the prompt asks for one person speaking but there are two people in the audio.
- 2 - More than one sound component missing/redundant/incorrect.
- 1 - Totally incorrect.
- -

Before starting the rating, clear the browser local storage using the following button.

- - -

After completing the ratings, click the following button to download the data into a CSV.

- There is also a copy of this button at the bottom of the page.

-
- -
-

Prompt 0

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 1

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 2

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 3

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 4

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 5

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 6

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 7

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 8

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 9

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 10

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 11

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 12

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 13

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 14

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 15

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 16

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 17

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 18

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 19

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 20

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 21

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 22

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 23

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 24

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 25

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 26

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 27

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 28

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 29

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 30

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 31

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 32

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 33

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 34

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 35

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 36

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 37

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 38

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 39

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 40

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 41

-

Mechanical rotation and then a loud click occurs

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 42

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 43

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 44

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 45

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 46

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 47

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 48

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 49

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 50

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 51

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 52

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 53

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 54

-

A man speaking while water runs in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 55

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 56

-

Several puppies yapping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 57

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 58

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 59

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 60

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 61

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 62

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 63

-

A woman speaks, and a motor vehicle revs its engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 64

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 65

-

Train passing followed by short honk

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 66

-

Rain and thunder

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 67

-

A bus engine driving in the distance then nearby followed by compressed air releasing while a woman and a child talk in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 68

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 69

-

An electric motor runs then a person speaks

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 70

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 71

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 72

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 73

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 74

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 75

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 76

-

A vehicle accelerating then driving by as gusts of wind blow and leaves rustle in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 77

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 78

-

High pressure liquid spraying as a radio plays in the background

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 79

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 80

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 81

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 82

-

A machine motor running as a man is speaking followed by rapid buzzing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 83

-

A loud bang followed by an engine idling loudly

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 84

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 85

-

Male speech and then scraping

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 86

-

An baby laughing

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 87

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 88

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 89

-

Humming from a large engine

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 90

-

A nearby insect buzzes with nearby vibrations

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 91

-

A motor vehicle engine is revving

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 92

-

A car engine idling then starts to rev shortly after

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 93

-

A helicopter engine operating while wind blows heavily into a microphone

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 94

-

A horse gallops then trot on grass as gusts of wind blow and thunderclaps in the distance

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 95

-

A man talking followed by a camera muffling and footsteps shuffling then wood lightly clanking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 96

-

A sewing machine sews followed by a man talking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 97

-

A person gulping followed by glass tapping then liquid shaking in a container proceeded by liquid pouring before plastic thumps on paper

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 98

-

Man talking in the wind and someone yells in the background while an engine makes squealing and air puffing sounds

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

Prompt 99

-

A woman talks briefly as several goats bleat including one that has high pitched bleats. A crunch is followed by a man speaking

- - -

Rate on overall audio quality.

-
- - - - - -
-

- -

Rate on audio-text correspondence.

-
- - - - - -
-

- -
-

The ratings can be downloaded into a CSV file.

- -
- - - -
diff --git a/index-anony.html b/index-anony.html
deleted file mode 100644
index 4c76f24..0000000
--- a/index-anony.html
+++ /dev/null
@@ -1,109 +0,0 @@
- - - - - - - - - ConsistencyTTA - - - - - - -
-

ConsistencyTTA: - Accelerating Diffusion-Based
- Text-to-Audio Generation with Consistency Distillation
-

- -
- - - - -
-
- -
-

Description

-

- Diffusion models power a vast majority of the text-to-audio generation methods. - Unfortunately, diffusion models suffer from a slow inference speed due to iteratively querying the - underlying denoising network, thus unsuitable for applications with time or computational constraints. - This work proposes text-to-audio models that only require a single non-autoregressive neural network - query, accelerating the generation hundreds of times and enabling on-device audio generation. -

-

- By incorporating classifier-free guidance into the distillation framework, our models retain - diffusion models' impressive generation quality and diversity. Furthermore, the non-recurrent - differentiable structure resulting from the distillation allows fine-tuning with novel loss functions. - We use the CLAP loss as an example, confirming that end-to-end fine-tuning further boosts the generation quality. -

-
- -
-

Main Experiment Results

-
- ConsistencyTTA Results -
-

- Our method reduces the computation of the core step of diffusion-based text-to-audio generation by - a factor of 400, while observing minimal performance degradation in terms of - Fréchet Audio Distance (FAD), Fréchet Distance (FD), KL Divergence, and CLAP Scores. -

- - - - - - - - - - - - - - - - - - - - - - - - - -
Model | # queries (↓) | CLAP_T (↑) | CLAP_A (↑) | FAD (↓) | FD (↓) | KLD (↓)
Diffusion (Baseline) | 400 | 24.57 | 72.79 | 1.908 | 19.57 | 1.350
Consistency + CLAP FT (Ours) | 1 | 24.69 | 72.54 | 2.406 | 20.97 | 1.358
Consistency (Ours) | 1 | 22.50 | 72.30 | 2.575 | 22.08 | 1.354
-
- -
-

Generation Diversity

-

- Consistency models demonstrate non-trivial generation diversity, as do diffusion models. - On this page, we present 50 groups of generations from - four different random seeds to demonstrate this diversity, showing that our method - combines the diversity of diffusion models and the efficiency of single-step models. -

-
- -
-

Human Evaluation

-

- ConsistencyTTA's performance is verified via extensive human evaluation. - Audio clips generated from ConsistencyTTA and baseline methods are mixed and shown to the evaluators, - who are then asked to rate the audio clips based on their quality and correspondence with the textual prompt. - A sample of the evaluation form is shown on this page. -

-
- -
- - - -
diff --git a/index.html b/index.html
index ee6f7b6..dc83895 100644
--- a/index.html
+++ b/index.html
@@ -44,7 +44,7 @@

-
+
@@ -263,7 +263,7 @@

Human Evaluation

Citing Our Work (BibTeX)

-
@inproceedings{bai2024accelerating,
+
@inproceedings{bai2024consistencytta,
   author = {Bai, Yatong and Dang, Trung and Tran, Dung and Koishida, Kazuhito and Sojoudi, Somayeh},
   title = {ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation},
   booktitle = {INTERSPEECH},
diff --git a/styles.css b/styles.css
index ca1d616..7f667eb 100644
--- a/styles.css
+++ b/styles.css
@@ -7,6 +7,9 @@
 @import url(
     'https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.1.0/css/all.min.css'
 );
+@import url(
+    "https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css"
+);
 
 
 body {
@@ -197,6 +200,9 @@ a .fab.fa-github {
     font-size: 24px; /* adjust size as needed */
     margin: 0px 7px;
 }
+a .ai-arxiv {
+    color: #E7352B; /* Set your desired color (arXiv red) */
+}
 
 .eval-button-small {
     background-color: #5d5d5d;