Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?

usc-sail/SynthAudio

Synth Audio

This repo includes part of the code for the IS24 (Interspeech 2024) paper: Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling? [Paper Link]

The core idea behind this work is simple: we aim to answer whether synthetic audio can be used as training and augmentation data.

The major part of the work is audio generation, so we recommend starting with the code under src/audio_gen.

Audio Generation

We provide code to generate the synthetic audio used in our paper for ESC50, GTZAN, and UCF101:

src/audio_gen/audio_gen_esc50.py
src/audio_gen/audio_gen_gtzan.py
src/audio_gen/audio_gen_ucf101.py

You can specify the following arguments for generation:

gen_per_class: number of generations per class
generate_method: class_prompt (class-guided) or llm (LLM-assisted)
model: audiogen or audioldm (musicgen is also available for music)
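
As a rough illustration of how these three arguments fit together, here is a minimal sketch and not the repository's actual script. The argument names and allowed values come from the list above. The `--flag` spellings, the defaults, the `class_prompts` helper, and the "Sound of {label}" prompt template are all assumptions made for this example.

```python
import argparse

# Allowed values are taken from the README; flag spellings and defaults are assumptions.
GENERATE_METHODS = ("class_prompt", "llm")
MODELS = ("audiogen", "audioldm", "musicgen")


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI that mirrors the three documented arguments."""
    parser = argparse.ArgumentParser(description="Sketch of a generation CLI")
    parser.add_argument("--gen_per_class", type=int, default=10,
                        help="number of clips to generate per class")
    parser.add_argument("--generate_method", choices=GENERATE_METHODS,
                        default="class_prompt",
                        help="class-guided or LLM-assisted prompting")
    parser.add_argument("--model", choices=MODELS, default="audiogen",
                        help="text-to-audio model used for generation")
    return parser


def class_prompts(labels, gen_per_class, template="Sound of {label}"):
    """Return one (label, prompt) pair per clip for class-guided generation.

    The template is a placeholder; the prompts used in the paper may differ.
    """
    return [
        (label, template.format(label=label.replace("_", " ")))
        for label in labels
        for _ in range(gen_per_class)
    ]


def main(argv=None):
    """Parse arguments and print the prompts that would be sent to the model."""
    args = build_parser().parse_args(argv)
    if args.generate_method != "class_prompt":
        raise NotImplementedError("LLM-assisted prompting is not sketched here")
    for label, prompt in class_prompts(["dog", "crying_baby"], args.gen_per_class):
        print(f"{args.model}\t{label}\t{prompt}")
```

Calling `main(["--gen_per_class", "2", "--model", "audioldm"])` would list two prompts per class. In the real scripts, each prompt would be passed to the selected generation model instead of being printed.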

The LLM-assisted method uses Gemini. Because this research was performed when Gemini was first launched, recent Gemini releases may require some additional configuration.

Synthetic Audio Release

For now, we release only the synthetic audio for ESC50, because the full set of synthetic audio is very large.

[Dropbox Download Link]

Audio Training

We provide sample experiment code for audio training. You will need to run the split, preprocess, and finally finetune scripts under experiments. You will also need to download SSAST-Base-Patch-400.pth from the SSAST repo.
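
To make the "training and augmentation data" idea concrete, the sketch below shows one way a training list could be assembled from real and synthetic clips. It is not the logic of the scripts under experiments: the function name `build_train_set`, the three mode names, and the `max_synth` cap are hypothetical. Held-out evaluation data is assumed to stay real recordings only.

```python
import random


def build_train_set(real, synthetic, mode="augment", max_synth=None, seed=0):
    """Assemble a training list of (path, label) pairs.

    mode="real"      -> real clips only (baseline)
    mode="synthetic" -> synthetic clips only (synthetic as training data)
    mode="augment"   -> real clips plus synthetic clips (synthetic as augmentation)
    max_synth optionally caps how many synthetic clips are used.
    """
    if mode not in ("real", "synthetic", "augment"):
        raise ValueError(f"unknown mode: {mode}")

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    synth = list(synthetic)
    rng.shuffle(synth)
    if max_synth is not None:
        synth = synth[:max_synth]

    if mode == "real":
        return list(real)
    if mode == "synthetic":
        return synth

    combined = list(real) + synth
    rng.shuffle(combined)
    return combined
```

For example, `build_train_set(real, synthetic, mode="augment", max_synth=len(real))` would pair every real clip with at most an equal number of synthetic clips. The mixing ratio that works best is an empirical question and is not specified here.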
