Audioset Scripts

A collection of scripts to analyze, prepare, and download Google's Audioset.

`audioset_download.py`

This file contains a function that downloads, formats, and segments the audio of a given YouTube video, as well as a function that processes an entire list of files in parallel.

The expected input is a text file that lists the segments to download, one per line, in the format YTID_STARTMS, where YTID is the YouTube ID of the video and STARTMS is the start time (in ms) of the 10 s interval to extract (see train_list.txt and eval_list.txt).
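The input format described above can be parsed with a few lines of Python. This is a minimal sketch, not the code from audioset_download.py; the names `Segment`, `parse_segment_id`, and `read_segment_list` are made up for illustration. Because YouTube IDs may themselves contain underscores, the sketch splits on the last underscore only.

```python
from typing import NamedTuple

CLIP_LENGTH_S = 10.0  # clip length stated in this README


class Segment(NamedTuple):
    ytid: str       # YouTube video ID
    start_s: float  # clip start in seconds
    end_s: float    # clip end in seconds


def parse_segment_id(line: str) -> Segment:
    """Split 'YTID_STARTMS' into the video ID and a 10 s time window."""
    # YouTube IDs can contain '_' and '-', so split on the last '_' only.
    ytid, start_ms = line.strip().rsplit("_", 1)
    start_s = int(start_ms) / 1000.0
    return Segment(ytid, start_s, start_s + CLIP_LENGTH_S)


def read_segment_list(path: str) -> list[Segment]:
    """Read one segment ID per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [parse_segment_id(line) for line in fh if line.strip()]
```

For example, `read_segment_list("train_list.txt")` would return one `Segment` per non-empty line of the list.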

This requires the external packages yt-dlp and sox.
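To illustrate how the two tools could fit together, here is a hedged sketch of a download-then-trim step with a thread pool on top. It is not the script's actual implementation: the function names, the output file layout, and the choice of `ThreadPoolExecutor` are assumptions. It also assumes ffmpeg is installed, since yt-dlp's `-x` audio extraction depends on it.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(ytid, start_s, out_dir, length_s=10.0):
    """Return the yt-dlp and sox argument lists for one segment."""
    out = Path(out_dir)
    full = out / f"{ytid}.wav"  # whole audio track (assumed layout)
    clip = out / f"{ytid}_{int(round(start_s * 1000))}.wav"  # excerpt
    download = [
        "yt-dlp", "-x", "--audio-format", "wav",
        "-o", str(out / f"{ytid}.%(ext)s"),
        f"https://www.youtube.com/watch?v={ytid}",
    ]
    # sox 'trim START LENGTH' keeps LENGTH seconds starting at START.
    trim = ["sox", str(full), str(clip), "trim", str(start_s), str(length_s)]
    return download, trim


def fetch_segment(job):
    """Download and cut one segment; return (ytid, success flag)."""
    ytid, start_s, out_dir = job
    try:
        for cmd in build_commands(ytid, start_s, out_dir):
            subprocess.run(cmd, check=True, capture_output=True)
        return ytid, True
    except (subprocess.CalledProcessError, FileNotFoundError):
        return ytid, False  # video unavailable or a tool is missing


def fetch_all(jobs, worker=fetch_segment, max_workers=4):
    """Run the worker over all jobs in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, jobs))
```

Threads are sufficient here because the work happens in external processes. Returning a success flag per segment makes it easy to record which files were actually downloaded, since some videos will no longer be available.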

`audioset_scripts.py`

This file contains scripts to count files, classes, and events in the dataset, select the most frequently occurring classes, filter the dataset by a list of files or classes, and build tables of counts for several cases.
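As a rough illustration of the counting and top-class selection, the sketch below tallies events and distinct files per label and picks the most frequent labels. It assumes a tab-separated annotation file with `segment_id` and `label` columns; the function names and the sample rows (with placeholder labels) are hypothetical and do not come from audioset_scripts.py.

```python
import csv
import io
from collections import Counter, defaultdict


def count_labels(tsv_file):
    """Count events and distinct files per label in an annotation TSV."""
    events = Counter()
    files = defaultdict(set)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        events[row["label"]] += 1                    # one row = one event
        files[row["label"]].add(row["segment_id"])   # distinct files
    file_counts = {label: len(ids) for label, ids in files.items()}
    return events, file_counts


def top_classes(events, n):
    """Return the n labels with the most events, most frequent first."""
    return [label for label, _ in events.most_common(n)]


# Tiny made-up sample; labels are placeholders, not real class IDs.
SAMPLE = (
    "segment_id\tstart_time_seconds\tend_time_seconds\tlabel\n"
    "a_0\t0.0\t1.0\tlabel_a\n"
    "a_0\t2.0\t3.0\tlabel_a\n"
    "a_0\t1.0\t2.0\tlabel_b\n"
    "b_1000\t0.0\t5.0\tlabel_a\n"
    "b_1000\t5.0\t6.0\tlabel_c\n"
    "c_0\t0.0\t1.0\tlabel_b\n"
)

events, file_counts = count_labels(io.StringIO(SAMPLE))
selected = top_classes(events, 2)
```

Counting events and files separately matters because a single 10 s clip can contain several events of the same class.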

These scripts assume the use of Google's Audioset: Reformatted. The dataset files need to be placed in the src/ folder.

The only other files needed are train_list.txt and eval_list.txt, which list the files that were actually downloaded.
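Restricting the annotations to the downloaded files, or to a set of selected classes, might look like the sketch below. It assumes that the entries in the list files match the `segment_id` column of the annotation TSV and that labels live in a `label` column; `read_id_list` and `filter_rows` are hypothetical names used only for this example.

```python
import csv
import io


def read_id_list(text_file):
    """Read one ID per line into a set, ignoring blank lines."""
    return {line.strip() for line in text_file if line.strip()}


def filter_rows(rows, keep_ids=None, keep_labels=None):
    """Keep rows whose segment and label pass the given allow-lists.

    A filter set to None is skipped, so either one can be used alone.
    """
    kept = []
    for row in rows:
        if keep_ids is not None and row["segment_id"] not in keep_ids:
            continue
        if keep_labels is not None and row["label"] not in keep_labels:
            continue
        kept.append(row)
    return kept


# Made-up sample data; labels are placeholders, not real class IDs.
ANNOTATIONS = (
    "segment_id\tstart_time_seconds\tend_time_seconds\tlabel\n"
    "a_0\t0.0\t1.0\tlabel_a\n"
    "b_1000\t0.0\t5.0\tlabel_a\n"
    "b_1000\t5.0\t6.0\tlabel_c\n"
)
DOWNLOADED = "b_1000\n\n"

rows = list(csv.DictReader(io.StringIO(ANNOTATIONS), delimiter="\t"))
downloaded_ids = read_id_list(io.StringIO(DOWNLOADED))
downloaded_rows = filter_rows(rows, keep_ids=downloaded_ids)
```

In practice the ID set would be read from train_list.txt or eval_list.txt with `open(...)` instead of an in-memory string.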