
kj4483/TVSM-dataset

 
 


TVSM Dataset

The TV Speech and Music (TVSM) dataset contains speech and music activity labels across a variety of TV shows and their corresponding audio features extracted from professionally-produced high-quality audio. The dataset aims to facilitate research on speech and music detection tasks.

Get the dataset

  • The dataset can be downloaded via Zenodo.org.
  • The paper can be downloaded via EURASIP open access.
  • This repo contains materials and codebase to reproduce the baseline experiment in the paper.

License and attribution

@ARTICLE{Hung2022,
  title={A Large TV Dataset for Speech and Music Activity Detection},
  author={Hung, Yun-Ning and Wu, Chih-Wei and Orife, Iroro and Hipple, Aaron and Wolcott, William and Lerch, Alexander},
  journal={EURASIP Journal on Audio, Speech, and Music Processing},
  volume={2022},
  number={1},
  pages={21},
  year={2022},
  publisher={Springer}
}

The TVSM dataset is licensed under the Apache License 2.0.

Dataset introduction

The downloaded dataset has the following structure:

└─── READEME.txt
└─── TVSM-cuesheet/
│    └─── labels/
│    └─── mel_features/
│    └─── mfcc/
│    └─── vgg_features/
│    └─── TVSM-xxxx_metadata.csv
└─── TVSM-pseudo/
└─── TVSM-test/
  • READEME.txt: basic information about the dataset
  • TVSM-cuesheet/: smaller subset used for training. The labels are derived from cuesheet information
  • TVSM-pseudo/: larger subset used for training. The labels are pseudo-labels generated by a model pre-trained on TVSM-cuesheet
  • TVSM-test/: subset for testing. The labels were created by human annotators
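
After downloading and extracting the archive, it can help to confirm that the layout matches the tree above before starting any experiment. The following is a minimal sketch, not part of the official codebase: the function name `missing_folders` is hypothetical, and only the subset and subfolder names are taken from this README.

```python
from pathlib import Path
from typing import Dict, List

# Folder names as listed in this README.
SUBSETS = ("TVSM-cuesheet", "TVSM-pseudo", "TVSM-test")
SUBFOLDERS = ("labels", "mel_features", "mfcc", "vgg_features")


def missing_folders(root) -> Dict[str, List[str]]:
    """Map each subset to the expected subfolders that are absent under root.

    Subsets with nothing missing are left out, so an empty dict means the
    layout matches the documented structure.
    """
    root = Path(root)
    report: Dict[str, List[str]] = {}
    for subset in SUBSETS:
        missing = [
            name for name in SUBFOLDERS
            if not (root / subset / name).is_dir()
        ]
        if missing:
            report[subset] = missing
    return report
```

Calling `missing_folders("path/to/TVSM")` on a complete download should return an empty dictionary; any other result names the subfolders that still need to be extracted.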

Each subset folder has the same structure:

  • labels/: speech and music activation labels for each sample. Each row in a CSV file contains "start time", "end time" and "s(speech)/m(music)"
  • mel_features/: the Mel spectrogram features extracted from the audio of each sample
  • mfcc/: the MFCC features extracted from the audio of each sample
  • vgg_features/: the VGGish features extracted from the audio of each sample
  • TVSM-xxxx_metadata.csv: the metadata of each sample
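
The label format described above (one row per segment with start time, end time and class) can be read with the standard library alone. The sketch below is illustrative rather than the authors' loader. It assumes the times are numeric values in seconds and that the column delimiter is a comma; check both against the actual files and pass a different `delimiter` if needed. The names `Segment`, `read_segments` and `total_duration` are hypothetical.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    start: float  # assumed to be in seconds
    end: float    # assumed to be in seconds
    label: str    # "s" for speech, "m" for music


def read_segments(lines: Iterable[str], delimiter: str = ",") -> List[Segment]:
    """Parse label rows of the form: start time, end time, s/m.

    `lines` can be an open file or any iterable of text lines. Rows with
    fewer than three columns or non-numeric times (for example a header
    row) are skipped.
    """
    segments: List[Segment] = []
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) < 3:
            continue
        try:
            start, end = float(row[0]), float(row[1])
        except ValueError:
            continue
        segments.append(Segment(start, end, row[2].strip()))
    return segments


def total_duration(segments: Iterable[Segment], label: str) -> float:
    """Sum the length of all segments carrying the given label."""
    return sum(seg.end - seg.start for seg in segments if seg.label == label)


# Made-up rows for illustration only; these are not taken from the dataset.
example = io.StringIO("0.0,4.5,s\n3.0,10.0,m\n12.0,13.5,s\n")
segments = read_segments(example)
speech_seconds = total_duration(segments, "s")
music_seconds = total_duration(segments, "m")
```

Speech and music segments may overlap in time, so the two totals are computed independently and need not add up to the length of the recording.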

For more information, please see our paper.

Codebase introduction

Inference Code

Thanks @owlwang for the contribution! The easy-to-use inference code is now included in inference/

cd inference
python3 inference.py --audio_path test.wav --output_dir output/ --format csv/csv_prob
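
To process a whole folder of recordings, the command above can be generated once per file. The following is a rough sketch under stated assumptions: it reads `csv/csv_prob` as two alternative values for `--format` (an interpretation, not something confirmed here), and the helper name `build_commands` is hypothetical. Only the flags shown in the command above are used.

```python
from pathlib import Path
from typing import List


def build_commands(audio_dir, output_dir, fmt: str = "csv") -> List[List[str]]:
    """Build one inference.py command per .wav file found in audio_dir.

    Paths are made absolute so the commands can be run from inside the
    inference/ folder regardless of where the audio lives.
    """
    # Assumption: "csv/csv_prob" in the README means either value is valid.
    if fmt not in ("csv", "csv_prob"):
        raise ValueError(f"unexpected format: {fmt}")
    out = str(Path(output_dir).resolve())
    commands: List[List[str]] = []
    for wav in sorted(Path(audio_dir).resolve().glob("*.wav")):
        commands.append([
            "python3", "inference.py",
            "--audio_path", str(wav),
            "--output_dir", out,
            "--format", fmt,
        ])
    return commands
```

Each command can then be executed with `subprocess.run(cmd, cwd="inference", check=True)`, which mirrors the `cd inference` step above.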

Old inference code

Interested in running inference on existing samples? Please see predictor.py for usage.

cd training_code
python3 predictor.py --audio_path test.wav

Please install Git LFS first, then run git-lfs pull to restore the checkpoints.

Please replace line 31 in SM_detector.py with self.save_hyperparameters(hparams) if you are using newer pytorch_lightning versions.

└─── Evaluation_Output/
│    └─── AVASpeech/
│    │    └─── T2
│    │    └─── TVSM-cuesheet
│    │    └─── TVSM-pseudo
│    └─── ...
└─── Models/
└─── training_code/
  • Evaluation_Output: the output generated by three models across five evaluation sets
    • T2: baseline method
    • TVSM-cuesheet: CRNN-P-Cue method
    • TVSM-pseudo: CRNN-P-Pseu method
  • Models: the pre-trained checkpoints for the CRNN-P-Cue and CRNN-P-Pseu methods
  • training_code: code for training the model

Bug Fix

If you encounter the error "batch response: This repository is over its data quota. Account responsible for LFS...", you can download the model checkpoint from Google Drive.

Contact

Please feel free to contact [email protected] or open an issue here if you have any questions about the dataset or the support code.
