Generic Dataset
An abstract Dataset class is defined for easy batch processing. The base class is a direct copy of the Dataset class in PyTorch, with a few additional requirements:
Audio file paths must be saved in a list named all_files. No other information, especially memory-intensive data such as audio waveforms, should be stored.
Audio waveforms are read from their paths only when the dataset is indexed.
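The two requirements above can be illustrated with a minimal sketch. This is not the audlib implementation: the class name LazyAudioDataset is hypothetical, it does not inherit from the PyTorch Dataset class, and it assumes WAV input so that the standard-library wave module can stand in for whatever audio reader the library actually uses. Only the all_files attribute name comes from the description above.

```python
import wave


class LazyAudioDataset:
    """Hypothetical sketch: keeps file paths only and reads audio on indexing."""

    def __init__(self, paths):
        # Only the paths are stored; no waveform is held in memory.
        self.all_files = list(paths)

    def __len__(self):
        return len(self.all_files)

    def __getitem__(self, idx):
        # The file is opened and read only when this item is requested.
        with wave.open(self.all_files[idx], "rb") as wf:
            rate = wf.getframerate()
            frames = wf.readframes(wf.getnframes())
        # Returns the sampling rate and the raw PCM bytes.
        return rate, frames
```

Because the object holds nothing but strings, its memory footprint stays small regardless of corpus size, and the cost of reading audio is paid per item at indexing time.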
Other Datasets
Apart from the generic Dataset class, some common speech-related datasets subclass Dataset:
Speech recognition dataset (ASRDataset) holds transcripts and a transcript-to-label map that transforms a string sequence to an integer sequence.
Currently supports WSJ0 (ASRWSJ0) and WSJ1 (ASRWSJ1).
Speech enhancement dataset (SEDataset) holds a sequence of degraded-and-clean-speech pairs.
Currently supports VCTK (SEVCTKNoRev, SEVCTK2chan) and RATS (SERATS_SAD).
Speech activity detection dataset (SADDataset) holds time-stamps of speech-active regions for each speech file.
Will be added later
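As an illustration of the transcript-to-label map mentioned for ASRDataset, here is a minimal character-level sketch. The function names and the vocabulary are assumptions made for this example; the source does not say whether the library maps characters, phonemes, or words.

```python
def build_label_map(vocabulary):
    """Assign each symbol in the vocabulary a unique integer label."""
    return {symbol: index for index, symbol in enumerate(vocabulary)}


def transcript_to_labels(transcript, label_map):
    """Transform a string sequence into an integer sequence."""
    return [label_map[symbol] for symbol in transcript]


# Hypothetical vocabulary: lowercase letters, space, and apostrophe.
label_map = build_label_map("abcdefghijklmnopqrstuvwxyz '")
labels = transcript_to_labels("hi there", label_map)
```

A symbol missing from the vocabulary raises a KeyError in this sketch; a real dataset would need a policy for such symbols, for example a dedicated unknown label.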
Overview
See audlib.data.dataset for the abstract interfaces of Dataset and its subclasses. For implementations of specific datasets, see the Wall Street Journal (WSJ) module in audlib.data.wsj, or other dataset modules.
raymondxyy changed the title from "Separate normal dataset from asr dataset" to "ASRDataset subclasses Dataset" on Dec 15, 2018
Dataset
Updated: 12/27/2018, 11:47 PM