# ATOSE: Audio Tagging with One-Side Joint Embedding

This is the repository for the method presented in the paper "ATOSE: Audio Tagging with One-Side Joint Embedding" by J. Lee, D. Moon, J. Kim, and M. Cho. Our model is designed to capture the semantic information within the tag domain. In experiments on the MagnaTagATune (MTAT) dataset, which has high inter-tag correlations, and on the Speech Commands dataset, which has no inter-tag correlations, we show that our approach improves the performance of existing models when inter-tag correlations are strong.


*(Model architecture overview figure)*

- Tag Autoencoder: module that extracts tag-domain features from the tags.
- Feature Extractor: module that extracts audio-domain features from the source data. Our joint embedding technique reuses the feature extractors of conventional tagging models, so it can be applied as a general approach on top of existing models. For a more readable feature-extractor implementation, please check this repository.
- Projector: module that maps audio-domain features to embedding vectors projected into the tag domain.
- Classifier: module that classifies the extracted audio-domain features into tags, using the feature extractor pre-trained in stage 1 (see the sketch below).
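The sketch below shows one way these modules can interact during training. It is a minimal illustration assuming PyTorch, with hypothetical layer sizes and losses rather than the paper's exact configuration: the tag autoencoder defines a latent tag space, and the projector pulls audio-domain features toward it from one side only.

```python
# Minimal one-side joint embedding sketch (illustrative, not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TagAutoencoder(nn.Module):
    """Encodes multi-hot tag vectors into a latent tag space and reconstructs them."""

    def __init__(self, n_tags: int, latent: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_tags, 128), nn.ReLU(), nn.Linear(128, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 128), nn.ReLU(), nn.Linear(128, n_tags))

    def forward(self, tags: torch.Tensor):
        z = self.encoder(tags)       # tag-domain embedding
        logits = self.decoder(z)     # reconstruction logits for the autoencoder loss
        return z, logits


class Projector(nn.Module):
    """Maps audio-domain features (from any backbone extractor) into the tag latent space."""

    def __init__(self, feat_dim: int, latent: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, latent))

    def forward(self, audio_feat: torch.Tensor) -> torch.Tensor:
        return self.net(audio_feat)


if __name__ == "__main__":
    n_tags, feat_dim, latent = 50, 512, 100   # illustrative sizes
    autoencoder = TagAutoencoder(n_tags, latent)
    projector = Projector(feat_dim, latent)

    # Placeholders: audio_feat would come from a backbone such as SampleCNN.
    audio_feat = torch.randn(8, feat_dim)
    tags = torch.randint(0, 2, (8, n_tags)).float()

    tag_z, tag_logits = autoencoder(tags)
    audio_z = projector(audio_feat)

    recon_loss = F.binary_cross_entropy_with_logits(tag_logits, tags)
    # "One-side" alignment: only the audio embedding is pulled toward the
    # (detached) tag embedding, not the other way around.
    joint_loss = F.mse_loss(audio_z, tag_z.detach())
    (recon_loss + joint_loss).backward()
```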

## Usage

### Preparing Dataset

#### Installation


```
conda env create -n $ENV_NAME --file environment.yaml
conda activate $ENV_NAME
```

#### Preprocessing


```
cd preprocessing/$DATASET
python -u preprocess.py run $DATASET_PATH
python -u split.py run $DATASET_PATH
```
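As a concrete, illustrative example, preprocessing MagnaTagATune might look like the following; the `preprocessing/mtat` directory name and the data path are assumptions based on the dataset names used elsewhere in this README:

```
cd preprocessing/mtat
python -u preprocess.py run ~/data/mtat
python -u split.py run ~/data/mtat
```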

### Training


```
cd training
python main.py
```

### Options


```
# To use the hyperparameters from the paper, refer to 'train_model.sh'
'--gpu'            # GPU to be used
'--data_path'      # Path to the datasets
'--dataset'        # Dataset to train on; choose among 'mtat', 'dcase', and 'keyword'
'--batch_size'     # Batch size
'--isTest'         # Check that the model is working
'--encoder_type'   # Type of feature extractor; choose among 'HC' (HarmonicCNN), 'MS' (TagSincNet), and 'SC' (SampleCNN)
'--block'          # Block type of SampleCNN; choose among 'basic', 'se', 'res', and 'rese'
'--latent'         # Dimension of the latent vectors to be jointly embedded
'--withJE'         # Whether to apply joint embedding
```
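Putting the options together, a hypothetical MTAT run with a SampleCNN backbone and joint embedding enabled might look like the command below. The flag values, the data path, and the exact form expected by `--withJE` are assumptions, so check `train_model.sh` for the settings actually used in the paper:

```
python main.py --gpu 0 --data_path ~/data/mtat --dataset mtat \
    --batch_size 16 --encoder_type SC --block rese --latent 100 --withJE True
```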


## Code Style

I follow PEP 8 for code style. The docstring style is especially important, since the documentation is generated from it.


## Author