This repository implements the model proposed in the paper:
Kin Wai Lau, Yasar Abbas Ur Rehman, Lai-Man Po, AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
The implementation is based on the code for Slow-Fast Auditory Streams for Audio Recognition (ICASSP 2021). For more information, please refer to the link.
When using this code, kindly reference:
@article{lau2024audiorepinceptionnext,
  title={AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition},
  author={Lau, Kin Wai and Rehman, Yasar Abbas Ur and Po, Lai-Man},
  journal={Neurocomputing},
  pages={127432},
  year={2024},
  publisher={Elsevier}
}
You can download our pretrained models as follows (a minimal loading sketch is shown after the list):
- AudioRepInceptionNeXt (VGG-Sound) link
- AudioRepInceptionNeXt (EPIC-Sound) link
- AudioRepInceptionNeXt (EPIC-Kitchens-100) link
- AudioRepInceptionNeXt (Speech Commands V2) link
- AudioRepInceptionNeXt (Urban Sound 8K) link
- AudioRepInceptionNeXt (NSynth) link
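The downloaded checkpoints can be inspected or loaded directly with PyTorch. Below is a minimal sketch, assuming the files follow the PySlowFast convention of storing weights under a model_state key (that key name is an assumption; inspect the checkpoint's keys if it differs):

```python
import torch

# Load a downloaded AudioRepInceptionNeXt checkpoint on CPU.
# NOTE: "model_state" is assumed from the PySlowFast checkpoint convention;
# call checkpoint.keys() to confirm the layout of the actual file.
checkpoint = torch.load("checkpoint_best.pyth", map_location="cpu")
state_dict = checkpoint.get("model_state", checkpoint)

# Print a few parameter names and shapes to verify the download.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```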
Requirements:
- Add this repository to $PYTHONPATH.
  export PYTHONPATH=/path/to/AudioRepInceptionNeXt:$PYTHONPATH
- VGG-Sound: See the instructions in the Auditory Slow-Fast repository link
- EPIC-KITCHENS: See the instructions in the Auditory Slow-Fast repository link
- EPIC-Sounds: See the instructions in the EPIC-Sounds annotations repository link and link
- VGG-Sound: URL of the dataset link
- EPIC-KITCHENS: URL of the dataset link
- EPIC-Sounds: URL of the dataset link
- Speech Commands V2: URL of the dataset link
- Urban Sound 8K: URL of the dataset link
- NSynth: URL of the dataset link
To train the model, run (see run_train.sh for an example):
python tools/run_net.py --cfg configs/VGG-Sound/AudioRepInceptionNeXt.yaml --init_method tcp://localhost:9996 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations
To validate the trained model, run (see run_eval.sh for an example):
python tools/run_net.py --cfg configs/VGG-Sound/AudioRepInceptionNeXt.yaml --init_method tcp://localhost:9998 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth
To export the reparameterized AudioRepInceptionNeXt model, run (see run_eval.sh for an example):
python tools/run_net.py --cfg configs/VGG-Sound/AudioRepInceptionNeXt.yaml --init_method tcp://localhost:9998 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
MODEL.MERGE_MODE True \
MODEL.OUTPUT_DIR /path/to/new_model_saving_dir \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth
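The MODEL.MERGE_MODE option exports a single-branch model in which the parallel multi-scale kernels are folded into one convolution per layer. The snippet below is only an illustrative sketch of this structural-reparameterization idea under simplified assumptions; the kernel sizes, channel count, and helper name are made up, and BatchNorm fusion, which reparameterization typically also involves, is omitted. It is not the repository's merge code, which is driven entirely by the config flags above.

```python
import torch
import torch.nn.functional as F

def merge_parallel_dw_kernels(large_kernel, small_kernel):
    """Fold a small depthwise kernel into a parallel larger one.

    Both tensors have shape (channels, 1, kh, kw) for depthwise convs.
    Zero-padding the small kernel to the large kernel's spatial size and
    summing is equivalent to adding the two branch outputs, provided both
    branches use stride 1 and 'same'-style padding.
    """
    pad_h = (large_kernel.shape[2] - small_kernel.shape[2]) // 2
    pad_w = (large_kernel.shape[3] - small_kernel.shape[3]) // 2
    return large_kernel + F.pad(small_kernel, (pad_w, pad_w, pad_h, pad_h))

# Toy example: merge a 3x3 depthwise kernel into a parallel 11x11 one.
large = torch.randn(64, 1, 11, 11)
small = torch.randn(64, 1, 3, 3)
merged = merge_parallel_dw_kernels(large, small)

# The merged kernel reproduces the sum of the two branches on any input.
x = torch.randn(1, 64, 32, 32)
y_two_branch = (F.conv2d(x, large, padding=5, groups=64)
                + F.conv2d(x, small, padding=1, groups=64))
y_merged = F.conv2d(x, merged, padding=5, groups=64)
print(torch.allclose(y_two_branch, y_merged, atol=1e-5))  # True
```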
To run the reparameterized AudioRepInceptionNeXt in inference mode (see run_eval_inference.sh for an example):
python tools/run_net.py --cfg configs/VGG-Sound/AudioRepInceptionNeXt_Inference.yaml --init_method tcp://localhost:9998 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
VGGSOUND.AUDIO_DATA_DIR /path/to/dataset \
VGGSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/new_model_saving_dir/checkpoints/checkpoint_best.pyth
To fine-tune from the VGG-Sound pretrained model, run (see run_train.sh for an example):
python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioRepInceptionNeXt.yaml --init_method tcp://localhost:9996 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/output_dir \
EPICSOUND.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.CHECKPOINT_FILE_PATH /path/to/VGG-Sound/pretrained/model
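Here, TRAIN.CHECKPOINT_FILE_PATH initializes the network from the VGG-Sound weights before training on the new dataset. The sketch below shows what such checkpoint-based initialization typically looks like, assuming the PySlowFast-style model_state key; the function name and the exact filtering logic are illustrative, and the repository's own checkpoint loader may differ in details.

```python
import torch

def load_pretrained_backbone(model, checkpoint_path):
    """Initialize a model from a VGG-Sound checkpoint, skipping mismatched layers."""
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    # NOTE: "model_state" follows the PySlowFast convention; adjust if needed.
    pretrained = checkpoint.get("model_state", checkpoint)
    model_state = model.state_dict()

    # Keep only parameters whose names and shapes match the target model.
    # This drops the VGG-Sound classification head when fine-tuning on a
    # dataset with a different number of classes.
    compatible = {k: v for k, v in pretrained.items()
                  if k in model_state and v.shape == model_state[k].shape}
    missing = sorted(set(model_state) - set(compatible))
    model.load_state_dict(compatible, strict=False)
    print(f"Loaded {len(compatible)} tensors; randomly initialized: {missing[:5]} ...")
    return model
```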
To validate the fine-tuned model, run (see run_eval.sh for an example):
python tools/run_net.py --cfg configs/EPIC-SOUND-416x128/AudioRepInceptionNeXt.yaml --init_method tcp://localhost:9997 \
NUM_GPUS num_gpus \
OUTPUT_DIR /path/to/experiment_dir \
EPICSOUND.AUDIO_DATA_FILE /path/to/EPIC-KITCHENS-100_audio.hdf5 \
EPICSOUND.ANNOTATIONS_DIR /path/to/annotations \
TRAIN.ENABLE False \
TEST.ENABLE True \
TEST.CHECKPOINT_FILE_PATH /path/to/experiment_dir/checkpoints/checkpoint_best.pyth