ICASSP 2024
See extract_feats.py for feature extraction examples. We currently support the following models:
- AV-HuBERT (ICLR 2022)
- RepLAI (NeurIPS 2022)
- Lee et al. (ICLR 2021), referred to as AVBERT in this repo
- MAViL (NeurIPS 2023)
We also include handcrafted features to serve as baselines. Pull requests are welcome for adding more models.
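As an illustration of what a handcrafted baseline can look like, here is a minimal sketch that computes log-magnitude spectrogram features with NumPy. The window and hop sizes (25 ms / 10 ms at 16 kHz) are illustrative assumptions, not necessarily the settings used by this repo's baselines; see extract_feats.py for the actual feature extraction code.

```python
import numpy as np

def log_spectrogram(wav, n_fft=400, hop=160):
    """Frame a mono waveform and return log-magnitude STFT features.

    Illustrative parameters (25 ms window / 10 ms hop at 16 kHz);
    not necessarily the settings used in this repo.
    """
    # Pad the signal so the last partial frame is kept.
    n_frames = 1 + max(0, (len(wav) - n_fft + hop - 1) // hop)
    pad = max(0, (n_frames - 1) * hop + n_fft - len(wav))
    padded = np.pad(np.asarray(wav, dtype=np.float64), (0, pad))
    frames = np.stack([padded[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # Windowed FFT magnitude, then log compression for dynamic range.
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))
    return np.log(spec + 1e-8)  # shape: (n_frames, n_fft // 2 + 1)

feats = log_spectrogram(np.random.randn(16000))  # 1 s of fake 16 kHz audio
```

Handcrafted features like this need no pretraining, which is what makes them a useful floor for comparing learned representations against.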
Installation:
```shell
conda create -n av python=3.9 -y
conda activate av
pip install -r requirements.txt
```
Downstream Task Evaluation:
```shell
python run_downstream.py -m train \
    -u <upstream model name> \
    -d <downstream task name> \
    -s <feature type> \
    --pooled_features_path <path to save features>
```
Researchers can also submit model code and weights to our submission platform for easy evaluation on the AV-SUPERB benchmark. Two Python files are expected: expert.py, which implements the model forward pass and the preprocessing functions for each of the two modalities, and hubconf.py, which downloads the model weights.
Please refer to this example model and the submission platform for more details.
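The shape of such a submission can be sketched as follows. All class, method, and argument names here are illustrative assumptions (the authoritative interface is defined by the example model linked above), and NumPy stands in for the tensors a real model would use:

```python
import numpy as np

class UpstreamExpert:
    """Illustrative expert.py skeleton (names and shapes are assumptions)."""

    def preprocess_audio(self, wav, sample_rate):
        # e.g., resample / normalize; identity here for illustration.
        return np.asarray(wav, dtype=np.float32)

    def preprocess_video(self, frames):
        # e.g., crop / rescale pixel values; identity here for illustration.
        return np.asarray(frames, dtype=np.float32)

    def forward(self, audio, video):
        # A real model returns features from each layer; fake two layers
        # of zeros here, with a hypothetical 10 ms frame stride.
        n_frames = len(audio) // 160
        return [np.zeros((n_frames, 768)) for _ in range(2)]

# hubconf.py would then expose an entry point that fetches weights
# and returns the expert, roughly:
def my_model(**kwargs):
    # download_checkpoint(...)  # a real submission downloads weights here
    return UpstreamExpert()
```

Keeping weight download in hubconf.py and model logic in expert.py separates distribution from computation, so the benchmark can load any submitted model through one uniform entry point.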
```bibtex
@article{tseng2023avsuperb,
  title={AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models},
  author={Yuan Tseng and Layne Berry and Yi-Ting Chen and I-Hsiang Chiu and Hsuan-Hao Lin and Max Liu and Puyuan Peng and Yi-Jen Shih and Hung-Yu Wang and Haibin Wu and Po-Yao Huang and Chun-Mao Lai and Shang-Wen Li and David Harwath and Yu Tsao and Shinji Watanabe and Abdelrahman Mohamed and Chi-Luen Feng and Hung-yi Lee},
  journal={arXiv preprint arXiv:2309.10787},
  year={2023}
}
```
AV-SUPERB is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
Using files and pretrained AV-HuBERT models under the upstream_models/vhubert folder requires accepting the terms of the AV-HuBERT license agreement listed in this file.
See LICENSE-APACHE, LICENSE-MIT, COPYRIGHT for details.
Source code is based on S3PRL.