ICASSP 2024
See extract_feats.py for feature extraction examples. We currently support the following models:
- AV-HuBERT (ICLR 2022)
- RepLAI (NeurIPS 2022)
- Lee et al. (ICLR 2021), referred to as AVBERT in this repo
- MAViL (NeurIPS 2023)
We also include handcrafted features to serve as baselines. Pull requests are welcome for adding more models.
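As an illustration of what a handcrafted baseline can look like, here is a minimal sketch that computes log-magnitude spectrogram features with NumPy. The window and hop sizes (25 ms / 10 ms at 16 kHz) are illustrative assumptions, not necessarily the settings used by this repo's baselines; see extract_feats.py for the actual feature extraction code.

```python
import numpy as np

def log_spectrogram(wav, n_fft=400, hop=160):
    """Frame a mono waveform and return log-magnitude STFT features.

    Illustrative parameters (25 ms window / 10 ms hop at 16 kHz);
    not necessarily the settings used in this repo.
    """
    # Pad the signal so the last partial frame is kept.
    n_frames = 1 + max(0, (len(wav) - n_fft + hop - 1) // hop)
    pad = max(0, (n_frames - 1) * hop + n_fft - len(wav))
    padded = np.pad(np.asarray(wav, dtype=np.float64), (0, pad))
    frames = np.stack([padded[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # Windowed FFT magnitude, then log compression for dynamic range.
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1))
    return np.log(spec + 1e-8)  # shape: (n_frames, n_fft // 2 + 1)

feats = log_spectrogram(np.random.randn(16000))  # 1 s of fake 16 kHz audio
```

Handcrafted features like this need no pretraining, which is what makes them a useful floor for comparing learned representations against.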
Installation:
```shell
conda create -n av python=3.9 -y
conda activate av
pip install -r requirements.txt
```
Downstream Task Evaluation:
```shell
python run_downstream.py -m train \
    -u <upstream model name> \
    -d <downstream task name> \
    -s <feature type> \
    --pooled_features_path <path to save features>
```
Researchers can also submit model code and weights to our submission platform for easy evaluation on the AV-SUPERB benchmark. Two Python files are expected: expert.py, which implements the model forward pass and the preprocessing functions for each of the two modalities, and hubconf.py, which downloads the model weights.
Please refer to this example model and the submission platform for more details.
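The shape of such a submission can be sketched as follows. All class, method, and argument names here are illustrative assumptions (the authoritative interface is defined by the example model linked above), and NumPy stands in for the tensors a real model would use:

```python
import numpy as np

class UpstreamExpert:
    """Illustrative expert.py skeleton (names and shapes are assumptions)."""

    def preprocess_audio(self, wav, sample_rate):
        # e.g., resample / normalize; identity here for illustration.
        return np.asarray(wav, dtype=np.float32)

    def preprocess_video(self, frames):
        # e.g., crop / rescale pixel values; identity here for illustration.
        return np.asarray(frames, dtype=np.float32)

    def forward(self, audio, video):
        # A real model returns features from each layer; fake two layers
        # of zeros here, with a hypothetical 10 ms frame stride.
        n_frames = len(audio) // 160
        return [np.zeros((n_frames, 768)) for _ in range(2)]

# hubconf.py would then expose an entry point that fetches weights
# and returns the expert, roughly:
def my_model(**kwargs):
    # download_checkpoint(...)  # a real submission downloads weights here
    return UpstreamExpert()
```

Keeping weight download in hubconf.py and model logic in expert.py separates distribution from computation, so the benchmark can load any submitted model through one uniform entry point.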
```bibtex
@article{tseng2023avsuperb,
  title={AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models},
  author={Yuan Tseng and Layne Berry and Yi-Ting Chen and I-Hsiang Chiu and Hsuan-Hao Lin and Max Liu and Puyuan Peng and Yi-Jen Shih and Hung-Yu Wang and Haibin Wu and Po-Yao Huang and Chun-Mao Lai and Shang-Wen Li and David Harwath and Yu Tsao and Shinji Watanabe and Abdelrahman Mohamed and Chi-Luen Feng and Hung-yi Lee},
  journal={arXiv preprint arXiv:2309.10787},
  year={2023}
}
```
AV-SUPERB is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).
Using files and pretrained AV-HuBERT models under the upstream_models/vhubert folder requires accepting the terms of the AV-HuBERT license agreement listed in this file.
See LICENSE-APACHE, LICENSE-MIT, COPYRIGHT for details.
Source code is based on S3PRL.