
[NeurIPS 2024 Audio Imagination Workshop] Official implementation of the paper: 3D Audio-Visual Segmentation

githubartema/3D-Audio-Visual-Segmentation

3D Audio-Visual Segmentation

Artem Sokolov, Swapnil Bhosale, Xiatian Zhu

NeurIPS 2024 Workshop on Audio Imagination

Project page | arXiv | Dataset

[Teaser figure]

This repository is the official implementation of "3D Audio-Visual Segmentation". In this paper, we introduce a new research problem, 3D Audio-Visual Segmentation, which extends existing audio-visual segmentation (AVS) from 2D to a 3D output space. To support this research, we create the first simulation-based benchmark, 3DAVS-S34-O7. It provides photorealistic 3D scene environments with grounded spatial audio under both single-instance and multi-instance settings, spanning 34 scenes and 7 object categories. We then propose EchoSegnet, an approach that combines off-the-shelf knowledge from pretrained 2D audio-visual foundation models with a 3D visual scene representation through spatial-audio-aware mask alignment and refinement.

Updates

  • Data & Code coming soon!

Method: EchoSegnet

[Figure: EchoSegnet overview]

Citation

If you find our project useful, please cite it using the following BibTeX entry:

@inproceedings{sokolov20243daudiovisualsegmentation,
    title     = {3D Audio-Visual Segmentation},
    author    = {Sokolov, Artem and Bhosale, Swapnil and Zhu, Xiatian},
    booktitle = {Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation},
    year      = {2024}
}

Contact

For feedback or questions, please contact Artem Sokolov.
