[IEEE 2024] AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation

sunyasheng/AVI-Talking

AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation

Yasheng Sun, Wenqing Chu, Zhiliang Xu, Dongliang He, Hideki Koike

Our goal is to directly leverage the inherent style information conveyed by human speech for generating an expressive talking face that aligns with the speaking status. In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation.

Table of Contents

News

Step-by-step Installation Instructions

a. Create a conda virtual environment and activate it. Python >= 3.8 is required as the base environment.

conda create -n sssp python=3.8 -y
conda activate sssp

b. Install PyTorch and torchvision following the official instructions.

conda install pytorch==1.9.0 torchvision==0.10.0 -c pytorch -c conda-forge

c. Install the other dependencies. We simply froze our environment; other environments might also work. A requirements.txt file is provided for reference.

pip install -r requirements.txt
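
After the three installation steps, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the helper names (`python_ok`, `installed_version`, `matches_pin`, `check_environment`) are hypothetical, and it assumes the conda packages register themselves under the distribution names `torch` and `torchvision`. It only reads package metadata and does not import PyTorch.

```python
import sys
from importlib import metadata

# Versions taken from the install command in step b.
PINNED = {"torch": "1.9.0", "torchvision": "0.10.0"}
MIN_PYTHON = (3, 8)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter meets the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """Compare versions while ignoring local build tags such as '+cu111'."""
    return installed is not None and installed.split("+")[0] == pinned


def check_environment():
    """Build a small pass/fail report for Python, torch, and torchvision."""
    report = {"python": python_ok()}
    for package, pinned in PINNED.items():
        report[package] = matches_pin(installed_version(package), pinned)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or differs from the README'}")
```

A mismatch is not necessarily a problem, since other environments might also work; the report only shows where the setup departs from the frozen one.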

Pretrained Model

  • Download the pre-trained model and place it in train_logs/ accordingly.
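
Before running the test script, one way to confirm that the download landed in the right place is to list checkpoint-like files under train_logs/. This is only a sketch under assumptions: the README does not name the checkpoint files or their layout, so the suffix set below (`.pth`, `.pt`, `.ckpt`) and the helper `find_checkpoints` are hypothetical.

```python
from pathlib import Path

# Assumed checkpoint suffixes; the README does not specify the file names.
CHECKPOINT_SUFFIXES = {".pth", ".pt", ".ckpt"}


def find_checkpoints(root="train_logs"):
    """Return checkpoint-like files under root (sorted), or [] if root is missing."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    found = find_checkpoints()
    if found:
        for path in found:
            print(path)
    else:
        print("No checkpoint-like files found under train_logs/")
```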

Instructions for Testing the Model

Once the pre-trained model is prepared, you can test the model by running the following command:

bash experiments/diffusion_test.sh align_emote

Instructions for Training the Model

If you are interested in training the model yourself, set up the environment as described above and run the following command.

bash experiments/diffusion_train.sh align_emote
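
For scripted runs, the test and train commands can be wrapped in a small Python launcher. The sketch below is an illustration only: `build_command` and `run` are hypothetical helpers, and it assumes the two shell scripts take the experiment name (`align_emote` in the commands shown here) as their single argument and are launched from the repository root.

```python
import subprocess
from pathlib import Path

# Script paths as they appear in the commands of this README.
SCRIPTS = {
    "test": "experiments/diffusion_test.sh",
    "train": "experiments/diffusion_train.sh",
}


def build_command(mode, experiment="align_emote"):
    """Return the argument list for the given mode ('test' or 'train')."""
    if mode not in SCRIPTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(SCRIPTS)}")
    return ["bash", SCRIPTS[mode], experiment]


def run(mode, experiment="align_emote", repo_root=".", dry_run=False):
    """Run the script from repo_root, or only return the command when dry_run is set."""
    command = build_command(mode, experiment)
    if dry_run:
        return command
    script = Path(repo_root) / SCRIPTS[mode]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; run from the repository root")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(command, cwd=repo_root, check=True)
    return command
```

Calling `run("test", dry_run=True)` returns the command without executing it, which is a cheap way to check the arguments before starting a long training job.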

Acknowledgements

Many thanks to these excellent open source projects:

Citation

If you find our paper and code useful for your research, please consider citing:

@article{sun2024avi,
  title={AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation},
  author={Sun, Yasheng and Chu, Wenqing and Zhou, Hang and Wang, Kaisiyuan and Koike, Hideki},
  journal={IEEE Access},
  year={2024},
  publisher={IEEE}
}
