Yasheng Sun, Wenqing Chu, Zhiliang Xu, Dongliang He, Hideki Koike
Our goal is to directly leverage the inherent style information conveyed by human speech for generating an expressive talking face that aligns with the speaking status. In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation.
- [2024/02]: Paper is available on arXiv.
- [2024/04]: Paper has been accepted by IEEE Access.
a. Create a conda virtual environment and activate it. Python >= 3.8 is required as the base environment.
conda create -n sssp python=3.8 -y
conda activate sssp
b. Install PyTorch and torchvision following the official instructions.
conda install pytorch==1.9.0 torchvision==0.10.0 -c pytorch -c conda-forge
c. Install other dependencies. We simply freeze our environment; other environments might also work. A requirements.txt file is provided for reference.
pip install -r requirements.txt
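After installation, a quick sanity check can confirm that PyTorch and torchvision are importable and match the pinned versions (on a CPU-only machine the CUDA check simply prints `False`):

```bash
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
```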
- Download the pre-trained model and put it under `train_logs/` accordingly.
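For reference, a plausible layout is sketched below; the `align_emote` subdirectory and the checkpoint filenames are assumptions based on the experiment name used in the scripts, and the actual names come from the released archive:

```bash
# Hypothetical layout (directory and file names are assumptions):
mkdir -p train_logs/align_emote
# train_logs/
# └── align_emote/
#     └── <downloaded checkpoint files>
```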
Once the pre-trained model is prepared, you can test the model by running the following command:
bash experiments/diffusion_test.sh align_emote
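If you want to keep a record of the inference run, the script output can be captured with standard shell redirection (the log filename below is arbitrary):

```bash
bash experiments/diffusion_test.sh align_emote 2>&1 | tee diffusion_test.log
```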
If you are interested in training the model yourself, please set up the environment accordingly and run the command below.
bash experiments/diffusion_train.sh align_emote
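Training can take a while; one common pattern is to launch it in the background and follow the log (the log filename below is arbitrary):

```bash
# Launch training detached from the terminal and stream the log.
nohup bash experiments/diffusion_train.sh align_emote > diffusion_train.log 2>&1 &
tail -f diffusion_train.log
```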
Many thanks to these excellent open source projects:
- [DALLE2-pytorch](https://github.com/lucidrains/DALLE2-pytorch)
- [INFERNO](https://github.com/radekd91/inferno)
- [PIRender](https://github.com/RenYurui/PIRender)
- [PD-FGC-inference](https://github.com/Dorniwang/PD-FGC-inference)
If you find our paper and code useful for your research, please consider citing:
@article{sun2024avi,
title={AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation},
author={Sun, Yasheng and Chu, Wenqing and Zhou, Hang and Wang, Kaisiyuan and Koike, Hideki},
journal={IEEE Access},
year={2024},
publisher={IEEE}
}