This is a repository for organizing papers and code on Talking Face Generation (TFG) in computer vision.
In addition, the commonly used datasets and evaluation metrics for TFG are introduced.
💫 This project is constantly being updated, and any suggestions are welcome!
## 2024

Title | Venue | Dataset | CODE
---|---|---|---
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | arXiv 2024 | HDTF | CODE
EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model | ICASSP 2024 | MEAD & CREMA-D | CODE
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | ICLR 2024 | CelebV-HQ & VoxCeleb2 | -
G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment | arXiv 2024 | HDTF & LRS2 | -
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis | CVPR 2024 | - | CODE
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | CVPR 2024 | - | CODE
## 2023

Title | Venue | Dataset | CODE
---|---|---|---
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models | arXiv 2023 | MEAD & HDTF & VoxCeleb2 | -
GMTalker: Gaussian Mixture Based Emotional Talking Video Portraits | arXiv 2023 | MEAD & LSP | -
DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | arXiv 2023 | HDTF | -
R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning | arXiv 2023 | - | -
FT2TF: First-Person Statement Text-To-Talking Face Generation | arXiv 2023 | LRS2 & LRS3 | -
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior | arXiv 2023 | HDTF & VoxCeleb | -
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | arXiv 2023 | LRS3 | -
GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR 2023 | LRS3 | CODE
GAIA: Zero-shot Talking Avatar Generation | arXiv 2023 | dataset from diverse sources | -
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | ICCV 2023 | - | CODE
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation | ICCV 2023 | VoxCeleb1 & CelebV | CODE
MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions | ICCV 2023 | HDTF & LSP | -
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | ICCV 2023 | VoxCeleb2 & MEAD & LRW | CODE
EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation | ICCV 2023 | MEAD & LRW | -
Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation | ICCV 2023 | ViCo & the Learning2Listen dataset | -
MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation | CVPR 2023 | VoxCeleb2 & HDTF | CODE
Implicit Neural Head Synthesis via Controllable Local Deformation Fields | CVPR 2023 | - | -
LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook | CVPR 2023 | LRS2 & FFHQ | -
GANHead: Towards Generative Animatable Neural Head Avatars | CVPR 2023 | FaceVerse-Dataset | CODE
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment | CVPR 2023 | HDTF & Testset 1 & Testset 2 | -
Identity-Preserving Talking Face Generation with Landmark and Appearance Priors | CVPR 2023 | LRS2 & LRS3 | CODE
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator | CVPR 2023 | LRW & VoxCeleb | -
High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | CVPR 2023 | MEAD | -
Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | CVPR 2023 | LRS2 & LRW | CODE
OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | CVPR 2023 | HDTF & Multiface | CODE
Style Transfer for 2D Talking Head Animation | arXiv 2023 | VoxCeleb2 | -
StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | AAAI 2023 | MEAD & HDTF | CODE
## 2022

Title | Venue | Dataset | CODE
---|---|---|---
SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory | AAAI 2022 | LRW & LRS2 & BBC News | -
Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis | CVPR 2022 | VoxCeleb2 & MEAD | -
Compressing Video Calls using Synthetic Talking Heads | BMVC 2022 | - | -
Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | arXiv 2022 | - | -
StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | arXiv 2022 | VoxCeleb2 | -
Talking Head from Speech Audio using a Pre-trained Image Generator | ACM MM 2022 | TCD-TIMIT & GRID | -
Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | ECCV 2022 | - | CODE
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation | ECCV 2022 | - | CODE
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | ICASSP 2022 | VidTIMIT | CODE
Emotion-Controllable Generalized Talking Face Generation | IJCAI 2022 | MEAD & CREMA-D & RAVDESS | -
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning | CVPR 2022 | Shapes & MUG & iPER & Multimodal VoxCeleb | CODE
Depth-Aware Generative Adversarial Network for Talking Head Video Generation | CVPR 2022 | VoxCeleb1 & CelebV | CODE
Expressive Talking Head Generation with Granular Audio-Visual Control | CVPR 2022 | VoxCeleb2 & MEAD | -
## 2021

Title | Venue | Dataset | CODE
---|---|---|---
Audio-Driven Emotional Video Portraits | CVPR 2021 | MEAD & LRW | CODE
Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | CVPR 2021 | VoxCeleb2 & LRW | CODE
Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset | CVPR 2021 | HDTF | CODE
Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | AAAI 2021 | Mocap dataset | -
Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | IJCAI 2021 | VoxCeleb & GRID & LRW | CODE
Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | ACM MM 2021 | Ted-HD & LRW | CODE
AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | ICCV 2021 | - | CODE
FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | ICCV 2021 | - | CODE
Learned Spatial Representations for Few-shot Talking-Head Synthesis | ICCV 2021 | VoxCeleb | -
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing | CVPR 2021 | VoxCeleb2 & TalkingHead-1KH | -
Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | arXiv 2021 | VidTIMIT | CODE
## 2020

Title | Venue | Dataset | CODE
---|---|---|---
Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose | AAAI 2020 | VoxCeleb | -
Robust One Shot Audio to Video Generation | CVPR 2020 | GRID & LOMBARD GRID | -
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis | CVPR 2020 | GRID & TCD-TIMIT | -
Neural Voice Puppetry: Audio-driven Facial Reenactment | ECCV 2020 | - | CODE
Talking-head Generation with Rhythmic Head Motion | ECCV 2020 | CREMA-D & GRID & VoxCeleb & LRS3 | -
A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors | ICPR 2020 | - | -
Talking Face Generation with Expression-Tailored Generative Adversarial Network | ACM MM 2020 | - | -
A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild | ACM MM 2020 | LRS2 | CODE
## 2019 and Earlier

Title | Venue | Dataset | CODE
---|---|---|---
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | ICCV 2019 | VoxCeleb | CODE
Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | CVPR 2019 | LRW & GRID | CODE
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | AAAI 2019 | LRW | CODE
Realistic Speech-Driven Facial Animation with GANs | IJCV 2019 | GRID & TCD-TIMIT & CREMA-D & LRW | -
Talking Face Generation by Conditional Recurrent Adversarial Network | IJCAI 2019 | TCD-TIMIT & LRW & VoxCeleb | CODE
Lip Movements Generation at a Glance | ECCV 2018 | GRID & LRW & LDC | -
You said that? | BMVC 2017 | VoxCeleb & LRW | -
## Datasets

- LRS2
- LRW
- GRID
- MEAD
- VoxCeleb
- HDTF
- SAVEE
- VOCA
- CREMA-D
## Metrics

- PSNR (Peak Signal-to-Noise Ratio): Measures the peak signal-to-noise ratio between a generated image and the reference image; commonly used to compare the similarity of two images. Higher PSNR indicates better image quality (see the computation sketch after this list).
- SSIM (Structural Similarity Index): Evaluates the structural similarity between a generated image and the reference image in terms of luminance, contrast, and structure. SSIM lies in [-1, 1], with values closer to 1 indicating better image quality.
- LMD (Landmark Distance): Measures the average Euclidean distance between facial landmarks (typically around the mouth) detected in generated and ground-truth frames. Lower LMD indicates more accurate lip movements.
- LRA (Lip-Reading Accuracy): Feeds the generated video to a pretrained lip-reading model and measures how accurately the spoken content is recognized. Higher LRA indicates more intelligible lip movements.
- FID (Fréchet Inception Distance): Measures the quality of generated images by comparing the feature statistics of generated and real images extracted with an Inception network. Lower FID indicates that the generated and real distributions are more similar.
- LSE-D (Lip Sync Error - Distance): The average distance between the audio and visual embeddings produced by a pretrained SyncNet, measuring audio-visual synchronization. Lower LSE-D indicates better lip sync.
- LSE-C (Lip Sync Error - Confidence): The average synchronization confidence score produced by the same pretrained SyncNet. Higher LSE-C indicates stronger audio-visual correlation.
- LPIPS (Learned Perceptual Image Patch Similarity): Uses features from a deep network to measure perceptual similarity between images, which correlates well with human perception of local image structure. Lower LPIPS indicates better image generation quality.
- NIQE (Natural Image Quality Evaluator): A no-reference metric that evaluates the naturalness of an image based on statistical properties of natural images. Lower NIQE indicates better image quality.
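For the two full-reference image metrics above (PSNR and SSIM), here is a minimal computation sketch using scikit-image. It assumes both frames are aligned uint8 RGB arrays of the same size; the `frame_quality` helper and the synthetic test frames are illustrative, not taken from any specific TFG codebase:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(generated: np.ndarray, reference: np.ndarray) -> dict:
    """Score one generated frame against its ground-truth frame.

    Both inputs are assumed to be aligned uint8 RGB arrays of equal shape.
    """
    # PSNR: higher is better; data_range is the dynamic range of uint8 pixels.
    psnr = peak_signal_noise_ratio(reference, generated, data_range=255)
    # SSIM: closer to 1 is better; channel_axis=-1 marks the color dimension
    # (scikit-image >= 0.19; older releases used multichannel=True instead).
    ssim = structural_similarity(reference, generated,
                                 data_range=255, channel_axis=-1)
    return {"PSNR": psnr, "SSIM": ssim}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
    # Simulate a generated frame as a lightly perturbed copy of the reference.
    noise = rng.integers(-10, 11, size=reference.shape)
    generated = np.clip(reference.astype(np.int16) + noise, 0, 255).astype(np.uint8)
    print(frame_quality(generated, reference))
```

In practice, both metrics are computed per frame and averaged over all frames of the generated video.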
This page was created by Dan Zhao, a graduate student at Dalian University of Technology.