SoberDanz/Awesome-Talking-Face-Generation

Awesome-Talking-Face-Generation

This repository collects papers and code on talking face generation (TFG) in computer vision.

It also introduces the datasets and metrics commonly used for TFG.

💫 This project is updated regularly; suggestions are welcome!

Papers

2024

| Title | Venue | Dataset | PDF | CODE |
| --- | --- | --- | --- | --- |
| EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | arXiv 2024 | HDTF | PDF | CODE |
| EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model | ICASSP 2024 | MEAD & CREMA-D | PDF | CODE |
| Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | ICLR 2024 | CelebV-HQ & VoxCeleb2 | PDF | - |
| G4G: A Generic Framework for High Fidelity Talking Face Generation with Fine-grained Intra-modal Alignment | arXiv 2024 | HDTF & LRS2 | PDF | - |
| Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis | CVPR 2024 | - | PDF | CODE |
| SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | CVPR 2024 | - | PDF | CODE |

2023

| Title | Venue | Dataset | PDF | CODE |
| --- | --- | --- | --- | --- |
| DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models | arXiv 2023 | MEAD & HDTF & VoxCeleb2 | PDF | - |
| GMTalker: Gaussian Mixture based Emotional talking video Portraits | arXiv 2023 | MEAD & LSP | PDF | - |
| DiT-Head: High-Resolution Talking Head Synthesis using Diffusion Transformers | arXiv 2023 | HDTF | PDF | - |
| R2-Talker: Realistic Real-Time Talking Head Synthesis with Hash Grid Landmarks Encoding and Progressive Multilayer Conditioning | arXiv 2023 | - | PDF | - |
| FT2TF: First-Person Statement Text-To-Talking Face Generation | arXiv 2023 | LRS2 & LRS3 | PDF | - |
| VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior | arXiv 2023 | HDTF & VoxCeleb | PDF | - |
| SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis | arXiv 2023 | LRS3 | PDF | - |
| GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR 2023 | LRS3 | PDF | CODE |
| GAIA: Zero-shot Talking Avatar Generation | arXiv 2023 | dataset from diverse sources | PDF | - |
| Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis | ICCV 2023 | - | PDF | CODE |
| Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head Video Generation | ICCV 2023 | VoxCeleb1 & CelebV | PDF | CODE |
| MODA: Mapping-Once Audio-driven Portrait Animation with Dual Attentions | ICCV 2023 | HDTF & LSP | PDF | - |
| Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation | ICCV 2023 | VoxCeleb2 & MEAD & LRW | PDF | CODE |
| EMMN: Emotional Motion Memory Network for Audio-driven Emotional Talking Face Generation | ICCV 2023 | MEAD & LRW | PDF | - |
| Emotional Listener Portrait: Realistic Listener Motion Simulation in Conversation | ICCV 2023 | ViCo and the dataset proposed by Learning2Listen | PDF | - |
| MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation | CVPR 2023 | VoxCeleb2 & HDTF | PDF | CODE |
| Implicit Neural Head Synthesis via Controllable Local Deformation Fields | CVPR 2023 | - | PDF | - |
| LipFormer: High-fidelity and Generalizable Talking Face Generation with A Pre-learned Facial Codebook | CVPR 2023 | LRS2 & FFHQ | PDF | - |
| GANHead: Towards Generative Animatable Neural Head Avatars | CVPR 2023 | FaceVerse-Dataset | PDF | CODE |
| Parametric Implicit Face Representation for Audio-Driven Facial Reenactment | CVPR 2023 | HDTF & Testset 1 & Testset 2 | PDF | - |
| Identity-Preserving Talking Face Generation with Landmark and Appearance Priors | CVPR 2023 | LRS2 & LRS3 | PDF | CODE |
| StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator | CVPR 2023 | LRW & VoxCeleb | PDF | - |
| High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning | CVPR 2023 | MEAD | PDF | - |
| Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | CVPR 2023 | LRS2 & LRW | PDF | CODE |
| OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering | CVPR 2023 | HDTF & Multiface | PDF | CODE |
| Style Transfer for 2D Talking Head Animation | arXiv 2023 | VoxCeleb2 | PDF | - |
| StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles | AAAI 2023 | MEAD & HDTF | PDF | CODE |

2022

| Title | Venue | Dataset | PDF | CODE |
| --- | --- | --- | --- | --- |
| SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory | AAAI 2022 | LRW & LRS2 & BBC News | PDF | - |
| Progressive Disentangled Representation Learning for Fine-Grained Controllable Talking Head Synthesis | CVPR 2022 | VoxCeleb2 & MEAD | PDF | - |
| Compressing Video Calls using Synthetic Talking Heads | BMVC 2022 | - | PDF | - |
| Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement | arXiv 2022 | - | PDF | - |
| StyleTalker: One-shot Style-based Audio-driven Talking Head Video Generation | arXiv 2022 | VoxCeleb2 | PDF | - |
| Talking Head from Speech Audio using a Pre-trained Image Generator | ACM MM 2022 | TCD-TIMIT & GRID | PDF | - |
| Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis | ECCV 2022 | - | PDF | CODE |
| Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation | ECCV 2022 | - | PDF | CODE |
| Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | ICASSP 2022 | VidTIMIT | PDF | CODE |
| Emotion-Controllable Generalized Talking Face Generation | IJCAI 2022 | MEAD & CREMA-D & RAVDESS | PDF | - |
| Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning | CVPR 2022 | Shapes & MUG & iPER & Multimodal VoxCeleb | PDF | CODE |
| Depth-Aware Generative Adversarial Network for Talking Head Video Generation | CVPR 2022 | VoxCeleb1 & CelebV | PDF | CODE |
| Expressive Talking Head Generation with Granular Audio-Visual Control | CVPR 2022 | VoxCeleb2 & MEAD | PDF | - |

2021

| Title | Venue | Dataset | PDF | CODE |
| --- | --- | --- | --- | --- |
| Audio-Driven Emotional Video Portraits | CVPR 2021 | MEAD & LRW | PDF | CODE |
| Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation | CVPR 2021 | VoxCeleb2 & LRW | PDF | CODE |
| Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset | CVPR 2021 | HDTF | PDF | CODE |
| Write-a-speaker: Text-based Emotional and Rhythmic Talking-head Generation | AAAI 2021 | Mocap dataset | PDF | - |
| Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion | IJCAI 2021 | VoxCeleb & GRID & LRW | PDF | CODE |
| Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis | ACM MM 2021 | Ted-HD & LRW | PDF | CODE |
| AD-NeRF: Audio Driven Neural Radiance Fields for Talking Head Synthesis | ICCV 2021 | - | PDF | CODE |
| FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning | ICCV 2021 | - | PDF | CODE |
| Learned Spatial Representations for Few-shot Talking-Head Synthesis | ICCV 2021 | VoxCeleb | PDF | - |
| One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing | CVPR 2021 | VoxCeleb2 & TalkingHead-1KH | PDF | - |
| Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary | arXiv 2021 | VidTIMIT | PDF | CODE |

2020

| Title | Venue | Dataset | PDF | CODE |
| --- | --- | --- | --- | --- |
| Realistic Face Reenactment via Self-Supervised Disentangling of Identity and Pose | AAAI 2020 | VoxCeleb | PDF | - |
| Robust One Shot Audio to Video Generation | CVPR 2020 | GRID & LOMBARD GRID | PDF | - |
| Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis | CVPR 2020 | GRID & TCD-TIMIT | PDF | - |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | ECCV 2020 | - | PDF | CODE |
| Talking-head Generation with Rhythmic Head Motion | ECCV 2020 | CREMA-D & GRID & VoxCeleb & LRS3 | PDF | - |
| A Neural Lip-Sync Framework for Synthesizing Photorealistic Virtual News Anchors | ICPR 2020 | - | PDF | - |
| Talking Face Generation with Expression-Tailored Generative Adversarial Network | ACM MM 2020 | - | PDF | - |
| A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild | ACM MM 2020 | LRS2 | PDF | CODE |

Before 2020

| Title | Venue | Dataset | PDF | CODE |
| --- | --- | --- | --- | --- |
| Few-Shot Adversarial Learning of Realistic Neural Talking Head Models | ICCV 2019 | VoxCeleb | PDF | CODE |
| Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss | CVPR 2019 | LRW & GRID | PDF | CODE |
| Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | AAAI 2019 | LRW | PDF | CODE |
| Realistic Speech-Driven Facial Animation with GANs | IJCV 2019 | GRID & TCD-TIMIT & CREMA-D & LRW | PDF | - |
| Talking Face Generation by Conditional Recurrent Adversarial Network | IJCAI 2019 | TCD-TIMIT & LRW & VoxCeleb | PDF | CODE |
| Lip Movements Generation at a Glance | ECCV 2018 | GRID & LRW & LDC | PDF | - |
| You said that? | BMVC 2017 | VoxCeleb & LRW | PDF | - |

Datasets

LRS2

LRW

GRID

MEAD

VoxCeleb

HDTF

SAVEE

VOCA

CREMA-D

Metrics

  • PSNR (Peak Signal-to-Noise Ratio) : Compares the generated image with the ground-truth image at the pixel level; it is computed from the mean squared error between the two and reported in decibels. Higher PSNR values indicate a closer match to the ground truth.
  • SSIM (Structural Similarity Index) : Evaluates the structural similarity between the generated image and the ground-truth image in terms of luminance, contrast, and structure. SSIM values lie in [-1, 1], with values closer to 1 indicating better image quality.
  • LMD (Landmark Distance) : Measures the average Euclidean distance between facial landmarks (typically those around the mouth) detected in the generated frames and in the ground-truth frames. Lower LMD values indicate more accurate lip shapes and movements.
  • LRA (Lip-Reading Accuracy) : Measures how accurately a pre-trained lip-reading model recognizes the spoken content from the generated video. Higher LRA values indicate more intelligible lip movements.
  • FID (Fréchet Inception Distance) : Measures the quality of generated images by comparing the statistics of Inception-network features extracted from generated images with those extracted from real images. Lower FID values indicate higher similarity between the distributions of generated and real images.
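  • LSE-D (Lip Sync Error - Distance) : Measures the distance between the audio embeddings and the lip-region video embeddings produced by a pre-trained SyncNet model. Lower LSE-D values indicate better audio-visual synchronization.
  • LSE-C (Lip Sync Error - Confidence) : Like LSE-D, it is computed with a pre-trained SyncNet model, but it reports the average synchronization confidence score instead of a distance. Higher LSE-C values indicate better audio-visual synchronization.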
  • LPIPS (Learned Perceptual Image Patch Similarity) : Measures the perceptual distance between the generated image and the ground-truth image using features from a pre-trained deep network, calibrated against human similarity judgments. Lower LPIPS values indicate better image generation quality.
  • NIQE (Natural Image Quality Evaluator) : A no-reference metric that evaluates the naturalness of an image by comparing its statistics with a model of natural-scene statistics, so no ground-truth image is needed. Lower NIQE values indicate better image quality.
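
As a rough illustration of two of the simpler metrics above, the following Python sketch computes PSNR from the mean squared error and LMD as a mean landmark distance, reading LMD as landmark distance (its usual meaning in the talking-face literature). The function names `psnr` and `landmark_distance` and the nested-list input layout are assumptions made for this example, not code from any of the listed papers; published evaluations normally rely on established libraries and a landmark detector.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """PSNR in dB between two equal-sized grayscale images (nested lists of pixels)."""
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if len(ref) != len(gen) or not ref:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def landmark_distance(real_landmarks, generated_landmarks):
    """Mean Euclidean distance over all frames and landmarks.

    Each argument is a list of frames; each frame is a list of (x, y) points.
    Landmarks are assumed to be already detected and aligned (hypothetical input).
    """
    total, count = 0.0, 0
    for real_frame, gen_frame in zip(real_landmarks, generated_landmarks):
        for (rx, ry), (gx, gy) in zip(real_frame, gen_frame):
            total += math.hypot(rx - gx, ry - gy)
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


if __name__ == "__main__":
    real_image = [[0, 255], [255, 0]]
    fake_image = [[0, 250], [255, 5]]
    print(f"PSNR: {psnr(real_image, fake_image):.2f} dB")

    real_points = [[(0.0, 0.0), (3.0, 4.0)]]
    fake_points = [[(0.0, 0.0), (0.0, 0.0)]]
    print(f"LMD: {landmark_distance(real_points, fake_points)}")
```

Higher PSNR and lower LMD are better, matching the directions given in the list. The other metrics (FID, LPIPS, LSE-D, LSE-C, NIQE) depend on pre-trained networks or fitted statistical models and are not sketched here.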

Acknowledgements

This page was created by Dan Zhao, a graduate student at Dalian University of Technology.
