
Skyreels Logo

SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers

Skywork AI



showcase
🔥 For more results, visit our homepage 🔥

👋 Join our Discord

This repo, named SkyReels-A1, contains the official PyTorch implementation of our paper SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers.

🔥🔥🔥 News!!

  • Mar 4, 2025: 🔥 We release the audio-driven portrait image animation pipeline. Try it out on the Huggingface Spaces Demo!
  • Feb 18, 2025: 👋 We release the inference code and model weights of SkyReels-A1. Download
  • Feb 18, 2025: 🎉 We release our technical report. Read
  • Feb 18, 2025: 🔥 Our online LipSync demo is now available on SkyReels! Try it out on LipSync.
  • Feb 18, 2025: 🔥 We have open-sourced the I2V video generation model SkyReels-V1, the first and most advanced open-source human-centric video foundation model.

📑 TODO List

  • Checkpoints
  • Inference Code
  • Web Demo (Gradio)
  • Audio-driven Portrait Image Animation Pipeline
  • Inference Code for Long Videos
  • User-Level GPU Inference on RTX4090
  • ComfyUI

Getting Started 🏁

1. Clone the code and prepare the environment 🛠️

First, clone the repository:

git clone https://github.com/SkyworkAI/SkyReels-A1.git
cd SkyReels-A1

# create env using conda
conda create -n skyreels-a1 python=3.10
conda activate skyreels-a1

Then, install the remaining dependencies:

pip install -r requirements.txt
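
As a quick sanity check, you can confirm that PyTorch was installed with GPU support (a minimal sketch; it assumes requirements.txt installs a CUDA-enabled build of torch):

# sanity check: confirm torch imports and CUDA is visible
import torch

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"device: {torch.cuda.get_device_name(0)}")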

2. Download pretrained weights 📥

You can download the pretrained weights from HuggingFace:

# !pip install -U "huggingface_hub[cli]"
huggingface-cli download Skywork/SkyReels-A1 --local-dir local_path --exclude "*.git*" "README.md" "docs"
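
If you prefer doing this from Python, the equivalent call in huggingface_hub is snapshot_download (a minimal sketch; the repo id and exclude patterns mirror the CLI command above, and the local directory is set to pretrained_models to match the tree below):

# download the weights programmatically via huggingface_hub
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Skywork/SkyReels-A1",
    local_dir="pretrained_models",
    ignore_patterns=["*.git*", "README.md", "docs"],
)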

The FLAME, mediapipe, and smirk models are located in the SkyReels-A1/extra_models folder.

The downloaded weights should be organized with the following directory structure:

pretrained_models
├── FLAME
├── SkyReels-A1-5B
│   ├── pose_guider
│   ├── scheduler
│   ├── tokenizer
│   ├── siglip-so400m-patch14-384
│   ├── transformer
│   ├── vae
│   └── text_encoder
├── mediapipe
└── smirk
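
Before moving on, it can help to verify the layout. A minimal sketch (folder names follow the tree above; adjust the root if you downloaded to a different path):

# verify that the expected model folders exist under pretrained_models
from pathlib import Path

root = Path("pretrained_models")
expected = ["FLAME", "SkyReels-A1-5B/transformer", "SkyReels-A1-5B/vae",
            "SkyReels-A1-5B/text_encoder", "mediapipe", "smirk"]
missing = [p for p in expected if not (root / p).exists()]
print("all model folders present" if not missing else f"missing: {missing}")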

Download DiffPoseTalk assets and pretrained weights (for audio-driven animation)

  • We use DiffPoseTalk to generate FLAME coefficients from audio, which serve as the motion signals.

  • Download the DiffPoseTalk code and follow its README to obtain the weights and related data.

  • Then place them in the specified directory:

mkdir -p pretrained_models/diffposetalk
cp -r ${diffposetalk_root}/style pretrained_models/diffposetalk
cp ${diffposetalk_root}/experiments/DPT/head-SA-hubert-WM/checkpoints/iter_0110000.pt pretrained_models/diffposetalk
cp ${diffposetalk_root}/datasets/HDTF_TFHP/lmdb/stats_train.npz pretrained_models/diffposetalk
After copying, the directory structure should look like:

pretrained_models
├── FLAME
├── SkyReels-A1-5B
├── mediapipe
├── diffposetalk
│   ├── style
│   ├── iter_0110000.pt
│   └── stats_train.npz
└── smirk
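
If you prefer Python over shell, here is a minimal shutil sketch equivalent to the cp commands above (the DiffPoseTalk checkout path is a placeholder you must adjust):

# copy the DiffPoseTalk assets into pretrained_models/diffposetalk
import shutil
from pathlib import Path

diffposetalk_root = Path("/path/to/diffposetalk")  # placeholder: your local checkout
dst = Path("pretrained_models/diffposetalk")
dst.mkdir(parents=True, exist_ok=True)

shutil.copytree(diffposetalk_root / "style", dst / "style", dirs_exist_ok=True)
shutil.copy(diffposetalk_root / "experiments/DPT/head-SA-hubert-WM/checkpoints/iter_0110000.pt", dst)
shutil.copy(diffposetalk_root / "datasets/HDTF_TFHP/lmdb/stats_train.npz", dst)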

3. Inference 🚀

You can simply run the inference scripts as:

python inference.py

# inference audio to video
python inference_audio.py

If the script runs successfully, you will get an output mp4 file that contains the driving video, the input image or video, and the generated result.
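
To drive the scripts from Python, for example in batch jobs, a minimal subprocess sketch (the scripts are invoked with their defaults here; consult each script for its actual CLI arguments):

# run both pipelines with their default settings and fail loudly on errors
import subprocess

subprocess.run(["python", "inference.py"], check=True)        # video-driven
subprocess.run(["python", "inference_audio.py"], check=True)  # audio-driven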

Gradio Interface 🤗

We provide a Gradio interface for a better experience. Just run:

python app.py

The graphical interactive interface is shown below:

gradio
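
For reference, a Gradio demo of this kind is usually wired up along the following lines (a minimal sketch, not the actual app.py; the animate function and its signature are hypothetical):

# skeleton of a portrait-animation Gradio app
import gradio as gr

def animate(portrait, driving_video):
    # hypothetical placeholder: the real app runs the SkyReels-A1 pipeline here
    return driving_video

demo = gr.Interface(
    fn=animate,
    inputs=[gr.Image(type="filepath", label="Portrait image"),
            gr.Video(label="Driving video")],
    outputs=gr.Video(label="Animated result"),
    title="SkyReels-A1",
)

if __name__ == "__main__":
    demo.launch()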

Metric Evaluation 👓

We also provide scripts for automatically calculating the metrics reported in the paper, including SimFace, FID, and the L1 distances of expression and motion.

All code can be found in the eval folder. After setting the video result path, run the following commands in sequence:

python arc_score.py
python expression_score.py
python pose_score.py
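
For intuition, the expression and motion metrics boil down to an L1 (mean absolute) distance between coefficient sequences. A minimal numpy sketch (file names and shapes are illustrative assumptions, not the eval scripts' actual inputs):

# L1 distance between predicted and ground-truth coefficient sequences
import numpy as np

pred = np.load("pred_coeffs.npy")  # hypothetical, shape (num_frames, num_coeffs)
gt = np.load("gt_coeffs.npy")      # hypothetical, same shape

print(f"L1 distance: {np.abs(pred - gt).mean():.4f}")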

Acknowledgements 💐

We would like to thank the contributors of the CogVideoX, finetrainers, and DiffPoseTalk repositories for their open research and contributions.

Citation 💖

If you find SkyReels-A1 useful for your research, please 🌟 this repo and cite our work using the following BibTeX:

@article{qiu2025skyreels,
  title={SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers},
  author={Qiu, Di and Fei, Zhengcong and Wang, Rui and Bai, Jialin and Yu, Changqian and Fan, Mingyuan and Chen, Guibin and Wen, Xiang},
  journal={arXiv preprint arXiv:2502.10841},
  year={2025}
}