Official repository for CVPRW 2024 HuMoGen paper "Exploring Text-to-Motion Generation with Human Preference"
- assets/: the generated motions from the MotionGPT, PPO, and DPO models used in the demo
- checkpoints/:
  - MotionGPT-base/: the finetuned MotionGPT model
  - dpo/: our finetuned DPO model
  - ppo/: our finetuned RLHF PPO model
  - rm/: our finetuned reward model
- commands/:
  - ppo_train.sh: script for training the PPO model with a shared base model and separate value and policy heads
  - ppo_sep_critic_train.sh: script for training the PPO model with separate value and policy models
  - dpo_train.sh: script for training the DPO model
  - rm_train.sh: script for training the reward model
- preference_data/: the preference dataset
- MotionGPT/: the MotionGPT codebase with the following changes:
  - We changed the following files:
    - MotionGPT/test.py
    - MotionGPT/mGPT/config.py
    - MotionGPT/mGPT/utils/load_checkpoint.py
    - MotionGPT/requirements.txt
    - MotionGPT/mGPT/archs/mgpt_lm.py
  - We added the following files:
    - MotionGPT/configs/config_eval_during_training.yaml
    - MotionGPT/generate_npy.py
    - MotionGPT/generate_videos.py
- src/:
  - models/: scripts for training and evaluation
  - scripts/: scripts for running experiments
  - trainer/: scripts for training and evaluation
- Download the preference dataset from Baidu Cloud (code 6gcq) or from Tsinghua Cloud and put it at preference_data/
- Download our checkpoints from Baidu Cloud (code 8ig7) and put them at checkpoints/
- Download the HumanML3D dataset from https://github.com/EricGuo5513/HumanML3D, preprocess it according to their instructions, and put it under MotionGPT/datasets/
- Set up the environment following the MotionGPT setup instructions:
conda activate mgpt
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
cd MotionGPT
pip install -r requirements.txt
python -m spacy download en_core_web_sm
bash prepare/download_smpl_model.sh
bash prepare/prepare_t5.sh
bash prepare/download_t2m_evaluators.sh
bash prepare/download_pretrained_models.sh
- (Optional) Set up the visualization dependencies. Please refer to MotionGPT for setup instructions.
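Before training, it can help to confirm that the downloads ended up where the steps above expect them. The following is a minimal sketch, not part of this repository: `missing_dirs` and `EXPECTED_DIRS` are hypothetical names, and the directory list is taken only from the download steps and the checkpoints/ listing in this README (the HumanML3D subfolder name inside MotionGPT/datasets/ is not checked, since it is not stated here).

```python
from pathlib import Path
from typing import List

# Hypothetical helper, not part of this repository. The directory names
# come from the download steps and the checkpoints/ listing in this README.
EXPECTED_DIRS = [
    "preference_data",
    "checkpoints/MotionGPT-base",
    "checkpoints/dpo",
    "checkpoints/ppo",
    "checkpoints/rm",
    "MotionGPT/datasets",
]


def missing_dirs(repo_root: str = ".") -> List[str]:
    """Return the expected directories that are absent under repo_root."""
    root = Path(repo_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Run from the repository root, `print(missing_dirs("."))` should print an empty list once everything is in place.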
To train the reward model, modify the hyperparameters and paths in src/scripts/rm_train.sh and run the following command:
bash src/scripts/rm_train.sh
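Reward models trained on preference pairs commonly minimize a pairwise Bradley-Terry loss, which pushes the reward of the preferred sample above that of the dispreferred one. The sketch below shows that loss on plain Python floats, assuming scalar rewards per (text, motion) pair; we have not confirmed that rm_train.sh uses exactly this objective, and `pairwise_reward_loss` is a hypothetical name.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean Bradley-Terry loss, -log(sigmoid(r_chosen - r_rejected)).

    chosen[i] and rejected[i] are the scalar rewards the reward model assigns
    to the preferred and the dispreferred motion for the same text prompt.
    """
    if len(chosen) != len(rejected) or len(chosen) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    # -log(sigmoid(m)) == softplus(-m); minimizing it pushes r_chosen above r_rejected.
    return sum(softplus(r_l - r_w) for r_w, r_l in zip(chosen, rejected)) / len(chosen)
```

When both rewards are equal the loss is log 2; it falls toward 0 as the preferred sample's reward pulls ahead.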
To train the PPO model with a shared base model and separate value and policy heads, modify the hyperparameters and paths in src/scripts/ppo_train.sh and run the following command (does not support PEFT):
bash src/scripts/ppo_train.sh
To train the PPO model with separate value and policy models, modify the hyperparameters and paths in src/scripts/ppo_sep_critic_train.sh and run the following command (does not support PEFT):
bash src/scripts/ppo_sep_critic_train.sh
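Both PPO variants optimize the policy with the same clipped surrogate objective; they differ in whether the value estimate comes from a head on the shared base model or from a separate critic model. The sketch below shows that clipped policy loss, plus the per-token reward shaping (a KL penalty against a frozen reference model, with the reward-model score on the final token) that RLHF implementations commonly use. Both functions are hypothetical illustrations on plain floats; the value loss is omitted, and we have not verified the exact reward shaping or default coefficients in this repository.

```python
import math
from typing import List, Sequence


def ppo_clipped_loss(new_logprobs: Sequence[float], old_logprobs: Sequence[float],
                     advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate loss (negated so it can be minimized).

    ratio = exp(log pi_new - log pi_old) per sampled motion token; the
    objective per token is min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    n = len(advantages)
    if n == 0 or len(new_logprobs) != n or len(old_logprobs) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / n


def kl_shaped_rewards(rm_score: float, policy_logprobs: Sequence[float],
                      ref_logprobs: Sequence[float], kl_coef: float = 0.1) -> List[float]:
    """Per-token rewards: a KL penalty against the frozen reference model on
    every token, plus the reward-model score on the last token."""
    if len(policy_logprobs) == 0 or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must be non-empty and of equal length")
    rewards = [-kl_coef * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    rewards[-1] += rm_score
    return rewards
```

The `min` in the surrogate is pessimistic: a large ratio is clipped when the advantage is positive but not when it is negative, which limits how far one update can move the policy.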
To train the DPO model, modify the hyperparameters and paths in src/scripts/dpo_train.sh and run the following command (does not support distributed training):
bash src/scripts/dpo_train.sh
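DPO trains directly on preference pairs without a separate reward model or value model. The sketch below shows the standard DPO loss on sequence log-probabilities, assuming each log-probability is summed over the motion tokens of one sample; `dpo_loss` is a hypothetical name, and `beta=0.1` is an illustrative default, not necessarily the value set in dpo_train.sh.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen: Sequence[float], policy_rejected: Sequence[float],
             ref_chosen: Sequence[float], ref_rejected: Sequence[float],
             beta: float = 0.1) -> float:
    """Mean DPO loss over a batch of preference pairs.

    Each argument holds sequence log-probabilities log p(y | x), summed over the
    motion tokens of the chosen or rejected sample, under the policy being
    trained or the frozen reference model.
    """
    n = len(policy_chosen)
    if n == 0 or not (len(policy_rejected) == len(ref_chosen) == len(ref_rejected) == n):
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for pw, pl, rw, rl in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: beta times the gap between the policy/reference
        # log-ratios of the chosen and the rejected sample.
        logits = beta * ((pw - rw) - (pl - rl))
        total += softplus(-logits)  # == -log(sigmoid(logits))
    return total / n
```

Because the reference log-probabilities appear in the loss, the reference model must stay frozen during training; only the policy receives gradients.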
For evaluation and visualization, cd into MotionGPT first.
To evaluate without PEFT:
python test.py --cfg configs/config_h3d_stage3.yaml --task t2m --checkpoint /path/to/trained_model.pt
To evaluate with PEFT:
python test.py --cfg configs/config_h3d_stage3.yaml --task t2m --checkpoint /path/to/trained_model.pt --peft --r 8 --lora_alpha 16 --lora_dropout 0.05
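The `--r`, `--lora_alpha`, and `--lora_dropout` flags match the parameter names of a LoRA adapter (as in Hugging Face PEFT's `LoraConfig`), so we assume, without having checked test.py, that they configure one. Under that assumption, each adapted linear layer computes the frozen output plus a scaled low-rank update. The sketch below, with hypothetical names `matvec` and `lora_forward`, shows that forward pass on plain lists; with `--r 8 --lora_alpha 16` the scaling factor is 16 / 8 = 2.0.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(x: Sequence[float], W: Matrix, A: Matrix, B: Matrix,
                 r: int = 8, lora_alpha: float = 16) -> List[float]:
    """y = W x + (lora_alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; A (r x d_in) and
    B (d_out x r) are the trainable low-rank factors. lora_dropout, which would
    be applied to x on the low-rank branch during training, is omitted here.
    """
    scaling = lora_alpha / r  # 16 / 8 = 2.0 for the flags used above
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]
```

This is also why the PEFT flags must match the values used in training: the same adapter shape (`r`) and scaling are needed to reproduce the trained layer outputs.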
To generate npy files for visualization:
python generate_npy.py --cfg configs/config_h3d_stage3.yaml --task t2m --checkpoint /path/to/trained_model.pt --peft --r 8 --lora_alpha 16 --lora_dropout 0.05
To generate videos for visualization:
python generate_videos.py --data_dir /path/to/generated_npys --video_dir /path/to/generated_videos
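The two visualization steps can be chained from a small Python driver. The sketch below only reuses the scripts and flags shown above; `visualization_commands` and `run_all` are hypothetical helpers, and since this README does not document an output flag for generate_npy.py, `npy_dir` has to point to wherever that script actually writes its files.

```python
import subprocess
from typing import List, Optional


def visualization_commands(checkpoint: str, npy_dir: str, video_dir: str,
                           cfg: str = "configs/config_h3d_stage3.yaml",
                           peft: bool = True, r: int = 8, lora_alpha: int = 16,
                           lora_dropout: float = 0.05) -> List[List[str]]:
    """Build the generate_npy.py and generate_videos.py command lines."""
    gen = ["python", "generate_npy.py", "--cfg", cfg, "--task", "t2m",
           "--checkpoint", checkpoint]
    if peft:
        gen += ["--peft", "--r", str(r), "--lora_alpha", str(lora_alpha),
                "--lora_dropout", str(lora_dropout)]
    # npy_dir must be wherever generate_npy.py writes its output; that
    # location is not set by any flag shown in this README.
    videos = ["python", "generate_videos.py", "--data_dir", npy_dir,
              "--video_dir", video_dir]
    return [gen, videos]


def run_all(commands: List[List[str]], cwd: Optional[str] = "MotionGPT") -> None:
    """Run each command in order from inside MotionGPT/, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)
```

For a checkpoint trained without PEFT, pass `peft=False` so the LoRA flags are dropped.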
We provide a demo of the generated motions from MotionGPT, PPO, and DPO models (with temperature 1.0). The following table shows the generated motions for the given text instructions.
| Text Instruction | MotionGPT Generated Motion | RLHF Generated Motion | DPO Generated Motion |
|---|---|---|---|
| "the individual is shaking their head from side to side" | motiongpt_468.mp4 | ppo_468.mp4 | dpo_468.mp4 |
| "someone leaps off a concrete block" | motiongpt_3751.mp4 | ppo_3751.mp4 | dpo_3751.mp4 |
| "a person lifts their arms, widens the space between their legs, and joins their hands together" | motiongpt_4710.mp4 | ppo_4710.mp4 | dpo_4710.mp4 |
| "he moves his feet back and forth while dancing" | motiongpt_5018.mp4 | ppo_5018.mp4 | dpo_5018.mp4 |
| "move the body vigorously and then plop down on the ground" | motiongpt_6050.mp4 | ppo_6050.mp4 | dpo_6050.mp4 |
If you find this code useful, please consider citing our paper:
@misc{sheng2024exploring,
title={Exploring Text-to-Motion Generation with Human Preference},
author={Jenny Sheng and Matthieu Lin and Andrew Zhao and Kevin Pruvost and Yu-Hui Wen and Yangguang Li and Gao Huang and Yong-Jin Liu},
year={2024},
eprint={2404.09445},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Thank you to the MotionGPT authors for providing the codebase and the finetuned model. Our code partially borrows from theirs.
This code is distributed under the MIT License.
Note that our code depends on other libraries, including SMPL, SMPL-X, PyTorch3D, and uses datasets which each have their own respective licenses that must also be followed.