This repository provides the official code for the CVPR 2024 paper UniHuman: A Unified Model For Editing Human Images in the Wild. This work was conducted during Nannan Li's Summer 2023 internship at Adobe Research.
This repo provides the source code for UniHuman, a model that leverages multiple data sources and connections between related tasks to achieve high-quality results across various human image editing objectives. For users who would like to try our pretrained model, please set up the environment according to Section 2: Environment Requirements, and run the program as instructed by Section 4: Gradio Demo and Section 5: Model Inference. For those who are interested in our collected data, please refer to Section 3: Data Preparation.
Python 3.8
CUDA 11.6
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
# Install MMPose
mim install mmengine==0.9.0
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
pip install git+https://github.com/open-mmlab/mmpose.git@537bd8e543ab463fb55120d5caaa1ae22d6aaf06#egg=mmpose
# Install DensePose
wget https://github.com/facebookresearch/detectron2/archive/refs/tags/v0.6.zip
unzip v0.6.zip
cd detectron2-0.6
pip install .
pip install projects/DensePose
cd ..
# Install human parser
cd code
git clone https://github.com/Gaoyiminggithub/Graphonomy.git
cd ..
After installing the above packages, your directory structure should look like
assets
code
├── Graphonomy
└── [other files and folders]
detectron2-0.6
requirements.txt
...
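To confirm that the pinned versions from the install commands above are the ones actually present in your environment, a small check along the following lines may help. It is a minimal sketch, not part of the repository: the helper names are hypothetical, the distribution names passed to `importlib.metadata` are assumed to match the package names used in the install commands, and local build tags such as `+cu116` are ignored when comparing.

```python
from importlib import metadata

# Versions pinned in the install commands above (build tags such as +cu116 omitted).
EXPECTED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "torchaudio": "0.12.1",
    "mmengine": "0.9.0",
    "mmcv": "2.0.1",
    "mmdet": "3.1.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def public_version(version):
    """Drop a local build tag, e.g. '1.12.1+cu116' -> '1.12.1'."""
    return version.split("+", 1)[0]


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs (or is missing) to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found is None or public_version(found) != public_version(wanted):
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, (wanted, found) in problems.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed; in normal use the default is sufficient.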
We provide the image links of our dataset and their annotations in the zip files. See below for detailed instructions. Please note that we do not own the copyright of the images. It is solely your responsibility to check the original licenses of the images before using them. Any use of the images is at your own discretion and risk.
| Dataset | No. of Train Images | No. of Test Pairs |
|---|---|---|
| WPose | N/A | 2,304 |
| WVTON | N/A | 440 |
| LH-400K | 409,270 | N/A |
3.1.1 Reposing Dataset: WPose
- Download the annotations wpose.zip and unzip it under the current directory.
- Download the images using the links given in `wpose/image_urls.txt` and put them under `./downloaded_data`. For your convenience, we provide a download script, `get_wpose_data.py`. Running it will download and preprocess the raw images, but it is solely your responsibility to check the original licenses of the images before using them. Running the code will save the preprocessed images to `./wpose/images`. In our preprocessing pipeline, we crop a rectangle centered on the subject in the image and resize its longer side to 1024 pixels.
After the above steps, your directory structure should look like
downloaded_data
wpose
├── README
├── image_urls.txt
├── bbox.txt
├── test_pairs.txt
├── test_data.pkl
├── images
├── densepose
└── parsing
code
...
`./downloaded_data`: Downloaded original raw images. The images under this folder are no longer needed once the preprocessing is finished.
`./wpose`: Preprocessed images and annotations.
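Once downloading and preprocessing have finished, the layout can be checked against the tree above before moving on. The following is a minimal sketch under the assumption that the script is run from the directory containing `wpose`; the function name is hypothetical, and the entry names are copied from the tree above. The same idea applies to the other datasets with their own entry lists.

```python
from pathlib import Path

# Entries listed in the wpose directory tree above.
WPOSE_ENTRIES = [
    "README",
    "image_urls.txt",
    "bbox.txt",
    "test_pairs.txt",
    "test_data.pkl",
    "images",
    "densepose",
    "parsing",
]


def missing_entries(root, expected):
    """Return the names in `expected` that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("wpose", WPOSE_ENTRIES)
    print("missing:", ", ".join(missing) if missing else "none")
```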
3.1.2 Tryon Dataset: WVTON
- Download the annotations wvton.zip and unzip it under the current directory.
- Download the clothing images in JPEG format manually using the URLs provided in `./wvton/clothing_urls.txt`. Put these clothing images in a new folder named `./cl_downloaded_data`. Do NOT change the file names of the downloaded clothing images.
- Download the human images using the links given in `wvton/image_urls.txt` and put them under `./downloaded_data`. For your convenience, we provide a download script, `get_wvton_data.py`. Running it will download and preprocess the raw images, but it is solely your responsibility to check the original licenses of the images before using them. Running the code will save the preprocessed images to `./wvton/images` and `./wvton/clothes`. In our preprocessing pipeline, we crop a rectangle in the image and resize its longer side to 1024 pixels.
After the above steps, your directory structure should look like
cl_downloaded_data
downloaded_data
wvton
├── README
├── image_urls.txt
├── clothing_urls.txt
├── bbox.txt
├── clothing_bbox.txt
├── test_pairs.txt
├── test_data.pkl
├── images
├── clothes
├── clothes_mask
├── densepose
├── parsing
├── mmpose_clothes
└── mmpose_human
code
...
`./downloaded_data`: Downloaded original raw human images. The images under this folder are no longer needed once the preprocessing is finished.
`./cl_downloaded_data`: Downloaded original raw clothing images. The images under this folder are no longer needed once the preprocessing is finished.
`./wvton`: Preprocessed images and annotations.
3.1.3 Other Public Datasets
See VITON-HD, DressCode and DeepFashion-Multimodal.
LH-400K
- Download the annotations lh-400k.zip and unzip it under the current directory.
- Download the images using the links given in `lh-400k/image_urls.txt`
and put them under `./downloaded_data`. For your convenience, we provide a download script, `get_laion_data.py`. Running it will download and preprocess the raw images, but it is solely your responsibility to check the original licenses of the images before using them. Running the code will save the preprocessed images to `./lh-400k/images`. In our preprocessing pipeline, we resize the image's longer side to 512 pixels.
After the above steps, your directory structure should look like
lh-400k
├── README
├── image_urls.txt
├── train_data.pkl
├── images
├── densepose
└── parsing
code
...
At the time this dataset was released, 408,520 of the image URLs were still valid.
Please make sure that the required dependencies have been installed and that your GPU has at least 16 GB of memory.
- Download the human parser checkpoint and put it under `code/Graphonomy/`.
- Download the pretrained models and unzip the file under the current directory.
- Run `cd code`, then `python demo.py`.
- Go to `http://0.0.0.0:7860` in your web browser. The webpage should look like the one below.
To edit a human image, follow the steps below:
Upload source image -> Choose an editing task -> Click Confirm Task
-> Upload target image -> Click Submit
We suggest that your source and target images have a resolution of at least 512 pixels, because we resize the input image's longer side to 512 pixels and then pad the shorter side to the same length. The generated 512x512 image is shown at the top right. Intermediate outputs, including the detected pose and parsing, are displayed at the bottom right.
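The resize-then-pad step described above can be illustrated with a little arithmetic. The sketch below only computes the geometry and does not touch any image data. The function name is hypothetical, and splitting the padding evenly between both sides is an assumption made for illustration; the actual code may place the padding differently.

```python
def resize_and_pad_geometry(width, height, target=512):
    """Compute the resized size and the padding needed for a target x target canvas.

    The longer side is scaled to `target`, the shorter side is scaled by the
    same factor, and the remaining space on the shorter side is padded.
    Returns ((new_width, new_height), (left, top, right, bottom)).
    """
    if width >= height:
        new_w = target
        new_h = max(1, round(height * target / width))
    else:
        new_h = target
        new_w = max(1, round(width * target / height))

    pad_w = target - new_w
    pad_h = target - new_h
    # Assumption: padding is centered, i.e. split evenly between the two sides.
    left, top = pad_w // 2, pad_h // 2
    return (new_w, new_h), (left, top, pad_w - left, pad_h - top)


if __name__ == "__main__":
    size, padding = resize_and_pad_geometry(1024, 768)
    print("resized to", size, "with padding (left, top, right, bottom) =", padding)
```

For a 1024x768 input this gives a 512x384 resized image with 64 pixels of padding above and below, which also shows why inputs smaller than 512 pixels on the longer side would be upsampled.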
We provide the model inference script `code/infer.py` for editing all input images in a folder. To use the script, please follow the first two steps in Section 4: Gradio Demo to prepare the pretrained models.
- Pose Transfer

  Put the file paths to source images in `source_img_paths.txt` and the paths to target images in `tgt_img_paths.txt`. Then run `cd code`, followed by `python infer.py --task reposing --src-img-list source_img_paths.txt --tgt-img-list tgt_img_paths.txt --out-dir ./results`.
- Virtual Try-on

  Put the file paths to source images in `source_img_paths.txt` and the paths to garment images in `clothes_list.txt`. Make sure the try-on garments belong to the same category (i.e., upper clothing, lower clothing or dress). Then run `cd code`, followed by `python infer.py --task tryon --src-img-list source_img_paths.txt --tgt-clothes-list clothes_list.txt --tryon-cat [upper/lower/dress] --out-dir ./results`.
- Text Edit

  Put the file paths to source images in `source_img_paths.txt` and the text prompts in `prompt_list.txt`. Make sure the garments you want to edit belong to the same category (i.e., upper clothing, lower clothing or dress). Then run `cd code`, followed by `python infer.py --task text_edit --src-img-list source_img_paths.txt --prompt-list prompt_list.txt --edit-cat [upper/lower/dress] --out-dir ./results --cfg_scale 4`.

  To get the best results, you may want to try different values of `--cfg_scale`, which usually ranges from 1 to 10.
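Trying several `--cfg_scale` values by hand can get tedious, so one option is to generate the commands programmatically. The sketch below only builds and prints the command lines; it uses the flags shown in the text-edit command above, while the function name, the sweep values, and the per-run output folder names are hypothetical choices made for illustration.

```python
import shlex


def text_edit_command(cfg_scale, edit_cat="upper", out_dir="./results"):
    """Build the argument list for one text-edit run of infer.py."""
    return [
        "python", "infer.py",
        "--task", "text_edit",
        "--src-img-list", "source_img_paths.txt",
        "--prompt-list", "prompt_list.txt",
        "--edit-cat", edit_cat,
        "--out-dir", out_dir,
        "--cfg_scale", str(cfg_scale),
    ]


if __name__ == "__main__":
    # Hypothetical sweep values within the 1 to 10 range mentioned above.
    for scale in (2, 4, 6, 8):
        cmd = text_edit_command(scale, out_dir=f"./results_cfg{scale}")
        # Printed only; the commands are meant to be run from the code directory.
        print(shlex.join(cmd))
```

Writing each run to its own output folder keeps the results for different scales side by side for comparison.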
We suggest that your source and target images have a resolution of at least 512 pixels, because we resize the input image's longer side to 512 pixels and then pad the shorter side to the same length. The generated image is of size 512x512.
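The list files passed to `infer.py` in the commands above can be produced with a short script rather than by hand. The following is a minimal sketch that assumes the list files contain one path per line, which the text above does not state explicitly. The function names, the set of image extensions, and the folder name `my_source_images` are hypothetical. Absolute paths are written so that they still resolve after changing into the `code` directory.

```python
from pathlib import Path

# Assumption: image files are identified by these extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(folder):
    """Return sorted absolute paths of the image files directly inside `folder`."""
    return sorted(
        str(path.resolve())
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def render_path_list(paths):
    """Render paths as text with one path per line (assumed list-file format)."""
    return "".join(f"{path}\n" for path in paths)


def write_path_list(paths, list_file):
    """Write the rendered path list to `list_file`."""
    Path(list_file).write_text(render_path_list(paths), encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical folder name; replace it with your own image folder.
    folder = Path("my_source_images")
    if folder.is_dir():
        write_path_list(list_images(folder), "source_img_paths.txt")
    else:
        print(f"{folder} not found; nothing written")
```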
Please cite our paper if you use the dataset in your work.
@InProceedings{Li_UniHuman_2024,
author = {Nannan Li and Qing Liu and Krishna Kumar Singh and Yilin Wang and Jianming Zhang and Bryan A. Plummer and Zhe Lin},
title = {UniHuman: A Unified Model For Editing Human Images in the Wild},
booktitle = {CVPR},
year = {2024},
}
We thank Yi Zhou and her collaborators for suggestions about human dense pose related operations.
This project is released under the Adobe Research License, which prohibits commercial use and allows non-commercial research use. The modeling code is partially built upon ControlNet, which is released under the Apache-2.0 License.