This is the official implementation of the TMM paper Joint-Limb Compound Triangulation With Co-Fixing for Stereoscopic Human Pose Estimation.
Note: This repository is extended with code for MHAD and joint training.
- `experiments`: configuration files. The files are all in yaml format.
- `lib`: main code.
  - `dataset`: the dataloaders.
  - `models`: network model files.
  - `utils`: tools, functions, data format, etc.
- `backbone_pretrain.py`: file to pretrain the 2D backbone before E2E training.
- `config.py`: configuration processors and the default config.
- `main.py`: file to do E2E training.
For convenience, we refer to the root directory of this repo as `${ROOT}`.
First install the latest torch build that fits your CUDA version, then install the listed requirements. Note that this repository is tested with torch 1.13.0 and CUDA 11.7.
cd ${ROOT}
pip install -r requirement.txt
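For reference, a pip command matching the tested versions (torch 1.13.0 built against CUDA 11.7) would look roughly like the line below; this is only a sketch, so check the official PyTorch installation instructions for the exact command that matches your platform.

pip install torch==1.13.0+cu117 --extra-index-url https://download.pytorch.org/whl/cu117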
Human3.6M
- Follow this guide to prepare image data and labels. Referring to the fetched data directory as `${H36M_ROOT}`, the directory should look like this:

${H36M_ROOT}
|-- processed
|   |-- S1/
|   ...
|-- extra
|   |-- human36m-multiview-labels-GTbboxes.npy
|   ...
...
- Generate monocular labels at `${H36M_ROOT}/extra/human36m-monocular-labels-GTbboxes.npy`:
python lib/dataset/convert-multiview-to-monocular.py ${H36M_ROOT}/extra
Total Capture
- Use Total Capture Toolbox to prepare data. Suppose the processed data root directory is `${TC_ROOT}` (usually `TotalCapture-Toolbox/data`). It should look like this:
${TC_ROOT}
|-- annot
| |-- totalcapture_train.pkl
| `-- totalcapture_validation.pkl
`-- images
Here we provide the weights for ResNet152, which we used for our model:
- Pretrained on ImageNet: link
- Pretrained on COCO and finetuned on Human3.6M and MPII (From Learnable Triangulation): link
Create a folder named `pretrained` under `${ROOT}` and place the weights in it. If you want the backbone pretraining step to work out of the box, the folder should look like this:
pretrained
|-- from_lt/pose_resnet_4.5_pixels_human36m.pth
`-- pytorch/imagenet/resnet152-b121ed2d.pth
We also provide the 4-view weights for Human3.6M and Total Capture, which reproduce the results in the paper:
Place the weights in a directory of your choice; we refer to this path as `${weight_path}` and use it in the testing stage below.
We train the model in two steps: first, we pretrain the 2D backbone, which outputs the joint confidence heatmap and the LOF; then, we train the whole model end-to-end for better accuracy.
To do pretraining, just run:
python backbone_pretrain.py --cfg experiments/ResNet${n_layers}/${dataset}-${resolution}-backbone.yaml --runMode train
Then, to train the model end-to-end:
python main.py --cfg experiments/ResNet${n_layers}/${dataset}-${resolution}.yaml --runMode train
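For example, training on Human3.6M at 384x384 resolution with the ResNet152 backbone would use the following two commands (assuming the config file names follow the pattern above; check the `experiments/ResNet152` folder for the exact file names):

python backbone_pretrain.py --cfg experiments/ResNet152/human3.6m-384x384-backbone.yaml --runMode train
python main.py --cfg experiments/ResNet152/human3.6m-384x384.yaml --runMode train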
python main.py \
    --cfg experiments/ResNet${n_layers}/${dataset}-${resolution}.yaml \
    --runMode test \
    -w ${weight_path}
For this repo, we provide configurations for n_layers=152, dataset=human3.6m | totalcapture, and resolution=384x384 | 320x320.
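As a concrete example, evaluating the provided 4-view weights on Total Capture at 320x320 would look like this (the weight path below is only a placeholder; point -w at wherever you saved the downloaded weights):

python main.py \
    --cfg experiments/ResNet152/totalcapture-320x320.yaml \
    --runMode test \
    -w path/to/totalcapture_4view_weights.pth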
Note: If you wish to train or test using multiple GPUs, please specify the GPU ids in the config file. By default, the script only uses GPU 0 for training / testing.
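The exact field is defined by the config processors in `config.py`; the snippet below is only a hypothetical illustration of what such an entry might look like in a yaml experiment file (the GPUS key name is an assumption, not necessarily the real field):

# hypothetical excerpt from an experiment yaml file
# NOTE: the key name GPUS is an assumption for illustration; check config.py for the actual field
GPUS: [0, 1]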
If you use our code, please cite us with:
@article{zhuo2024compound,
author={Chen, Zhuo and Wan, Xiaoyue and Bao, Yiming and Zhao, Xu},
journal={IEEE Transactions on Multimedia},
title={Joint-Limb Compound Triangulation With Co-Fixing for Stereoscopic Human Pose Estimation},
year={2024},
pages={1-11},
doi={10.1109/TMM.2024.3410514}}