
Applying MogaNet to Pose Estimation

This repo is a PyTorch implementation of applying MogaNet to 2D human pose estimation on COCO. The code is based on MMPose. For more details, see Efficient Multi-order Gated Aggregation Network (arXiv 2022).

Note

Please note that we simply follow the hyper-parameters of PVT and Swin, which may not be optimal for MogaNet. Feel free to tune the hyper-parameters to get better performance.

Environment Setup

Install MMPose from source code, or follow the steps below. This experiment requires MMPose>=0.29.0; we reproduced the results with MMPose v0.29.0 and PyTorch 1.10.

pip install openmim
mim install mmcv-full
pip install mmpose
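
For reference, a pinned setup matching the environment above might look like the following sketch; the torch, torchvision, and mmcv-full versions are assumptions, so pick the builds that match your CUDA setup:

# minimal sketch; pinned versions are assumptions, adjust to your CUDA version
pip install torch==1.10.0 torchvision==0.11.0
pip install openmim
mim install "mmcv-full==1.6.2"
pip install mmpose==0.29.0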

Note: Since the MogaNet backbone code for detection, segmentation, and pose estimation is written in the same file, it also works with MMDetection and MMSegmentation through @BACKBONES.register_module(). Install MMDetection or MMSegmentation if you want to use the backbone there as well.

Data preparation

Download COCO2017 and prepare COCO experiments according to the guidelines in MMPose.
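
For reference, MMPose typically expects the COCO data under data/coco/ in roughly the following layout (a sketch of the standard layout; verify the exact paths against the MMPose data preparation guide):

data/coco/
├── annotations/
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── person_detection_results/
│   └── COCO_val2017_detections_AP_H_56_person.json
├── train2017/
└── val2017/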

(back to top)

Results and models on COCO

Notes: All models use ImageNet-1K pre-trained backbones and can also be downloaded from Baidu Cloud (code: z8mf) at MogaNet/COCO_Pose. Params (M) and FLOPs (G) are measured by get_flops at 256 $\times$ 192 or 384 $\times$ 288 input resolution.

python get_flops.py /path/to/config --shape 256 192
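
For the 384 $\times$ 288 setting, pass the corresponding shape to the same script:

python get_flops.py /path/to/config --shape 384 288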

MogaNet + Top-Down

We provide results for MogaNet and popular architectures (Swin, ConvNeXt, and UniFormer) for comparison.

| Backbone | Input Size | Params | FLOPs | AP | AP50 | AP75 | AR | ARM | ARL | Config | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MogaNet-XT | 256x192 | 5.6M | 1.8G | 72.1 | 89.7 | 80.1 | 77.7 | 73.6 | 83.6 | config | log \| model |
| MogaNet-XT | 384x288 | 5.6M | 4.2G | 74.7 | 90.1 | 81.3 | 79.9 | 75.9 | 85.9 | config | log \| model |
| MogaNet-T | 256x192 | 8.1M | 2.2G | 73.2 | 90.1 | 81.0 | 78.8 | 74.9 | 84.4 | config | log \| model |
| MogaNet-T | 384x288 | 8.1M | 4.9G | 75.7 | 90.6 | 82.6 | 80.9 | 76.8 | 86.7 | config | log \| model |
| MogaNet-S | 256x192 | 29.0M | 6.0G | 74.9 | 90.7 | 82.8 | 80.1 | 75.7 | 86.3 | config | log \| model |
| MogaNet-S | 384x288 | 29.0M | 13.5G | 76.4 | 91.0 | 83.3 | 81.4 | 77.1 | 87.7 | config | log \| model |
| MogaNet-B | 256x192 | 47.4M | 10.9G | 75.3 | 90.9 | 83.3 | 80.7 | 76.4 | 87.1 | config | log \| model |
| MogaNet-B | 384x288 | 47.4M | 24.4G | 77.3 | 91.4 | 84.0 | 82.2 | 77.9 | 88.5 | config | log \| model |

MetaFormers + Top-Down

| Backbone | Input Size | Params | FLOPs | AP | AP50 | AP75 | AR | ARM | ARL | Config | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Swin-T | 256x192 | 32.8M | 6.1G | 72.4 | 90.1 | 80.6 | 78.2 | 74.0 | 84.3 | config | model \| log |
| Swin-B | 256x192 | 93.0M | 18.6G | 73.7 | 90.4 | 82.0 | 79.8 | 74.9 | 85.7 | config | model \| log |
| Swin-B | 384x288 | 93.0M | 40.1G | 75.9 | 91.0 | 83.2 | 78.8 | 76.5 | 87.5 | config | model \| log |
| Swin-L | 256x192 | 203.4M | 40.3G | 74.3 | 90.6 | 82.1 | 79.8 | 75.5 | 86.2 | config | model \| log |
| Swin-L | 384x288 | 203.4M | 86.9G | 76.3 | 91.2 | 83.0 | 81.4 | 77.0 | 87.9 | config | model \| log |
| ConvNeXt-T | 256x192 | 33.0M | 5.5G | 73.2 | 90.0 | 80.9 | 78.8 | 74.5 | 85.1 | config | log \| model |
| ConvNeXt-T | 384x288 | 33.0M | 12.5G | 75.3 | 90.4 | 82.1 | 80.5 | 76.1 | 86.8 | config | log \| model |
| ConvNeXt-S | 256x192 | 54.7M | 9.7G | 73.7 | 90.3 | 81.9 | 79.3 | 75.0 | 85.5 | config | log \| model |
| ConvNeXt-S | 384x288 | 54.7M | 21.8G | 75.8 | 90.7 | 83.1 | 81.0 | 76.8 | 87.1 | config | log \| model |
| UniFormer-S | 256x192 | 25.2M | 4.7G | 74.0 | 90.3 | 82.2 | 79.5 | 66.8 | 76.7 | config | log \| model |
| UniFormer-S | 384x288 | 25.2M | 11.1G | 75.9 | 90.6 | 83.4 | 81.4 | 68.6 | 79.0 | config | log \| model |
| UniFormer-B | 256x192 | 53.5M | 9.2G | 75.0 | 90.6 | 83.0 | 80.4 | 67.8 | 77.7 | config | log \| model |
| UniFormer-B | 384x288 | 53.5M | 14.8G | 76.7 | 90.8 | 84.0 | 81.4 | 69.3 | 79.7 | config | log \| model |

Training

We train the models on a single node with 8 GPUs by default (a total batch size of 32 $\times$ 8 for Top-Down). Start training with a config as follows:

PORT=29001 bash dist_train.sh /path/to/config 8
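
As a concrete sketch, you can also pin the output directory; the config filename below is illustrative, and this assumes dist_train.sh forwards extra arguments to MMPose's train.py as the standard script does:

# illustrative config path and work dir; adjust to the actual config you use
PORT=29001 bash dist_train.sh configs/moganet/moga_t_coco_256x192.py 8 --work-dir work_dirs/moga_t_coco_256x192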

Evaluation

To evaluate the trained model on a single node with 8 GPUs, run:

bash dist_test.sh /path/to/config /path/to/checkpoint 8 --out results.pkl --eval mAP
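
For a quick single-GPU check, the standard MMPose test entry point should also work (a sketch assuming the usual test.py interface that dist_test.sh wraps):

python test.py /path/to/config /path/to/checkpoint --out results.pkl --eval mAP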

Citation

If you find this repository helpful, please consider citing:

@article{Li2022MogaNet,
  title={Efficient Multi-order Gated Aggregation Network},
  author={Siyuan Li and Zedong Wang and Zicheng Liu and Cheng Tan and Haitao Lin and Di Wu and Zhiyuan Chen and Jiangbin Zheng and Stan Z. Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.03295}
}

Acknowledgment

Our pose estimation implementation is mainly based on MMPose. We sincerely thank the authors for their wonderful works.

(back to top)