Showing 29 changed files with 1,675 additions and 0 deletions.
@@ -0,0 +1,82 @@
# POTR: Pose estimation transformer
Vision transformer architectures have been demonstrated to work for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction.

POTR is a pure transformer architecture (no CNN backbone) for 2D body pose estimation.

You can use the code in this repository to train and evaluate different POTR configurations on the COCO dataset.

## Acknowledgements
The code in this repository is based on the following:

- [End-to-End Object Detection with Transformers (DETR)](https://github.com/facebookresearch/detr)
- [Cross-Covariance Image Transformer (XCiT)](https://github.com/facebookresearch/xcit)
- [DeiT: Data-efficient Image Transformers](https://github.com/facebookresearch/deit)
- [Simple Baselines for Human Pose Estimation and Tracking](https://github.com/microsoft/human-pose-estimation.pytorch)
- [PyTorch Image Models (timm)](https://github.com/rwightman/pytorch-image-models)

Thank you!

## Preparing

Create a Python venv and install all the dependencies:

```bash
python -m venv pyenv
source pyenv/bin/activate
pip install -r requirements.txt
```

## Training

Training POTR with a __deit_small__ encoder, a patch size of __16x16__ pixels and an input resolution of __192x256__:

```bash
python lit_main.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24
```
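
To get a feel for how `--patch_size` and `--input_size` interact, the sketch below counts the patches an encoder would see for a given configuration. It assumes non-overlapping square patches (the usual ViT/DeiT patch embedding) and reads `--input_size 192 256` as width then height; neither is confirmed by this README, and `patch_grid` is a hypothetical helper, not part of the repository. Any class or distillation tokens are not counted.

```python
def patch_grid(input_size, patch_size):
    """Return (columns, rows, total) of non-overlapping square patches.

    Assumption: input_size is (width, height) in pixels. The total is the
    same if the two values are actually (height, width).
    """
    width, height = input_size
    if width % patch_size or height % patch_size:
        raise ValueError("input size must be divisible by the patch size")
    cols = width // patch_size
    rows = height // patch_size
    return cols, rows, cols * rows


# The deit_small command above: 192x256 input, 16x16 patches.
print(patch_grid((192, 256), 16))  # (12, 16, 192)

# Halving the patch size quadruples the number of patches.
print(patch_grid((192, 256), 8))   # (24, 32, 768)
```

Because self-attention cost grows with the square of the sequence length, the patch count is a useful first estimate of how expensive a configuration will be.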

POTR with an xcit_small_12_p16 encoder:

```bash
python lit_main.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth
```

POTR with the ViT as Backbone (VAB) configuration:

```bash
python lit_main.py --vit_as_backbone --vit_arch resnet50 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --position_embedding learned_nocls --num_workers 16 --num_queries 100 --dim_feedforward 1536 --accumulate_grad_batches 1
```

Run `lit_main.py` with `--help` for the complete list of CLI arguments:
```bash
python lit_main.py --help
```

## Evaluation

Evaluate a trained model using the `evaluate.py` script.

For example, to evaluate POTR trained with an xcit_small_12_p8 encoder:

```bash
python evaluate.py --vit_arch xcit_small_12_p8 --patch_size 8 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --position_embedding enc_xcit --num_workers 16 --num_queries 100 --dim_feedforward 1536 --init_weights paper_experiments/xcit_small12_p8_dino_192_256_paper/checkpoints/checkpoint-epoch\=065-AP\=0.736.ckpt --use_det_bbox
```

Set `--init_weights` to the path of your model's checkpoint.

## Experiments
@@ -0,0 +1,5 @@
# POTR Experiments

Configurations (in YAML format) and evaluation results (using evaluate.py) for the experiments in the paper.
10 changes: 10 additions & 0 deletions
experiments/VAB_deit_small_p16_192x256_paper/checkpoints/eval_results.txt
@@ -0,0 +1,10 @@
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.632
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.867
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.710
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.619
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.662
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.726
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.923
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.801
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.687
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.783
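
The summary lines in `eval_results.txt` follow a regular pattern, so they can be turned into a lookup table for scripting. The following is a minimal sketch, assuming every line has the same shape as the ones listed above; `parse_eval_results` and `SUMMARY_LINE` are hypothetical names and are not part of the repository.

```python
import re

# One summary line: metric name, IoU range, area bucket, maxDets, value.
SUMMARY_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_eval_results(text):
    """Map (metric, iou, area) to the reported value; other lines are ignored."""
    results = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.search(line)
        if match:
            metric, iou, area, _max_dets, value = match.groups()
            results[(metric, iou, area)] = float(value)
    return results


# Three lines copied from the VAB listing above.
sample = """\
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.632
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.867
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.783
"""

results = parse_eval_results(sample)
print(results[("AP", "0.50:0.95", "all")])  # 0.632
```

Keying on the `(metric, iou, area)` triple keeps each of the ten rows distinct, since `maxDets` is 20 in all of them.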
@@ -0,0 +1,98 @@
accelerator: null
accumulate_grad_batches: 1
activation: gelu
amp_backend: native
amp_level: O2
auto_lr_find: false
auto_scale_batch_size: false
auto_select_gpus: false
aux_loss: true
batch_size: 42
bbox_loss_coef: 5
benchmark: false
check_val_every_n_epoch: 1
checkpoint_callback: true
clip_max_norm: 0.1
coco_path: /home/padeler/work/datasets/coco_2017
dataset_file: coco
debug: false
dec_arch: detr
dec_layers: 6
default_root_dir: null
deterministic: false
dice_loss_coef: 1
dim_feedforward: 1536
distributed_backend: null
dropout: 0.0
enc_layers: 6
eos_coef: 0.1
epochs: 80
eval: false
fast_dev_run: false
flush_logs_every_n_steps: 100
gpus: 1
gradient_clip_algorithm: norm
gradient_clip_val: 0.0
hidden_dim: 384
init_weights: null
input_size: !!python/tuple
- 192
- 256
limit_predict_batches: 1.0
limit_test_batches: 1.0
limit_train_batches: 1.0
limit_val_batches: 1.0
log_every_n_steps: 50
log_gpu_memory: null
logger: true
lr: 0.0001
lr_backbone: 1.0e-05
lr_drop: 50
mask_loss_coef: 1
max_epochs: null
max_steps: null
max_time: null
min_epochs: null
min_steps: null
move_metrics_to_cpu: false
multiple_trainloader_mode: max_size_cycle
nheads: 8
num_nodes: 1
num_processes: 1
num_queries: 100
num_sanity_val_steps: 2
num_workers: 16
overfit_batches: 0.0
patch_size: 16
plugins: null
position_embedding: enc_xcit
pre_norm: false
precision: 32
prepare_data_per_node: true
process_position: 0
profiler: null
progress_bar_refresh_rate: null
reload_dataloaders_every_epoch: false
replace_sampler_ddp: true
resume_from_checkpoint: null
scale_factor: 0.3
seed: 42
set_cost_bbox: 5
set_cost_class: 1
stochastic_weight_avg: false
sync_batchnorm: false
terminate_on_nan: false
tpu_cores: null
track_grad_norm: -1
truncated_bptt_steps: null
use_det_bbox: false
val_check_interval: 1.0
vit_arch: deit_deit_small
vit_as_backbone: true
vit_dim: 384
vit_dropout: 0.0
vit_weights: null
weight_decay: 0.0001
weights_save_path: null
weights_summary: top
with_lpi: false
10 changes: 10 additions & 0 deletions
experiments/deit_small_p16_192x256_distilled_paper/checkpoints/eval_results.txt
@@ -0,0 +1,10 @@
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.622
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.863
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.697
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.608
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.654
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.716
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.920
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.791
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.674
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.775
98 changes: 98 additions & 0 deletions
experiments/deit_small_p16_192x256_distilled_paper/hparams.yaml
@@ -0,0 +1,98 @@
accelerator: null
accumulate_grad_batches: 1
activation: gelu
amp_backend: native
amp_level: O2
auto_lr_find: false
auto_scale_batch_size: false
auto_select_gpus: false
aux_loss: true
batch_size: 42
bbox_loss_coef: 5
benchmark: false
check_val_every_n_epoch: 1
checkpoint_callback: true
clip_max_norm: 0.1
coco_path: /home/padeler/work/datasets/coco_2017
dataset_file: coco
debug: false
dec_arch: detr
dec_layers: 6
default_root_dir: null
deterministic: false
dice_loss_coef: 1
dim_feedforward: 1536
distributed_backend: null
dropout: 0.0
enc_layers: 6
eos_coef: 0.1
epochs: 80
eval: false
fast_dev_run: false
flush_logs_every_n_steps: 100
gpus: 1
gradient_clip_algorithm: norm
gradient_clip_val: 0.0
hidden_dim: 384
init_weights: null
input_size: !!python/tuple
- 192
- 256
limit_predict_batches: 1.0
limit_test_batches: 1.0
limit_train_batches: 1.0
limit_val_batches: 1.0
log_every_n_steps: 50
log_gpu_memory: null
logger: true
lr: 0.0001
lr_backbone: 1.0e-05
lr_drop: 50
mask_loss_coef: 1
max_epochs: null
max_steps: null
max_time: null
min_epochs: null
min_steps: null
move_metrics_to_cpu: false
multiple_trainloader_mode: max_size_cycle
nheads: 8
num_nodes: 1
num_processes: 1
num_queries: 100
num_sanity_val_steps: 2
num_workers: 24
overfit_batches: 0.0
patch_size: 16
plugins: null
position_embedding: enc_xcit
pre_norm: false
precision: 32
prepare_data_per_node: true
process_position: 0
profiler: null
progress_bar_refresh_rate: null
reload_dataloaders_every_epoch: false
replace_sampler_ddp: true
resume_from_checkpoint: null
scale_factor: 0.3
seed: 42
set_cost_bbox: 5
set_cost_class: 1
stochastic_weight_avg: false
sync_batchnorm: false
terminate_on_nan: false
tpu_cores: null
track_grad_norm: -1
truncated_bptt_steps: null
use_det_bbox: false
val_check_interval: 1.0
vit_arch: deit_deit_small
vit_as_backbone: false
vit_dim: 384
vit_dropout: 0.0
vit_weights: null
weight_decay: 0.0001
weights_save_path: null
weights_summary: top
with_lpi: false
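
Read line by line, the two hyperparameter listings above appear to differ only in `num_workers` and `vit_as_backbone`. A small script makes that kind of comparison repeatable. The sketch below is one way to do it with the standard library only: it treats each listing as flat `key: value` lines and skips list items such as the `input_size` entries. `parse_flat_config` and `diff_configs` are hypothetical helpers, and the two strings are short excerpts copied from the listings rather than the full files.

```python
def parse_flat_config(text):
    """Parse top-level `key: value` lines; list items and blank lines are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith(("-", " ", "#")):
            continue
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def diff_configs(first, second):
    """Return {key: (first_value, second_value)} for keys whose values differ."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


# Excerpts only; values copied from the first and second listing respectively.
FIRST_LISTING = """\
batch_size: 42
input_size: !!python/tuple
- 192
- 256
num_workers: 16
patch_size: 16
vit_arch: deit_deit_small
vit_as_backbone: true
"""

SECOND_LISTING = """\
batch_size: 42
input_size: !!python/tuple
- 192
- 256
num_workers: 24
patch_size: 16
vit_arch: deit_deit_small
vit_as_backbone: false
"""

changed = diff_configs(parse_flat_config(FIRST_LISTING), parse_flat_config(SECOND_LISTING))
for key, (a, b) in changed.items():
    print(f"{key}: {a} -> {b}")
```

A line-based parser is used here on purpose: the `!!python/tuple` tag on `input_size` is Python-specific, so a strict safe YAML loader would be expected to reject these files, while plain text comparison does not care about the tag.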
10 changes: 10 additions & 0 deletions
experiments/deit_small_p16_192x256_paper/checkpoints/eval_results.txt
@@ -0,0 +1,10 @@
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.608
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.854
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.683
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.595
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.640
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.707
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.915
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.767
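
To compare the three experiments listed so far, the headline AP values (IoU=0.50:0.95, area=all) can be put side by side. The sketch below hard-codes the three numbers from the `eval_results.txt` listings above and reports each gain over the plain `deit_small_p16_192x256_paper` run. Treating that run as the baseline is an assumption made for illustration, and `ap_gains` is a hypothetical helper.

```python
# Headline AP (IoU=0.50:0.95, area=all) copied from the three listings above.
HEADLINE_AP = {
    "VAB_deit_small_p16_192x256_paper": 0.632,
    "deit_small_p16_192x256_distilled_paper": 0.622,
    "deit_small_p16_192x256_paper": 0.608,
}


def ap_gains(results, baseline):
    """Return the AP difference of every other experiment relative to the baseline."""
    base = results[baseline]
    return {
        name: round(ap - base, 3)
        for name, ap in results.items()
        if name != baseline
    }


gains = ap_gains(HEADLINE_AP, "deit_small_p16_192x256_paper")
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name}: {gain:+.3f} AP")
```

Rounding to three decimals matches the precision of the reported values and hides floating-point noise in the subtraction.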