Readme and experimental results
padeler committed Nov 26, 2021
1 parent a314495 commit 0f4d1ed
Showing 29 changed files with 1,675 additions and 0 deletions.
82 changes: 82 additions & 0 deletions README.md
@@ -0,0 +1,82 @@
# POTR: Pose estimation transformer
Vision transformer architectures have been demonstrated to work
for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction.

POTR is a pure transformer architecture (no CNN backbone) for 2D body pose estimation.

You can use the code in this repository to train and evaluate different POTR configurations on the COCO dataset.

## Acknowledgements
The code in this repository is based on the following:

- [End-to-End Object Detection with Transformers (DETR)](https://github.com/facebookresearch/detr)
- [Cross-Covariance Image Transformer (XCiT)](https://github.com/facebookresearch/xcit)

- [DeiT: Data-efficient Image Transformers](https://github.com/facebookresearch/deit)

- [Simple Baselines for Human Pose Estimation and Tracking](https://github.com/microsoft/human-pose-estimation.pytorch)

- [PyTorch image models (TIMM)](https://github.com/rwightman/pytorch-image-models)


Thank you!

## Preparing

Create a Python venv and install all the dependencies:

```bash
python -m venv pyenv
source pyenv/bin/activate
pip install -r requirements.txt
```

## Training

Training POTR with a __deit_small__ encoder, patch size of __16x16__ pixels and input resolution __192x256__:

```bash
python lit_main.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24
```
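As a rough guide to how patch size and input resolution interact, the sketch below counts the patch tokens the encoder would see. It assumes a standard ViT-style split into non-overlapping square patches and ignores any extra tokens (such as a class token); `num_patch_tokens` is a hypothetical helper, not part of this repository.

```python
def num_patch_tokens(input_size, patch_size):
    """Count non-overlapping square patches covering the input.

    Assumes a plain ViT-style patch embedding; extra tokens are not counted.
    The order of the two input dimensions does not affect the result.
    """
    side_a, side_b = input_size
    if side_a % patch_size or side_b % patch_size:
        raise ValueError("input size must be divisible by the patch size")
    return (side_a // patch_size) * (side_b // patch_size)


# Input sizes and patch sizes taken from the commands in this README.
print(num_patch_tokens((192, 256), 16))  # 12 * 16 = 192
print(num_patch_tokens((288, 384), 16))  # 18 * 24 = 432
print(num_patch_tokens((192, 256), 8))   # 24 * 32 = 768
```

Under these assumptions, halving the patch size at a fixed resolution quadruples the number of tokens, which is worth keeping in mind when choosing a batch size.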

POTR with an __xcit_small_12_p16__ encoder and input resolution __288x384__:

```bash
python lit_main.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth
```

POTR with the ViT as Backbone (VAB) configuration:

```bash
python lit_main.py --vit_as_backbone --vit_arch resnet50 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --position_embedding learned_nocls --num_workers 16 --num_queries 100 --dim_feedforward 1536 --accumulate_grad_batches 1
```

For the complete list of options, check the ```lit_main.py``` command-line arguments:
```bash
python lit_main.py --help
```

## Evaluation

Evaluate a trained model using the ```evaluate.py``` script.

For example, to evaluate POTR trained with an xcit_small_12_p8 encoder:

```bash
python evaluate.py --vit_arch xcit_small_12_p8 --patch_size 8 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --position_embedding enc_xcit --num_workers 16 --num_queries 100 --dim_feedforward 1536 --init_weights paper_experiments/xcit_small12_p8_dino_192_256_paper/checkpoints/checkpoint-epoch\=065-AP\=0.736.ckpt --use_det_bbox
```

Set the ```--init_weights``` argument to the path of your model's checkpoint.
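
If a run leaves several checkpoints behind, a small helper can pick the one to pass to ```--init_weights```. The sketch below assumes every checkpoint follows the naming pattern of the single example above (`checkpoint-epoch=065-AP=0.736.ckpt`); the pattern is inferred from that one filename, and `parse_checkpoint_name` and `best_checkpoint` are hypothetical helpers, not part of this repository.

```python
import re
from pathlib import Path

# Pattern inferred from one example filename; other runs may name files differently.
CKPT_PATTERN = re.compile(r"checkpoint-epoch=(\d+)-AP=(\d+\.\d+)\.ckpt$")


def parse_checkpoint_name(path):
    """Return (epoch, ap) parsed from a checkpoint filename, or None if it does not match."""
    match = CKPT_PATTERN.search(Path(path).name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(paths):
    """Return the path whose filename encodes the highest AP, or None if none match."""
    scored = []
    for path in paths:
        info = parse_checkpoint_name(path)
        if info is not None:
            scored.append((info[1], path))
    if not scored:
        return None
    return max(scored, key=lambda item: item[0])[1]


example = "checkpoints/checkpoint-epoch=065-AP=0.736.ckpt"
print(parse_checkpoint_name(example))  # (65, 0.736)
```

Note that the backslashes in the shell command (`\=`) are only shell escaping; the filename on disk contains plain `=` characters.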


## Experiments

5 changes: 5 additions & 0 deletions experiments/README.md
@@ -0,0 +1,5 @@
# POTR Experiments

Configurations (in YAML format) and evaluation results (produced by ```evaluate.py```)
for the experiments in the paper.
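
The keys in each `hparams.yaml` appear to mirror the command-line flags of `lit_main.py` (for example `vit_arch`, `batch_size`, `input_size`). The sketch below turns a flat mapping of hyperparameters back into a command line. It rests on assumptions the repository does not confirm: every key maps to a flag of the same name, boolean options are plain switches, and `null` values mean the flag is omitted. `hparams_to_args` is a hypothetical helper.

```python
import shlex


def hparams_to_args(hparams):
    """Convert a flat hparams mapping into command-line tokens.

    Assumptions: flag names equal key names, True means a bare switch,
    and None or False means the flag is left out.
    """
    args = []
    for key, value in hparams.items():
        if value is None or value is False:
            continue
        flag = f"--{key}"
        if value is True:
            args.append(flag)
        elif isinstance(value, (list, tuple)):
            # Multi-value options such as input_size become "--input_size 192 256".
            args.append(flag)
            args.extend(str(item) for item in value)
        else:
            args.extend([flag, str(value)])
    return args


# A hand-copied subset of one hparams.yaml from this directory.
subset = {
    "vit_arch": "deit_deit_small",
    "patch_size": 16,
    "batch_size": 42,
    "input_size": (192, 256),
    "hidden_dim": 384,
    "vit_dim": 384,
    "vit_as_backbone": True,
    "vit_weights": None,
}
print("python lit_main.py " + shlex.join(hparams_to_args(subset)))
```

Compare the generated command against `python lit_main.py --help` before relying on it, since the files also contain trainer settings that may not all be exposed as flags.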

@@ -0,0 +1,10 @@
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.632
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.867
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.710
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.619
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.662
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.726
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.923
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.801
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.687
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.783
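
Result files in this layout can be compared across experiments once they are parsed into a dictionary. The sketch below is a minimal parser for lines of the form shown above; the regular expression is written against these lines only, and `parse_summary` is a hypothetical helper, not part of this repository.

```python
import re

# Matches lines such as:
#   Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.632
LINE_PATTERN = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\)\s+@\[\s*IoU=([\d.:]+)\s*\|"
    r"\s*area=\s*(\w+)\s*\|\s*maxDets=\s*(\d+)\s*\]\s*=\s*(-?[\d.]+)"
)


def parse_summary(text):
    """Map (metric, iou, area) to the reported value for every matching line."""
    results = {}
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            metric, iou, area, _max_dets, value = match.groups()
            results[(metric, iou, area)] = float(value)
    return results


sample = """\
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.632
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.783
"""
summary = parse_summary(sample)
print(summary[("AP", "0.50:0.95", "all")])  # 0.632
```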
98 changes: 98 additions & 0 deletions experiments/VAB_deit_small_p16_192x256_paper/hparams.yaml
@@ -0,0 +1,98 @@
accelerator: null
accumulate_grad_batches: 1
activation: gelu
amp_backend: native
amp_level: O2
auto_lr_find: false
auto_scale_batch_size: false
auto_select_gpus: false
aux_loss: true
batch_size: 42
bbox_loss_coef: 5
benchmark: false
check_val_every_n_epoch: 1
checkpoint_callback: true
clip_max_norm: 0.1
coco_path: /home/padeler/work/datasets/coco_2017
dataset_file: coco
debug: false
dec_arch: detr
dec_layers: 6
default_root_dir: null
deterministic: false
dice_loss_coef: 1
dim_feedforward: 1536
distributed_backend: null
dropout: 0.0
enc_layers: 6
eos_coef: 0.1
epochs: 80
eval: false
fast_dev_run: false
flush_logs_every_n_steps: 100
gpus: 1
gradient_clip_algorithm: norm
gradient_clip_val: 0.0
hidden_dim: 384
init_weights: null
input_size: !!python/tuple
- 192
- 256
limit_predict_batches: 1.0
limit_test_batches: 1.0
limit_train_batches: 1.0
limit_val_batches: 1.0
log_every_n_steps: 50
log_gpu_memory: null
logger: true
lr: 0.0001
lr_backbone: 1.0e-05
lr_drop: 50
mask_loss_coef: 1
max_epochs: null
max_steps: null
max_time: null
min_epochs: null
min_steps: null
move_metrics_to_cpu: false
multiple_trainloader_mode: max_size_cycle
nheads: 8
num_nodes: 1
num_processes: 1
num_queries: 100
num_sanity_val_steps: 2
num_workers: 16
overfit_batches: 0.0
patch_size: 16
plugins: null
position_embedding: enc_xcit
pre_norm: false
precision: 32
prepare_data_per_node: true
process_position: 0
profiler: null
progress_bar_refresh_rate: null
reload_dataloaders_every_epoch: false
replace_sampler_ddp: true
resume_from_checkpoint: null
scale_factor: 0.3
seed: 42
set_cost_bbox: 5
set_cost_class: 1
stochastic_weight_avg: false
sync_batchnorm: false
terminate_on_nan: false
tpu_cores: null
track_grad_norm: -1
truncated_bptt_steps: null
use_det_bbox: false
val_check_interval: 1.0
vit_arch: deit_deit_small
vit_as_backbone: true
vit_dim: 384
vit_dropout: 0.0
vit_weights: null
weight_decay: 0.0001
weights_save_path: null
weights_summary: top
with_lpi: false
@@ -0,0 +1,10 @@
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.622
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.863
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.697
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.608
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.654
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.716
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.920
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.791
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.674
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.775
98 changes: 98 additions & 0 deletions experiments/deit_small_p16_192x256_distilled_paper/hparams.yaml
@@ -0,0 +1,98 @@
accelerator: null
accumulate_grad_batches: 1
activation: gelu
amp_backend: native
amp_level: O2
auto_lr_find: false
auto_scale_batch_size: false
auto_select_gpus: false
aux_loss: true
batch_size: 42
bbox_loss_coef: 5
benchmark: false
check_val_every_n_epoch: 1
checkpoint_callback: true
clip_max_norm: 0.1
coco_path: /home/padeler/work/datasets/coco_2017
dataset_file: coco
debug: false
dec_arch: detr
dec_layers: 6
default_root_dir: null
deterministic: false
dice_loss_coef: 1
dim_feedforward: 1536
distributed_backend: null
dropout: 0.0
enc_layers: 6
eos_coef: 0.1
epochs: 80
eval: false
fast_dev_run: false
flush_logs_every_n_steps: 100
gpus: 1
gradient_clip_algorithm: norm
gradient_clip_val: 0.0
hidden_dim: 384
init_weights: null
input_size: !!python/tuple
- 192
- 256
limit_predict_batches: 1.0
limit_test_batches: 1.0
limit_train_batches: 1.0
limit_val_batches: 1.0
log_every_n_steps: 50
log_gpu_memory: null
logger: true
lr: 0.0001
lr_backbone: 1.0e-05
lr_drop: 50
mask_loss_coef: 1
max_epochs: null
max_steps: null
max_time: null
min_epochs: null
min_steps: null
move_metrics_to_cpu: false
multiple_trainloader_mode: max_size_cycle
nheads: 8
num_nodes: 1
num_processes: 1
num_queries: 100
num_sanity_val_steps: 2
num_workers: 24
overfit_batches: 0.0
patch_size: 16
plugins: null
position_embedding: enc_xcit
pre_norm: false
precision: 32
prepare_data_per_node: true
process_position: 0
profiler: null
progress_bar_refresh_rate: null
reload_dataloaders_every_epoch: false
replace_sampler_ddp: true
resume_from_checkpoint: null
scale_factor: 0.3
seed: 42
set_cost_bbox: 5
set_cost_class: 1
stochastic_weight_avg: false
sync_batchnorm: false
terminate_on_nan: false
tpu_cores: null
track_grad_norm: -1
truncated_bptt_steps: null
use_det_bbox: false
val_check_interval: 1.0
vit_arch: deit_deit_small
vit_as_backbone: false
vit_dim: 384
vit_dropout: 0.0
vit_weights: null
weight_decay: 0.0001
weights_save_path: null
weights_summary: top
with_lpi: false
@@ -0,0 +1,10 @@
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.608
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.854
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.683
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.595
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.640
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.707
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.915
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.782
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.767