Readme and experimental results

padeler · Nov 26, 2021 · 0f4d1ed
1 parent a314495
commit 0f4d1ed

Showing 29 changed files with 1,675 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -0,0 +1,82 @@
+# POTR: Pose estimation transformer
+Vision transformer architectures have been demonstrated to work 
+for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction.
+
+POTR is a pure transformer architecture (no CNN backbone) for 2D body pose estimation.
+
+You can use the code in this repository to train and evaluate different POTR configurations on the COCO dataset.
+
+## Acknowledgements
+The code in this repository is based on the following:
+
+- [End-to-End Object Detection with Transformers (DETR)](https://github.com/facebookresearch/detr)
+
+- [Cross-Covariance Image Transformer (XCiT)](https://github.com/facebookresearch/xcit)
+
+- [DeiT: Data-efficient Image Transformers](https://github.com/facebookresearch/deit)
+
+- [Simple Baselines for Human Pose Estimation and Tracking](https://github.com/microsoft/human-pose-estimation.pytorch)
+
+- [PyTorch Image Models (timm)](https://github.com/rwightman/pytorch-image-models)
+
+
+Thank you!
+
+## Preparing
+
+Create a Python venv and install all the dependencies:
+
+```bash
+python -m venv pyenv
+source pyenv/bin/activate
+pip install -r requirements.txt
+```
+
+## Training 
+
+Training POTR with a __deit_small__ encoder, patch size of __16x16__ pixels and input resolution __192x256__:
+
+```bash
+python lit_main.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24
+```
+
+POTR with an __xcit_small_12_p16__ encoder and input resolution __288x384__:
+
+```bash
+python lit_main.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth
+
+```
+
+POTR with the ViT as Backbone (VAB) configuration:
+
+```bash
+python lit_main.py --vit_as_backbone --vit_arch resnet50 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --position_embedding learned_nocls --num_workers 16 --num_queries 100 --dim_feedforward 1536 --accumulate_grad_batches 1
+```
+
+See the ```lit_main.py``` CLI arguments for the complete list of options:
+```bash
+python lit_main.py --help
+```
+
+## Evaluation
+
+Evaluate a trained model using the ```evaluate.py``` script.
+
+For example, to evaluate POTR trained with an xcit_small_12_p8 encoder:
+
+```bash
+python evaluate.py --vit_arch xcit_small_12_p8 --patch_size 8 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --position_embedding enc_xcit --num_workers 16 --num_queries 100 --dim_feedforward 1536 --init_weights paper_experiments/xcit_small12_p8_dino_192_256_paper/checkpoints/checkpoint-epoch\=065-AP\=0.736.ckpt --use_det_bbox
+```
+
+Set ```--init_weights``` to the path of your model's checkpoint.
+
+
+## Experiments
+
+
+
+
+
+
+
+
diff --git a/experiments/README.md b/experiments/README.md
@@ -0,0 +1,5 @@
+# POTR Experiments
+
+Configurations (in YAML format) and evaluation results (produced with evaluate.py)
+for the experiments in the paper.
+
diff --git a/experiments/VAB_deit_small_p16_192x256_paper/checkpoints/eval_results.txt b/experiments/VAB_deit_small_p16_192x256_paper/checkpoints/eval_results.txt
@@ -0,0 +1,10 @@
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.632
+ Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.867
+ Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.710
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.619
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.662
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.726
+ Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.923
+ Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.801
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.687
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.783
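The `eval_results.txt` file above holds a ten-line summary in the format printed by the COCO evaluation tools. To compare runs programmatically, the lines can be parsed into a dictionary. The following is a minimal sketch and not part of the repository: the function name `parse_eval_results` and the regular expression are assumptions made for illustration.

```python
import re

# Matches one summary line, e.g.
# " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.632"
# The pattern is not anchored, so a leading '+' from a diff is tolerated.
LINE_RE = re.compile(
    r"\((?P<metric>AP|AR)\)\s*@\[\s*IoU=(?P<iou>[\d.:]+)\s*\|"
    r"\s*area=\s*(?P<area>\w+)\s*\|\s*maxDets=\s*(?P<max_dets>\d+)\s*\]"
    r"\s*=\s*(?P<value>-?\d+\.\d+)"
)


def parse_eval_results(text):
    """Return {(metric, iou, area): value} for every summary line in `text`."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            key = (match["metric"], match["iou"], match["area"])
            results[key] = float(match["value"])
    return results


# Three lines copied from the VAB_deit_small_p16_192x256_paper results.
sample = """\
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.632
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.867
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.783
"""
results = parse_eval_results(sample)
print(results[("AP", "0.50:0.95", "all")])  # 0.632
```

Keying on `(metric, iou, area)` keeps the ten entries of one file distinct, so the headline AP of two experiments can be compared with a single lookup per file.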
diff --git a/experiments/VAB_deit_small_p16_192x256_paper/hparams.yaml b/experiments/VAB_deit_small_p16_192x256_paper/hparams.yaml
@@ -0,0 +1,98 @@
+accelerator: null
+accumulate_grad_batches: 1
+activation: gelu
+amp_backend: native
+amp_level: O2
+auto_lr_find: false
+auto_scale_batch_size: false
+auto_select_gpus: false
+aux_loss: true
+batch_size: 42
+bbox_loss_coef: 5
+benchmark: false
+check_val_every_n_epoch: 1
+checkpoint_callback: true
+clip_max_norm: 0.1
+coco_path: /home/padeler/work/datasets/coco_2017
+dataset_file: coco
+debug: false
+dec_arch: detr
+dec_layers: 6
+default_root_dir: null
+deterministic: false
+dice_loss_coef: 1
+dim_feedforward: 1536
+distributed_backend: null
+dropout: 0.0
+enc_layers: 6
+eos_coef: 0.1
+epochs: 80
+eval: false
+fast_dev_run: false
+flush_logs_every_n_steps: 100
+gpus: 1
+gradient_clip_algorithm: norm
+gradient_clip_val: 0.0
+hidden_dim: 384
+init_weights: null
+input_size: !!python/tuple
+- 192
+- 256
+limit_predict_batches: 1.0
+limit_test_batches: 1.0
+limit_train_batches: 1.0
+limit_val_batches: 1.0
+log_every_n_steps: 50
+log_gpu_memory: null
+logger: true
+lr: 0.0001
+lr_backbone: 1.0e-05
+lr_drop: 50
+mask_loss_coef: 1
+max_epochs: null
+max_steps: null
+max_time: null
+min_epochs: null
+min_steps: null
+move_metrics_to_cpu: false
+multiple_trainloader_mode: max_size_cycle
+nheads: 8
+num_nodes: 1
+num_processes: 1
+num_queries: 100
+num_sanity_val_steps: 2
+num_workers: 16
+overfit_batches: 0.0
+patch_size: 16
+plugins: null
+position_embedding: enc_xcit
+pre_norm: false
+precision: 32
+prepare_data_per_node: true
+process_position: 0
+profiler: null
+progress_bar_refresh_rate: null
+reload_dataloaders_every_epoch: false
+replace_sampler_ddp: true
+resume_from_checkpoint: null
+scale_factor: 0.3
+seed: 42
+set_cost_bbox: 5
+set_cost_class: 1
+stochastic_weight_avg: false
+sync_batchnorm: false
+terminate_on_nan: false
+tpu_cores: null
+track_grad_norm: -1
+truncated_bptt_steps: null
+use_det_bbox: false
+val_check_interval: 1.0
+vit_arch: deit_deit_small
+vit_as_backbone: true
+vit_dim: 384
+vit_dropout: 0.0
+vit_weights: null
+weight_decay: 0.0001
+weights_save_path: null
+weights_summary: top
+with_lpi: false
diff --git a/experiments/deit_small_p16_192x256_distilled_paper/checkpoints/eval_results.txt b/experiments/deit_small_p16_192x256_distilled_paper/checkpoints/eval_results.txt
@@ -0,0 +1,10 @@
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.622
+ Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.863
+ Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.697
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.608
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.654
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.716
+ Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.920
+ Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.791
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.674
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.775
diff --git a/experiments/deit_small_p16_192x256_distilled_paper/hparams.yaml b/experiments/deit_small_p16_192x256_distilled_paper/hparams.yaml
@@ -0,0 +1,98 @@
+accelerator: null
+accumulate_grad_batches: 1
+activation: gelu
+amp_backend: native
+amp_level: O2
+auto_lr_find: false
+auto_scale_batch_size: false
+auto_select_gpus: false
+aux_loss: true
+batch_size: 42
+bbox_loss_coef: 5
+benchmark: false
+check_val_every_n_epoch: 1
+checkpoint_callback: true
+clip_max_norm: 0.1
+coco_path: /home/padeler/work/datasets/coco_2017
+dataset_file: coco
+debug: false
+dec_arch: detr
+dec_layers: 6
+default_root_dir: null
+deterministic: false
+dice_loss_coef: 1
+dim_feedforward: 1536
+distributed_backend: null
+dropout: 0.0
+enc_layers: 6
+eos_coef: 0.1
+epochs: 80
+eval: false
+fast_dev_run: false
+flush_logs_every_n_steps: 100
+gpus: 1
+gradient_clip_algorithm: norm
+gradient_clip_val: 0.0
+hidden_dim: 384
+init_weights: null
+input_size: !!python/tuple
+- 192
+- 256
+limit_predict_batches: 1.0
+limit_test_batches: 1.0
+limit_train_batches: 1.0
+limit_val_batches: 1.0
+log_every_n_steps: 50
+log_gpu_memory: null
+logger: true
+lr: 0.0001
+lr_backbone: 1.0e-05
+lr_drop: 50
+mask_loss_coef: 1
+max_epochs: null
+max_steps: null
+max_time: null
+min_epochs: null
+min_steps: null
+move_metrics_to_cpu: false
+multiple_trainloader_mode: max_size_cycle
+nheads: 8
+num_nodes: 1
+num_processes: 1
+num_queries: 100
+num_sanity_val_steps: 2
+num_workers: 24
+overfit_batches: 0.0
+patch_size: 16
+plugins: null
+position_embedding: enc_xcit
+pre_norm: false
+precision: 32
+prepare_data_per_node: true
+process_position: 0
+profiler: null
+progress_bar_refresh_rate: null
+reload_dataloaders_every_epoch: false
+replace_sampler_ddp: true
+resume_from_checkpoint: null
+scale_factor: 0.3
+seed: 42
+set_cost_bbox: 5
+set_cost_class: 1
+stochastic_weight_avg: false
+sync_batchnorm: false
+terminate_on_nan: false
+tpu_cores: null
+track_grad_norm: -1
+truncated_bptt_steps: null
+use_det_bbox: false
+val_check_interval: 1.0
+vit_arch: deit_deit_small
+vit_as_backbone: false
+vit_dim: 384
+vit_dropout: 0.0
+vit_weights: null
+weight_decay: 0.0001
+weights_save_path: null
+weights_summary: top
+with_lpi: false
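The `hparams.yaml` files are long and mostly identical, so the settings that distinguish two experiments are easy to miss. The sketch below lists only the keys whose values differ. It is an illustration, not code from the repository: `parse_flat_yaml` and `diff_params` are hypothetical helpers, and the parser is deliberately naive. It handles only flat `key: value` lines and `- item` lists, keeps values as strings, and drops tags such as `!!python/tuple`.

```python
def parse_flat_yaml(text):
    """Naive parser for flat `key: value` files with optional `- item` lists.

    Not a general YAML parser: values stay strings, and a key followed by
    `- item` lines ends up holding the list of items (any tag is dropped).
    """
    params = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.lstrip("+").rstrip()  # tolerate the leading '+' of a diff
        if not line.strip():
            continue
        if line.startswith("- ") and last_key is not None:
            previous = params[last_key]
            items = previous if isinstance(previous, list) else []
            items.append(line[2:].strip())
            params[last_key] = items
            continue
        key, _, value = line.partition(":")
        last_key = key.strip()
        params[last_key] = value.strip()
    return params


def diff_params(a, b):
    """Return {key: (a_value, b_value)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Short excerpts of the two hparams.yaml files shown above.
vab = """\
input_size: !!python/tuple
- 192
- 256
num_workers: 16
vit_arch: deit_deit_small
vit_as_backbone: true
"""
distilled = """\
input_size: !!python/tuple
- 192
- 256
num_workers: 24
vit_arch: deit_deit_small
vit_as_backbone: false
"""
changed = diff_params(parse_flat_yaml(vab), parse_flat_yaml(distilled))
for key, (left, right) in changed.items():
    print(f"{key}: {left} -> {right}")
```

For anything beyond a quick comparison, a real YAML library is the better choice. Note that the `!!python/tuple` tag on `input_size` is Python-specific, so a loader restricted to standard YAML tags may reject these files unless it is given a handler for that tag.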
diff --git a/experiments/deit_small_p16_192x256_paper/checkpoints/eval_results.txt b/experiments/deit_small_p16_192x256_paper/checkpoints/eval_results.txt
@@ -0,0 +1,10 @@
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.608
+ Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.854
+ Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.683
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.595
+ Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.640
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.707
+ Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.915
+ Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.782
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.664
+ Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.767