This is a re-implementation of 《CoordiNet: uncertainty-aware pose regressor for reliable vehicle localization》 in PyTorch.
More visual localization methods: awesome-visual-localization.
~~Hint: the evaluation performance is a little worse than in the original paper.~~
- Python 3.8.10; a single GPU such as an RTX 3080 is enough.
6 scenes: KingsCollege, OldHospital, ShopFacade, StMarysChurch, Street, GreatCourt. link: Cambridge Landmarks
7 scenes: Fire, Heads, Office, Pumpkin, Redkitchen, Stairs, Storage. link: 7 scenes
- torch dataset class: it is enough to split out the paths of the train and test image sets, along with their 3×4 poses.
- addition: following 《LENS: Localization enhanced by NeRF synthesis》, I use a nerf-w model to synthesize novel views for enhanced training (see `synthesis_split.txt`); the nerf-w project is here
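A minimal sketch of such a dataset class, assuming a hypothetical split file in which each line holds an image path followed by the 12 row-major entries of the 3×4 pose. The file layout and the names `PoseDataset` and `parse_pose_line` are illustrative only and are not taken from this repository. Image decoding is left out so the example stays dependency-free; the two methods shown are the same ones a `torch.utils.data.Dataset` subclass would implement.

```python
from typing import List, Tuple

Pose = List[List[float]]  # 3 rows x 4 columns: [R | t]


def parse_pose_line(line: str) -> Tuple[str, Pose]:
    """Parse 'path v0 ... v11' into (path, 3x4 pose). Hypothetical format."""
    parts = line.split()
    if len(parts) != 13:
        raise ValueError(f"expected a path and 12 pose values, got {len(parts)} fields")
    values = [float(v) for v in parts[1:]]
    # Regroup the 12 row-major values into 3 rows of 4.
    pose = [values[r * 4:(r + 1) * 4] for r in range(3)]
    return parts[0], pose


class PoseDataset:
    """Map-style dataset: index -> (image path, 3x4 pose)."""

    def __init__(self, lines: List[str]):
        # Skip blank lines and '#' comments in the split listing.
        self.samples = [
            parse_pose_line(line)
            for line in lines
            if line.strip() and not line.lstrip().startswith("#")
        ]

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, idx: int) -> Tuple[str, Pose]:
        return self.samples[idx]


# Made-up listing, only to show the assumed layout.
train_lines = [
    "# image r00 r01 r02 tx r10 r11 r12 ty r20 r21 r22 tz",
    "seq1/frame00001.png 1 0 0 0.5 0 1 0 -1.2 0 0 1 3.0",
]
train_set = PoseDataset(train_lines)
path, pose = train_set[0]
print(path, [row[3] for row in pose])  # image path and translation column
```

Separate train and test instances would simply be built from the two split listings.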
example:
python train_coordinet.py \
--root_dir ./runs/coordinet --exp_name exp \
--batch_size 10 --epochs 250 --lr 0.0001 \
--save_freq 100 --log_freq 1 \
--last_epoch 0 --ckpt_path \
--data_root_dir $DATA_ROOT_DIR --scene $SCENE_NAME \
--reshape_size 320 --crop_size 300 \
--fixed_weight False --learn_beta True --loss_type homosc
- `learn_beta`: whether to learn the beta parameters in the loss function. The betas are the coefficients of the geometric loss for the 4 loss terms (Tx, Ty, Tz, R), where beta means log(σ²); refer to 《Geometric Loss Functions for Camera Pose Regression with Deep Learning》.
- `loss_type`:
  - 'homosc': homoscedastic loss; enable `learn_beta` to learn the beta parameters, or otherwise use a simple accumulated loss of t and R.
  - 'heterosc': heteroscedastic loss, the uncertainty loss proposed in the paper; you should also set `var_min` for the 4 loss terms in order to avoid inf values.
- `fixed_weight`: whether to fix (freeze) the weights of the efficientnet backbone.
- to resume: set `last_epoch` > 0 and `ckpt_path` to resume training.
- by default I use efficientnet-b3 as the backbone and the 'homosc' loss.
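The two loss types can be sketched on plain scalars as follows. This is an illustration under stated assumptions, not the code of this repository: the homoscedastic form follows the learned-weighting formula from the geometric loss paper cited above (each term scaled by exp(-beta) plus beta as a regularizer), and the heteroscedastic form is written as a generic Gaussian negative log-likelihood, which may differ in detail from the loss used in CoordiNet. The function names are hypothetical. In a PyTorch model the betas would typically be learnable parameters when `learn_beta` is enabled.

```python
import math
from typing import Sequence


def homoscedastic_loss(errors: Sequence[float], betas: Sequence[float]) -> float:
    """Sum of err_i * exp(-beta_i) + beta_i, with beta_i = log(sigma_i^2)."""
    return sum(e * math.exp(-b) + b for e, b in zip(errors, betas))


def heteroscedastic_loss(sq_errors: Sequence[float],
                         variances: Sequence[float],
                         var_min: float = 1e-6) -> float:
    """Gaussian NLL up to constants: log(var_i) + sq_err_i / var_i.

    Each predicted variance is clamped from below by var_min so that neither
    the logarithm nor the division can produce an infinite value.
    """
    total = 0.0
    for e2, var in zip(sq_errors, variances):
        var = max(var, var_min)
        total += math.log(var) + e2 / var
    return total


# Made-up error terms for (Tx, Ty, Tz, R).
errors = [0.2, 0.1, 0.3, 0.05]

# With all betas fixed at 0 the formula reduces to a plain sum of the terms.
plain_sum = homoscedastic_loss(errors, [0.0, 0.0, 0.0, 0.0])

# A predicted variance of exactly 0 stays finite thanks to var_min.
clamped = heteroscedastic_loss([e * e for e in errors], [0.0, 0.5, 1.0, 2.0])
print(plain_sum, clamped)
```

Clamping is only one way to keep the variance away from zero; predicting log-variance directly is a common alternative.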
Comparison of some results with the original paper:
comment:
- I didn't put much effort into carefully tuning the hyperparameters. Also, end-to-end deep learning methods built on models such as CNNs and transformers are not as good as traditional approaches such as hscnet and pixloc in terms of accuracy, stability, and robustness. They are, however, more efficient and easier to use.
- Essentially, coordinet and similar end-to-end deep learning methods rely on the training set to establish an interpolation function for pose estimation. Therefore, they depend heavily on the distribution of the training set and on how well it generalizes.
- This project doesn't implement the later EKF part; it only implements the pose regressor network.
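Results on these benchmarks are usually reported as a translation error in meters and a rotation error in degrees. A minimal sketch of both metrics for 3×4 poses is given below, assuming the last column holds the camera position and the left 3×3 block is a rotation matrix; the function names are hypothetical and the evaluation script of this project may compute them differently.

```python
import math
from typing import List

Pose = List[List[float]]  # 3x4 [R | t]


def translation_error(pred: Pose, gt: Pose) -> float:
    """Euclidean distance between the translation columns."""
    return math.sqrt(sum((pred[r][3] - gt[r][3]) ** 2 for r in range(3)))


def rotation_error_deg(pred: Pose, gt: Pose) -> float:
    """Angle of the relative rotation R_pred^T R_gt, in degrees."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(pred[r][c] * gt[r][c] for r in range(3) for c in range(3))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


# Ground truth: identity rotation at the origin.
gt_pose = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
# Prediction: rotated a quarter turn about z and shifted by (3, 4, 0).
pred_pose = [[0.0, -1.0, 0.0, 3.0],
             [1.0, 0.0, 0.0, 4.0],
             [0.0, 0.0, 1.0, 0.0]]

t_err = translation_error(pred_pose, gt_pose)
r_err = rotation_error_deg(pred_pose, gt_pose)
print(t_err, r_err)
```

Per-scene numbers are then typically summarized by taking the median of each error over the test images.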