We provide a simple codebase to fine-tune our Depth Anything V2 pre-trained encoder for metric depth estimation. On top of the encoder, we use a simple DPT head to regress depth. We fine-tune the encoder on the synthetic Hypersim dataset for indoor metric depth estimation and on the synthetic Virtual KITTI 2 dataset for outdoor metric depth estimation.
Please first download our pre-trained metric depth models and put them under the checkpoints directory. Then run inference on your images:
# indoor scenes
python run.py \
  --encoder vitl --load-from checkpoints/depth_anything_v2_metric_hypersim_vitl.pth \
  --max-depth 20 --img-path <path> --outdir <outdir> [--input-size <size>] [--save-numpy]
# outdoor scenes
python run.py \
  --encoder vitl --load-from checkpoints/depth_anything_v2_metric_vkitti_vitl.pth \
  --max-depth 80 --img-path <path> --outdir <outdir> [--input-size <size>] [--save-numpy]
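The two commands differ mainly in the checkpoint and in --max-depth (20 for indoor, 80 for outdoor). As a minimal sketch of how such a cap can bound a regression head, the snippet below squashes unbounded outputs with a sigmoid and scales them by the cap. This is an assumption made for illustration, not a description of what run.py does, and the function name scale_to_metric is hypothetical.

```python
import numpy as np

def scale_to_metric(logits, max_depth):
    """Map unbounded head outputs to depths in (0, max_depth) via a sigmoid.

    Assumption for illustration: the head output is squashed to (0, 1)
    and then multiplied by the maximum depth.
    """
    logits = np.asarray(logits, dtype=np.float64)
    return max_depth / (1.0 + np.exp(-logits))

# The same raw outputs interpreted under the indoor and outdoor caps.
raw = np.array([-4.0, 0.0, 4.0])
indoor = scale_to_metric(raw, max_depth=20.0)
outdoor = scale_to_metric(raw, max_depth=80.0)
```

Under this assumption a raw output of 0 maps to half the cap, and no prediction can exceed the cap, which is why the value should match the depth range of the scenes the checkpoint was trained on.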
You can also project 2D images to point clouds:
python depth_to_pointcloud.py \
  --encoder vitl --load-from checkpoints/depth_anything_v2_metric_hypersim_vitl.pth \
  --max-depth 20 --img-path <path> --outdir <outdir>
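For readers who want to see the geometry behind this step, here is a minimal sketch of back-projecting a metric depth map to 3D points with a pinhole camera model. It is not the code in depth_to_pointcloud.py; the function name depth_to_points and the intrinsics fx, fy, cx, cy are assumptions you would replace with your own camera's values.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) array of XYZ points.

    Pinhole model (assumed): X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u indexes columns (x direction), v indexes rows (y direction).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat surface 2 units in front of the camera.
depth = np.full((2, 3), 2.0)
points = depth_to_points(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

The resulting array has one row per pixel, so it can be paired with the flattened image colors if a colored point cloud is needed.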
For training, please first prepare the Hypersim and Virtual KITTI 2 datasets. Then run:
bash dist_train.sh
If you find this project useful, please consider citing:
@article{depth_anything_v2,
title={Depth Anything V2},
author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
journal={arXiv:2406.09414},
year={2024}
}