
4DS (Scaling 4D Representations) BetterDepth ChronoDepth Depth Any Video Depth Anything Depth Pro DepthCrafter DINOv2 FutureDepth GenPercept GeoWizard LeReS LightedDepth Marigold Metric3D MiDaS MoGe MonST3R NeWCRFs NVDS NVDS+ PatchFusion StereoCrafter UniDepth ZoeDepth | Align3R Buffer Anytime FiffDepth ImmersePro MegaSaM RollingDepth SpatialMe


AIVFI/Monocular-Depth-Estimation-Rankings-and-2D-to-3D-Video-Conversion-Rankings


Monocular Depth Estimation Rankings
and 2D to 3D Video Conversion Rankings

Researchers! On 19 December 2024, a preprint was published that focuses on "evaluating self-supervised learning on non-semantic vision tasks that are more spatial (3D) and temporal (+1D = 4D), such as camera pose estimation, point and object tracking, and depth estimation." The 4DS-j model presented there achieves a lower AbsRel on ScanNet than DINOv2 ViT-g in every setting reported below, which makes it a better backbone than DINOv2 for specialised video depth estimation models. Such models can in turn be the basis for better 2D to 3D video conversion. Please try the 4DS-j backbone instead of DINOv2 ViT-g in your future breakthrough video depth estimation models! Below is a special ranking showing the capabilities of 4DS-j:

ScanNet: AbsRel (top 2 backbones for monocular depth estimation)

| RK | Model | Venue | Repository | AbsRel ↓ (frozen backbone) (arXiv, 4DS) | AbsRel ↓ (short finetuning) (arXiv, 4DS) | AbsRel ↓ (medium finetuning) (arXiv, 4DS) | AbsRel ↓ (long finetuning) (arXiv, 4DS) |
|---|---|---|---|---|---|---|---|
| 1 | 4DS-j | arXiv | - | 0.85 | 0.63 | 0.59 | 0.57 |
| 2 | DinoV2-g | TMLR | GitHub | 0.92 | 0.76 | 0.69 | 0.66 |
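
For readers unfamiliar with the metric, here is a minimal sketch of AbsRel (absolute relative error), assuming the common definition: the mean of |predicted − ground truth| / ground truth over pixels with valid ground truth. The function name `abs_rel`, the `gt > 0` validity mask, and the toy depth values are illustrative assumptions. The exact evaluation protocol behind the numbers above (masking, scaling, alignment) is not described in this README.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid ground truth."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Assumption: a ground-truth depth of 0 (or less) marks an invalid pixel.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Toy, made-up depths in metres (not taken from any benchmark).
pred_depth = [1.0, 2.2, 2.7, 5.0]
gt_depth = [1.0, 2.0, 3.0, 0.0]  # the last pixel has no ground truth

score = abs_rel(pred_depth, gt_depth)
print(f"AbsRel = {score:.4f}")
```

Lower is better, which is what the ↓ in the column headers indicates.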

Because of the large number of recent models that I cannot add to the rankings immediately, I have added a waiting list of new models:

| Method | Paper | Venue | Official repository |
|---|---|---|---|
| Align3R | Align3R: Aligned Monocular Depth Estimation for Dynamic Videos | arXiv | GitHub |
| Buffer Anytime | Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors | arXiv | - |
| FiffDepth | FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation | arXiv | - |
| ImmersePro | ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning | arXiv | GitHub |
| MegaSaM | MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | arXiv | GitHub |
| RollingDepth | Video Depth without Video Models | arXiv | GitHub |
| SpatialMe | SpatialMe: Stereo Video Conversion Using Depth-Warping and Blend-Inpainting | arXiv | - |

List of Rankings

2D to 3D Video Conversion Rankings

  1. Qualitative comparison of four 2D to 3D video conversion methods: Rank (human perceptual judgment)

Monocular Depth Estimation Rankings

I. Rankings based on temporal consistency metrics

  1. ScanNet++ (98 video clips with 32 frames each): TAE
  2. NYU-Depth V2: OPW<=0.37

II. Rankings based on 3D metrics

  1. Direct comparison of 9 metric depth models (each with each) on 5 datasets: F-score

III. Rankings based on 2D metrics

  1. Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.078
  2. NYU-Depth V2: AbsRel<=0.045 (relative depth)
  3. NYU-Depth V2: AbsRel<=0.051 (metric depth)

IV. Old layout - currently no longer up to date

  1. NYU-Depth V2 (640×480): AbsRel<=0.058 (old layout - currently no longer up to date)
  2. DA-2K (mostly 1500×2000): Acc (%)>=86 (old layout - currently no longer up to date)
  3. UnrealStereo4K (3840×2160): AbsRel<=0.04 (old layout - currently no longer up to date)
  4. Middlebury2021 (1920×1080): SqRel<=0.5 (old layout - currently no longer up to date)

Appendices


Qualitative comparison of four 2D to 3D video conversion methods: Rank (human perceptual judgment)

📝 Note: There are no quantitative comparison results for StereoCrafter yet, so this ranking is based on my own perceptual judgment of the qualitative comparison results shown in Figure 7. Two frame pairs are compared, each consisting of one output frame (right view) and one input frame (left view): one pair from the video clip 22_dogskateboarder and one pair from the video clip scooter-black.

| RK | Model | Venue | Repository | Rank ↓ (human perceptual judgment) |
|---|---|---|---|---|
| 1 | StereoCrafter | arXiv | GitHub | 1 |
| 2-3 | Immersity AI | - | - | 2-3 |
| 2-3 | Owl3D | - | - | 2-3 |
| 4 | Deep3D | ECCV | GitHub | 4 |

Back to Top Back to the List of Rankings

ScanNet++ (98 video clips with 32 frames each): TAE

| RK | Model | Venue | Repository | TAE ↓ {Input fr.} (arXiv, DAV) |
|---|---|---|---|---|
| 1 | Depth Any Video | arXiv | GitHub | 2.1 {MF} |
| 2 | DepthCrafter | arXiv | GitHub | 2.2 {MF} |
| 3 | ChronoDepth | arXiv | GitHub | 2.3 {MF} |
| 4 | NVDS | ICCV | GitHub | 3.7 {4} |

Back to Top Back to the List of Rankings

NYU-Depth V2: OPW<=0.37

| RK | Model | Venue | Repository | OPW ↓ {Input fr.} (ECCV, FD) | OPW ↓ {Input fr.} (TPAMI, NVDS+) | OPW ↓ {Input fr.} (ICCV, NVDS) |
|---|---|---|---|---|---|---|
| 1 | FutureDepth | ECCV | - | 0.303 {4} | - | - |
| 2 | NVDS+ | TPAMI | GitHub | - | 0.339 {4} | - |
| 3 | NVDS | ICCV | GitHub | 0.364 {4} | - | 0.364 {4} |

Back to Top Back to the List of Rankings

Direct comparison of 9 metric depth models (each with each) on 5 datasets: F-score

📝 Note: This ranking is based on data from Table 4. Each result has the form better:same:worse. The example result 3:0:2 (the leftmost result in the first row) means that Depth Pro has a better F-score than UniDepth-V on 3 datasets, the same F-score on 0 datasets, and a worse F-score on 2 datasets.

| RK | Model | Venue | Repository | DP | UD | M3D v2 | DA V2 | DA | ZoeD | M3D | PF | ZD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Depth Pro | arXiv | GitHub | - | 3:0:2 | 3:1:1 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 3:0:0 |
| 2 | UniDepth-V | CVPR | GitHub | 2:0:3 | - | 4:0:1 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 3:0:0 |
| 3 | Metric3D v2 ViT-giant | TPAMI | GitHub | 1:1:3 | 1:0:4 | - | 4:1:0 | 5:0:0 | 5:0:0 | 5:0:0 | 5:0:0 | 3:0:0 |
| 4 | Depth Anything V2 | NeurIPS | GitHub | 0:0:5 | 0:0:5 | 0:1:4 | - | 4:1:0 | 4:0:1 | 5:0:0 | 4:0:1 | 3:0:0 |
| 5 | Depth Anything | CVPR | GitHub | 0:0:5 | 0:0:5 | 0:0:5 | 0:1:4 | - | 3:0:2 | 3:1:1 | 3:0:2 | 2:1:0 |
| 6 | ZoeD-M12-NK | arXiv | GitHub | 0:0:5 | 0:0:5 | 0:0:5 | 1:0:4 | 2:0:3 | - | 3:0:2 | 3:1:1 | 2:0:1 |
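
The better:same:worse notation used in this ranking can be illustrated with a short sketch. It assumes each model's results are stored as a mapping from dataset name to F-score and that only datasets reported for both models are compared. The function names, the dataset keys, and all scores below are hypothetical and are not taken from Table 4.

```python
from typing import Mapping, Tuple


def head_to_head(a: Mapping[str, float], b: Mapping[str, float]) -> Tuple[int, int, int]:
    """Return (better, same, worse) counts for model a against model b.

    Only datasets present for both models are compared (an assumption);
    a higher F-score counts as better.
    """
    better = same = worse = 0
    for dataset in a.keys() & b.keys():
        if a[dataset] > b[dataset]:
            better += 1
        elif a[dataset] == b[dataset]:
            same += 1
        else:
            worse += 1
    return better, same, worse


def format_record(record: Tuple[int, int, int]) -> str:
    """Render a (better, same, worse) tuple in the table's x:y:z notation."""
    return ":".join(str(count) for count in record)


# Hypothetical F-scores on five placeholder datasets.
model_a = {"d1": 0.40, "d2": 0.55, "d3": 0.61, "d4": 0.30, "d5": 0.72}
model_b = {"d1": 0.35, "d2": 0.60, "d3": 0.50, "d4": 0.33, "d5": 0.70}

a_vs_b = format_record(head_to_head(model_a, model_b))
b_vs_a = format_record(head_to_head(model_b, model_a))
print(a_vs_b, b_vs_a)
```

Swapping the two models mirrors the record, which matches the table: Depth Pro against UniDepth-V is 3:0:2, and UniDepth-V against Depth Pro is 2:0:3.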

Back to Top Back to the List of Rankings

Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.078

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (arXiv, MonST3R) | AbsRel ↓ {Input fr.} (arXiv, DC) |
|---|---|---|---|---|---|
| 1 | MonST3R | arXiv | GitHub | 0.063 {MF} | - |
| 2 | DepthCrafter | arXiv | GitHub | 0.075 {MF} | 0.075 {MF} |
| 3 | Depth Anything | CVPR | GitHub | - | 0.078 {1} |

Back to Top Back to the List of Rankings

NYU-Depth V2: AbsRel<=0.045 (relative depth)

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (arXiv, MoGe) | AbsRel ↓ {Input fr.} (arXiv, BD) | AbsRel ↓ {Input fr.} (TPAMI, M3D v2) | AbsRel ↓ {Input fr.} (CVPR, DA) | AbsRel ↓ {Input fr.} (NeurIPS, DA V2) |
|---|---|---|---|---|---|---|---|---|
| 1 | MoGe | arXiv | GitHub | 0.0341 {1} | - | - | - | - |
| 2 | UniDepth | CVPR | GitHub | 0.0380 {1} | - | - | - | - |
| 3-4 | BetterDepth | arXiv | - | - | 0.042 {1} | - | - | - |
| 3-4 | Metric3D v2 ViT-Large | TPAMI | GitHub | 0.134 {1} | - | 0.042 {1} | - | - |
| 5 | Depth Anything Large | CVPR | GitHub | 0.0424 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} |
| 6 | Depth Anything V2 Large | NeurIPS | GitHub | 0.0420 {1} | - | - | - | 0.045 {1} |

Back to Top Back to the List of Rankings

NYU-Depth V2: AbsRel<=0.051 (metric depth)

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (TPAMI, M3D v2) | AbsRel ↓ {Input fr.} (arXiv, GRIN) |
|---|---|---|---|---|---|
| 1 | Metric3D v2 ViT-giant | TPAMI | GitHub | 0.045 {1} | - |
| 2 | GRIN_FT_NI | arXiv | - | - | 0.051 {1} |

Back to Top Back to the List of Rankings

NYU-Depth V2 (640×480): AbsRel<=0.058 (old layout - currently no longer up to date)

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1-2 | BetterDepth (arXiv); Backbone: Depth Anything & Marigold | 0.042 {1} (arXiv) | Hypersim & Virtual KITTI | - | - | - |
| 1-2 | Metric3D v2 CSTM_label (ICCV; ENH: arXiv); Backbone: DINOv2 with registers (ViT-L/14) | 0.042 {1} (arXiv) | DDAD & Lyft & Driving Stereo & DIML & Argoverse2 & Cityscapes & DSEC & Mapillary PSD & Pandaset & UASOL & Virtual KITTI & Waymo & Matterport3d & Taskonomy & Replica & ScanNet & HM3d & Hypersim | GitHub | - | - |
| 3 | Depth Anything Large (CVPR); Backbone: DINOv2 (ViT-L/14) | 0.043 {1} (CVPR) | Pretraining: BlendedMVS & DIML & HR-WSI & IRS & MegaDepth & TartanAir; Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub | - | - |
| 4 | MiDaS v3.1 BEiTL-512 (TPAMI; ENH: arXiv); Backbone: BEiT512-L (ViT-L/16) | 0.048 {1} (CVPR) | Pretraining: ReDWeb & HR-WSI & BlendedMVS & NYU-Depth V2 & KITTI; Training: ReDWeb & DIML & 3D Movies & MegaDepth & WSVD & TartanAir & HR-WSI & ApolloScape & BlendedMVS & IRS & NYU-Depth V2 & KITTI | GitHub | - | PyTorch (GitHub) |
| 5 | GeoWizard (arXiv); Backbone: Stable Diffusion v2 | 0.052 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub | - | - |
| 6 | Marigold (CVPR); Backbone: Stable Diffusion v2 | 0.055 {1} (CVPR) | Hypersim & Virtual KITTI | GitHub | - | - |
| 7 | GenPercept (arXiv); Backbone: Stable Diffusion v2.1 | 0.056 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub | - | - |
| 8 | NeWCRFs + LightedDepth (CVPR; ENH: CVPR) | 0.057 {2} (CVPR) | ENH: NYU-Depth V2 | GitHub; ENH: GitHub | - | - |
| 9 | UniDepth-V (CVPR); Backbone: DINOv2 (ViT-L/14) | 0.0578 {1} (CVPR) | A2D2 & Argoverse2 & BDD100k & CityScapes & DrivingStereo & Mapillary PSD & ScanNet & Taskonomy & Waymo | GitHub | - | - |

Back to Top Back to the List of Rankings

DA-2K (mostly 1500×2000): Acc (%)>=86 (old layout - currently no longer up to date)

| RK | Model | Acc (%) ↑ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | Depth Anything V2 Giant (CVPR; ENH: arXiv); Backbone: DINOv2 (ViT-G/14) | 97.4 {1} (arXiv) | Pretraining: BlendedMVS & Hypersim & IRS & TartanAir & VKITTI 2; Training: BDD100K & Google Landmarks & ImageNet-21K & LSUN & Objects365 & Open Images V7 & Places365 & SA-1B | GitHub; ENH: GitHub | - | - |
| 2 | GeoWizard (arXiv); Backbone: Stable Diffusion v2 | 88.1 {1} (arXiv) | Hypersim & Replica & 3D Ken Burns & Objaverse & proprietary | GitHub | - | - |
| 3 | Marigold (CVPR); Backbone: Stable Diffusion v2 | 86.8 {1} (arXiv) | Hypersim & Virtual KITTI | GitHub | - | - |

Back to Top Back to the List of Rankings

UnrealStereo4K (3840×2160): AbsRel<=0.04 (old layout - currently no longer up to date)

| RK | Model | AbsRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | ZoeDepth +PFR=128 (arXiv; ENH: CVPR) | 0.0388 {1} (CVPR) | ENH: UnrealStereo4K | GitHub; ENH: GitHub | - | - |

Back to Top Back to the List of Rankings

Middlebury2021 (1920×1080): SqRel<=0.5 (old layout - currently no longer up to date)

| RK | Model | SqRel ↓ {Input fr.} | Training dataset | Official repository | Practical model | VapourSynth |
|---|---|---|---|---|---|---|
| 1 | LeReS-GBDMF (CVPR; ENH: AAAI) | 0.444 {1} (AAAI) | ENH: HR-WSI | GitHub; ENH: GitHub | - | - |

Back to Top Back to the List of Rankings

Appendix 3: List of all research papers from the above rankings

| Method | Paper | Venue |
|---|---|---|
| 4DS | Scaling 4D Representations | arXiv |
| BetterDepth | BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation | arXiv |
| ChronoDepth | Learning Temporally Consistent Video Depth from Video Diffusion Priors | arXiv |
| Deep3D | Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks | ECCV |
| Depth Any Video | Depth Any Video with Scalable Synthetic Data | arXiv |
| Depth Anything | Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | CVPR |
| Depth Anything V2 | Depth Anything V2 | NeurIPS |
| Depth Pro | Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | arXiv |
| DepthCrafter | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | arXiv |
| DINOv2 | DINOv2: Learning Robust Visual Features without Supervision | TMLR |
| FutureDepth | FutureDepth: Learning to Predict the Future Improves Video Depth Estimation | ECCV |
| GBDMF | Multi-Resolution Monocular Depth Map Fusion by Self-Supervised Gradient-Based Composition | AAAI |
| GenPercept | Diffusion Models Trained with Large Data Are Transferable Visual Models | arXiv |
| GeoWizard | GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image | arXiv |
| GRIN | GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion | arXiv |
| LeReS | Learning to Recover 3D Scene Shape from a Single Image | CVPR |
| LightedDepth | LightedDepth: Video Depth Estimation in light of Limited Inference View Angles | CVPR |
| Marigold | Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | CVPR |
| Metric3D | Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | ICCV |
| Metric3D v2 | Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation | TPAMI |
| MiDaS | Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer | TPAMI |
| MiDaS v3.1 | MiDaS v3.1 – A Model Zoo for Robust Monocular Relative Depth Estimation | arXiv |
| MoGe | MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision | arXiv |
| MonST3R | MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion | arXiv |
| NeWCRFs | Neural Window Fully-connected CRFs for Monocular Depth Estimation | CVPR |
| NVDS | Neural Video Depth Stabilizer | ICCV |
| NVDS+ | NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation | TPAMI |
| PatchFusion | PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation | CVPR |
| StereoCrafter | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | arXiv |
| UniDepth | UniDepth: Universal Monocular Metric Depth Estimation | CVPR |
| ZoeDepth | ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | arXiv |

Back to Top Back to the List of Rankings
