_July 2020_

tl;dr: Distance and velocity estimation from monocular video.

#### Overall impression

Achieves better performance and is more end-to-end than monocular_velocity. It uses optical flow and RoIAligned features to regress velocity and distance, and does not rely on an off-the-shelf depth estimator as monocular_velocity does.

3D velocity estimation can be seen as the prediction of sparse scene flow. This is to be compared with the 2D offset prediction in CenterTrack, which can be seen as sparse optical flow. Scene flow = optical flow + depth.
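As a minimal worked form of this relation (notation mine, not from the paper): a pixel $(u_t, v_t)$ with depth $d_t$ back-projects through the intrinsics $K$ to a 3D point, and the velocity of a tracked point follows from two such points $\Delta t$ apart (ego-motion compensation omitted):

$$
P_t = d_t \, K^{-1} \begin{bmatrix} u_t \\ v_t \\ 1 \end{bmatrix}, \qquad
\mathbf{v} = \frac{P_t - P_{t-1}}{\Delta t}
$$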

SOTA velocity estimation error is about 0.48 m/s.

#### Key ideas

- Input: two stacked neighboring images.
- Main idea: if we know two corresponding points and their depths in two neighboring frames, we can calculate the velocity of that point (see the geometric sketch after this list).
- Uses the PWC-Net encoder as the feature extractor for feature F.
- Distance: feature vector F RoIAligned from the current frame + geometry vector (intrinsics + bbox).
  - Vehicle-centric or not does not matter much here.
- Velocity: feature vector F + optical flow vector M RoIAligned from two neighboring frames + geometry vector (intrinsics + bbox). See the head sketch after this list.
  - Velocity estimation needs to be vehicle-centric, as optical flow works much better on image patches than on the whole image.
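A minimal sketch of the main geometric idea in Python (my own notation; `point_velocity` is a hypothetical helper, not from the paper, and ego-motion compensation is omitted):

```python
import numpy as np

def point_velocity(p_prev, p_curr, d_prev, d_curr, K, dt):
    """Velocity of a tracked point from two pixel correspondences and depths.

    p_prev, p_curr: (u, v) pixel coordinates in frames t-1 and t.
    d_prev, d_curr: depths of the point in the two frames (meters).
    K: 3x3 camera intrinsics matrix.
    dt: time between the two frames (seconds).
    """
    K_inv = np.linalg.inv(K)
    # Back-project each pixel to a 3D camera-frame point: P = d * K^-1 [u, v, 1]^T
    P_prev = d_prev * (K_inv @ np.array([p_prev[0], p_prev[1], 1.0]))
    P_curr = d_curr * (K_inv @ np.array([p_curr[0], p_curr[1], 1.0]))
    # Finite-difference velocity; a moving camera would require ego-motion compensation.
    return (P_curr - P_prev) / dt

# Example with made-up intrinsics and measurements:
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
v = point_velocity((700.0, 400.0), (710.0, 398.0), 25.0, 24.2, K, dt=0.1)
```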
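The two regression heads can be pictured as MLPs over concatenated RoIAligned features and a geometry vector. A rough PyTorch sketch under my own assumptions about dimensions and head design (the paper's exact architecture may differ):

```python
import torch
import torch.nn as nn

class DistVelHeads(nn.Module):
    """Sketch of distance and velocity regression heads (dimensions assumed).

    feat_dim: flattened RoIAligned appearance feature F.
    flow_dim: flattened RoIAligned optical-flow feature M.
    geo_dim:  geometry vector (intrinsics + bbox parameters).
    """
    def __init__(self, feat_dim=256, flow_dim=128, geo_dim=8, hidden=256):
        super().__init__()
        # Distance head: appearance feature from the current frame + geometry.
        self.dist_head = nn.Sequential(
            nn.Linear(feat_dim + geo_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # Velocity head: appearance + vehicle-centric optical flow + geometry.
        self.vel_head = nn.Sequential(
            nn.Linear(feat_dim + flow_dim + geo_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # 3D velocity vector
        )

    def forward(self, F, M, geo):
        dist = self.dist_head(torch.cat([F, geo], dim=-1))
        vel = self.vel_head(torch.cat([F, M, geo], dim=-1))
        return dist, vel
```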

#### Technical details

- It regresses the distance to the closest point of the vehicle, using the bbox center as a proxy. This could be problematic for side distance estimation.
- The supervised DORN depth estimator performs about the same as vehicle-centric distance estimation; the self-supervised method is much worse. This is somewhat surprising.

#### Notes

- Questions and notes on how to improve/revise the current work