MV3D_TF (in progress)

This is an experimental Tensorflow implementation of MV3D, a ConvNet for object detection from Lidar and a monocular camera.

For details about MV3D, please refer to the paper Multi-View 3D Object Detection Network for Autonomous Driving by Xiaozhi Chen, Huimin Ma, Ji Wan, Bo Li, and Tian Xia.

Requirements: software

  1. Requirements for Tensorflow 1.0 (see the Tensorflow documentation)

  2. Python packages you might not have: cython, python-opencv, easydict

Requirements: hardware

  1. For training the end-to-end version of Faster R-CNN with VGG16, 3 GB of GPU memory is sufficient (using CUDNN)

Installation

  1. Clone the MV3D_TF repository

     # Make sure to clone with --recursive
     git clone --recursive https://github.com/RyannnG/MV3D_TF.git
  2. Build the Cython modules

     cd $MV3D/lib
     make
  3. Download the KITTI object detection dataset (a layout check appears after this list)

     Specify the KITTI data path so that the structure looks like:

     {kitti_dir}/object/training/image_2
                                /image_3
                                /calib
                                /lidar_bv
                                /velodyne

     {kitti_dir}/object/testing/image_2
                               /image_3
                               /calib
                               /lidar_bv
                               /velodyne
  4. Make the Lidar bird's eye view data (a sketch of this conversion follows the list)

     # edit kitti_path in tools/read_lidar.py
     # then generate the bird's eye view data
     python tools/read_lidar.py
  5. Create symlinks for the KITTI dataset

   cd $MV3D/data/KITTI
   ln -s {kitti_dir}/object object
  6. Download the pre-trained ImageNet models

    Download the pre-trained ImageNet models [Google Drive] [Dropbox]

    mv VGG_imagenet.npy $MV3D/data/pretrain_model/VGG_imagenet.npy
  7. Run the script to train the model

     cd $MV3D
     ./experiments/scripts/mv3d.sh $DEVICE $DEVICE_ID ${.npy/ckpt.meta} kitti_train

DEVICE is either cpu or gpu, DEVICE_ID selects the GPU, and the third argument takes the initial weights (.npy) or a saved checkpoint (ckpt.meta) to restore from.
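
To confirm the layout expected in step 3, a quick check along these lines can help; kitti_dir is a placeholder for your local path, and lidar_bv will only exist after step 4:

    import os

    # Placeholder path -- point this at your local KITTI root.
    kitti_dir = "/path/to/kitti"

    for split in ("training", "testing"):
        for sub in ("image_2", "image_3", "calib", "lidar_bv", "velodyne"):
            path = os.path.join(kitti_dir, "object", split, sub)
            # lidar_bv is generated by step 4, so it may be missing at first
            print(("ok     " if os.path.isdir(path) else "MISSING"), path)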
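
For reference, the conversion in step 4 crops each velodyne scan and rasterises it into top-view feature maps. Below is a minimal sketch of the idea with hypothetical crop ranges, resolution, and channels; the exact parameters used by tools/read_lidar.py may differ:

    import numpy as np

    # Hypothetical crop and resolution -- tools/read_lidar.py may use other values.
    SIDE_RANGE = (-30.0, 30.0)   # y extent in metres
    FWD_RANGE = (0.0, 60.0)      # x extent in metres
    RES = 0.1                    # metres per bird's eye view pixel

    def lidar_to_bev(bin_path):
        """Rasterise one KITTI velodyne scan into height/density/intensity maps."""
        pts = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
        x, y, z, r = pts[:, 0], pts[:, 1], pts[:, 2], pts[:, 3]

        # Keep only points inside the crop window.
        keep = ((x > FWD_RANGE[0]) & (x < FWD_RANGE[1]) &
                (y > SIDE_RANGE[0]) & (y < SIDE_RANGE[1]))
        x, y, z, r = x[keep], y[keep], z[keep], r[keep]

        # Metric coordinates -> pixel indices.
        ix = ((y - SIDE_RANGE[0]) / RES).astype(np.int32)
        iy = ((FWD_RANGE[1] - x) / RES).astype(np.int32)

        h = int((FWD_RANGE[1] - FWD_RANGE[0]) / RES)
        w = int((SIDE_RANGE[1] - SIDE_RANGE[0]) / RES)
        height = np.zeros((h, w), np.float32)
        density = np.zeros((h, w), np.float32)
        intensity = np.zeros((h, w), np.float32)

        # Max height and reflectance per cell; log-normalised point count as density.
        np.maximum.at(height, (iy, ix), z)
        np.maximum.at(intensity, (iy, ix), r)
        np.add.at(density, (iy, ix), 1.0)
        density = np.minimum(1.0, np.log(density + 1.0) / np.log(64.0))
        return np.dstack([height, density, intensity])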

Network Structure

Key idea: use the Lidar bird's eye view to generate 3D anchor boxes, then project those boxes onto the image for classification.

[Figure: network structure]
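
The projection onto the image uses the calibration matrices shipped with each KITTI frame. A minimal sketch, assuming P2, R0_rect, and Tr_velo_to_cam have already been parsed from the calib file (the repository's own projection helper may differ):

    import numpy as np

    def project_box_to_image(corners_velo, P2, R0_rect, Tr_velo_to_cam):
        """Project 3D box corners (N x 3, velodyne frame) into the left color
        image and return the tight axis-aligned 2D box [x1, y1, x2, y2]."""
        n = corners_velo.shape[0]
        pts = np.hstack([corners_velo, np.ones((n, 1))]).T  # 4 x N homogeneous
        cam = R0_rect @ (Tr_velo_to_cam @ pts)              # 3 x N rectified camera frame
        uvw = P2 @ np.vstack([cam, np.ones((1, n))])        # 3 x N image-plane homogeneous
        u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]             # perspective divide
        return np.array([u.min(), v.min(), u.max(), v.max()])

The resulting 2D boxes are the regions the image branch then classifies.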

Examples

Image and corresponding Lidar map

Note:

In the image:

  • Boxes without regression

In the Lidar view:

  • white box: without regression (corresponds to the image boxes)
  • purple box: with regression

[Figures: detection examples on the image and the Lidar bird's eye view]

Existing Errors

Mostly due to regression errors

[Figure: errors in boxes 5, 6, 9]

[Figure: errors in boxes 8, 9, 10]

References

  • Lidar Birds Eye Views
  • Part 2: Didi Udacity Challenge 2017 — Car and pedestrian Detection using Lidar and RGB
  • Faster_RCNN_TF
  • Faster R-CNN (Caffe version)
  • TFFRCNN
