ViT-WSS3D: A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection [ICCV 2023 Paper]

Introduction

This is the official repository for the ICCV 2023 paper: A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection

Pipeline

Installation

Please follow the MMDetection3D installation instructions (https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc4/docs/en/getting_started.md).

Data Preparation

We use the KITTI and SUN RGB-D datasets in this project. Follow the instructions below to prepare them.

KITTI

Follow the Prepare Dataset section in https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc4/docs/en/datasets/kitti_det.md#prepare-dataset
Download the data split files:

mkdir -p ./data/kitti/ImageSets

# Download data split
wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
wget -c  https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
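The same four downloads can also be scripted. The following is a minimal Python sketch and is not part of the repository: the helper names `split_targets` and `download_splits` are hypothetical, while the URLs and destination paths are copied from the wget commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and split names are taken from the wget commands above.
BASE_URL = ("https://raw.githubusercontent.com/traveller59/second.pytorch/"
            "master/second/data/ImageSets")
SPLITS = ("test", "train", "val", "trainval")


def split_targets(root="./data/kitti"):
    """Return (url, destination) pairs for the four split files."""
    out_dir = Path(root) / "ImageSets"
    return [(f"{BASE_URL}/{name}.txt", out_dir / f"{name}.txt") for name in SPLITS]


def download_splits(root="./data/kitti"):
    """Download every split file, creating the ImageSets folder if needed."""
    for url, dest in split_targets(root):
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, dest)  # overwrites an existing file of the same name
```

Calling `download_splits()` from the project root should leave the four `.txt` files in `./data/kitti/ImageSets`.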

Create dataset infos:

For fully-supervised training:

python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti

For weakly semi-supervised training:
- To train the point-to-box converter in stage 1, we sample a small amount of fully-annotated data. This extra step is required because the GT database used for GT-sampling augmentation must be regenerated to prevent possible data leakage.
- To generate the pseudo labels in stage 2, we sample the remaining data.

# for 2% full + 98% weak
python tools/create_data_partial_kitti.py --label_ratio 50 
python tools/create_data_partial_kitti.py --label_ratio 50 --generate_label_data
# for 5% full + 95% weak
python tools/create_data_partial_kitti.py --label_ratio 20 --generate_label_data
python tools/create_data_partial_kitti.py --label_ratio 20 
# for 10% full + 90% weak
python tools/create_data_partial_kitti.py --label_ratio 10 --generate_label_data
python tools/create_data_partial_kitti.py --label_ratio 10
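As the comments above indicate, `--label_ratio` acts as the denominator of the fully-annotated fraction (50 gives 2%, 20 gives 5%, 10 gives 10%). A small Python sketch of that arithmetic follows; the function name `split_percentages` is hypothetical and does not exist in the repository.

```python
from fractions import Fraction


def split_percentages(label_ratio):
    """Return (full %, weak %) when 1 out of `label_ratio` samples is fully annotated."""
    full = Fraction(1, label_ratio)
    weak = 1 - full
    return float(full * 100), float(weak * 100)


for ratio in (50, 20, 10):
    full_pct, weak_pct = split_percentages(ratio)
    print(f"--label_ratio {ratio}: {full_pct:g}% full + {weak_pct:g}% weak")
# --label_ratio 50: 2% full + 98% weak
# --label_ratio 20: 5% full + 95% weak
# --label_ratio 10: 10% full + 90% weak
```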

After processing, the folder structure should look like this:

kitti/
├── ImageSets
├── kitti_dbinfos_train_1_10.pkl
├── kitti_dbinfos_train_1_20.pkl
├── kitti_dbinfos_train_1_50.pkl
├── kitti_dbinfos_train_19_20.pkl
├── kitti_dbinfos_train_49_50.pkl
├── kitti_dbinfos_train_9_10.pkl
├── kitti_dbinfos_train.pkl
├── kitti_gt_database
├── kitti_infos_test_mono3d.coco.json
├── kitti_infos_test.pkl
├── kitti_infos_train_1_10_mono3d.coco.json
├── kitti_infos_train_1_10.pkl
├── kitti_infos_train_1_20_mono3d.coco.json
├── kitti_infos_train_1_20.pkl
├── kitti_infos_train_1_50_mono3d.coco.json
├── kitti_infos_train_1_50.pkl
├── kitti_infos_train_19_20_mono3d.coco.json
├── kitti_infos_train_19_20.pkl
├── kitti_infos_train_49_50_mono3d.coco.json
├── kitti_infos_train_49_50.pkl
├── kitti_infos_train_9_10_mono3d.coco.json
├── kitti_infos_train_9_10.pkl
├── kitti_infos_train_mono3d.coco.json
├── kitti_infos_train.pkl
├── kitti_infos_trainval_mono3d.coco.json
├── kitti_infos_trainval.pkl
├── kitti_infos_val_mono3d.coco.json
├── kitti_infos_val.pkl
├── testing 
└── training
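To confirm that the partial-data files for a given ratio were generated, a quick check like the one below may help. It is a sketch only: the naming pattern (`1_N` for the fully-annotated part, `N-1_N` for the remainder) is inferred from the listing above, and both function names are hypothetical.

```python
from pathlib import Path


def expected_partial_files(label_ratio):
    """File names for one ratio, following the pattern in the listing above."""
    names = []
    for part in (f"1_{label_ratio}", f"{label_ratio - 1}_{label_ratio}"):
        names += [
            f"kitti_dbinfos_train_{part}.pkl",
            f"kitti_infos_train_{part}.pkl",
            f"kitti_infos_train_{part}_mono3d.coco.json",
        ]
    return names


def missing_files(root, label_ratio):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [n for n in expected_partial_files(label_ratio) if not (root / n).exists()]
```

For example, `missing_files("./data/kitti", 10)` returns an empty list when all six files for the 10% setting are present.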

SUN RGB-D

Follow the instructions in https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc4/docs/en/datasets/sunrgbd_det.md

Training

Prepare Pretrained Weights

We use MAE-pretrained weights in this project. Download the ViT-Base and ViT-Large weights from https://github.com/facebookresearch/mae/blob/main/FINETUNE.md and put them in the root folder of this project (i.e., the same folder as this README).

KITTI

Stage 1: training the point-to-box converter

./tools/dist_train.sh configs/vitwss3d/vitwss3d_base_kitti_10p.py 4 0,1,2,3 --work-dir <to_your_work_path>

Stage 2: inference to get the pseudo GT boxes

./tools/dist_test.sh configs/vitwss3d/vitwss3d_base_kitti_10p.py <stage_1_trained_pth> 2 \
--out results/kitti-3class/results_eval.pkl --format-only --eval-options 'pklfile_prefix=results/kitti-3class/kitti_results_10' 'submission_prefix=results/kitti-3class/kitti_results_10'

Prepare the training data for Stage 3

Set label_ratio, save_path, and pseudo_root in prepare_pseudo_label/conver_pred_to_pseudo.py according to your settings, then run it:

cd prepare_pseudo_label
python conver_pred_to_pseudo.py  # merge pseudo labels and real labels
cd ..  # return to the project root for the next command
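Conceptually, merging keeps the real labels for the fully-annotated samples and uses the predicted boxes for all other samples. The sketch below illustrates that idea only and is not the repository's script: the function name, the folder arguments, and the assumption of one `<sample_id>.txt` label file per sample are all hypothetical.

```python
import shutil
from pathlib import Path


def merge_labels(real_dir, pseudo_dir, labeled_ids, save_dir):
    """Write real labels for `labeled_ids` and pseudo labels for all other samples.

    Assumes (hypothetically) one `<sample_id>.txt` file per sample in each folder.
    Returns the number of real and pseudo label files written.
    """
    real_dir, pseudo_dir, save_dir = Path(real_dir), Path(pseudo_dir), Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    labeled_ids = set(labeled_ids)

    # Fully-annotated samples always keep their ground-truth labels.
    for sample_id in sorted(labeled_ids):
        shutil.copy(real_dir / f"{sample_id}.txt", save_dir / f"{sample_id}.txt")

    # Every other sample takes the label predicted in stage 2.
    n_pseudo = 0
    for pseudo_file in sorted(pseudo_dir.glob("*.txt")):
        if pseudo_file.stem not in labeled_ids:
            shutil.copy(pseudo_file, save_dir / pseudo_file.name)
            n_pseudo += 1
    return len(labeled_ids), n_pseudo
```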

Create data infos for stage 3 training:

python tools/create_data_partial_kitti.py --label_ratio 10 --generate_full_pseudo_label_data # generate the pkl

Stage 3: train any detector. The following command uses PointRCNN as an example:

./tools/dist_train.sh ./configs/point_rcnn/point_rcnn_2x8_kitti-3d-3classes_pseudo.py 4 0,1,2,3 --work-dir <to_your_work_path>

SUN RGB-D

Stage 1: training the point-to-box converter

./tools/dist_train.sh configs/vitwss3d/vitwss3d_small_sunrgbd_20p.py 4 0,1,2,3 --work-dir <to_your_work_path>

Stage 2: inference to get the pseudo GT boxes

./tools/dist_test.sh configs/vitwss3d/vitwss3d_small_sunrgbd_20p.py <stage_1_trained_pth> 4 \
--out sunrgbd_20p_results/ --format-only --eval-options 'pklfile_prefix=sunrgbd_20p_results/' 'submission_prefix=sunrgbd_20p_results/'

Prepare the training data for Stage 3

Prepare raw data in a new folder:

cd data
mkdir sunrgbd_80pseudo && cd sunrgbd_80pseudo
ln -s <path_to_sunrgbd>/points ./  # soft link the point cloud data

mkdir sunrgbd_trainval/ && cd sunrgbd_trainval/
# soft link raw data
ln -s <path_to_sunrgbd>/sunrgbd_trainval/calib ./ 
ln -s <path_to_sunrgbd>/sunrgbd_trainval/depth ./
ln -s <path_to_sunrgbd>/sunrgbd_trainval/image ./
ln -s <path_to_sunrgbd>/sunrgbd_trainval/seg_label ./
ln -s <path_to_sunrgbd>/sunrgbd_trainval/train_data_idx.txt ./
ln -s <path_to_sunrgbd>/sunrgbd_trainval/val_data_idx.txt ./

# soft link the pseudo labels
ln -s ../../../sunrgbd_20p_results/txt/ ./label
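The linking steps can also be scripted. The following Python sketch builds the layout that the later commands expect (`data/sunrgbd_80pseudo/points` and `data/sunrgbd_80pseudo/sunrgbd_trainval/...`). The function name `link_sunrgbd` is hypothetical; `sunrgbd_root` stands for `<path_to_sunrgbd>` and `pseudo_txt_dir` for the `txt/` folder written in stage 2.

```python
from pathlib import Path

# Raw items linked from <path_to_sunrgbd>/sunrgbd_trainval, as in the commands above.
RAW_ITEMS = ("calib", "depth", "image", "seg_label",
             "train_data_idx.txt", "val_data_idx.txt")


def link_sunrgbd(sunrgbd_root, pseudo_txt_dir, new_root="data/sunrgbd_80pseudo"):
    """Recreate the folder layout above with symbolic links."""
    src = Path(sunrgbd_root).resolve()
    dst = Path(new_root)
    trainval = dst / "sunrgbd_trainval"
    trainval.mkdir(parents=True, exist_ok=True)

    # Map each link location to the path it should point to.
    links = {dst / "points": src / "points",
             trainval / "label": Path(pseudo_txt_dir).resolve()}
    for name in RAW_ITEMS:
        links[trainval / name] = src / "sunrgbd_trainval" / name

    for link, target in links.items():
        # Skip anything that already exists so the function can be re-run.
        if not link.is_symlink() and not link.exists():
            link.symlink_to(target)
    return sorted(str(p.relative_to(dst)) for p in links)
```

Absolute targets are used here instead of relative ones so that the links do not depend on the directory the script is run from.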

Run:

python prepare_pseudo_label/convert_indoor_pred_to_pseudo.py --load_gt_interval 5 --gt_data_root data/sunrgbd --pseudo_txt_root data/sunrgbd_80pseudo/sunrgbd_trainval/label   # merge pseudo labels and real labels
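The names used in this section (`20p` for the stage 1 config, `80pseudo` for the new folder) suggest that `--load_gt_interval 5` keeps real labels for one sample in five and pseudo labels for the rest. The sketch below shows that kind of stride-based split. It is an assumption about the behaviour, not the repository's code, and the function name is hypothetical.

```python
def split_by_interval(sample_ids, interval):
    """Split ids into (ground-truth ids, pseudo-label ids) by a fixed stride.

    Assumption: every `interval`-th sample, starting from the first, keeps its
    real label. The repository's script may select samples differently.
    """
    gt_ids = list(sample_ids[::interval])
    gt_set = set(gt_ids)
    pseudo_ids = [s for s in sample_ids if s not in gt_set]
    return gt_ids, pseudo_ids


gt_ids, pseudo_ids = split_by_interval(list(range(1, 101)), interval=5)
print(len(gt_ids), len(pseudo_ids))  # 20 80
```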

Prepare data infos for stage 3 training:

python tools/create_data.py sunrgbd --root-path ./data/sunrgbd_80pseudo --out-dir ./data/sunrgbd_80pseudo --extra-tag sunrgbd

Stage 3: train any detector. The following command uses VoteNet as an example:

./tools/dist_train.sh configs/votenet/votenet_16x8_sunrgbd_80pseudo-3d-10class.py 4 0,1,2,3 --work-dir <to_your_work_path>

Results

KITTI

SUN RGB-D

Citation

@inproceedings{zhang2023simple,
  title={A simple vision transformer for weakly semi-supervised 3d object detection},
  author={Zhang, Dingyuan and Liang, Dingkang and Zou, Zhikang and Li, Jingyu and Ye, Xiaoqing and Liu, Zhe and Tan, Xiao and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={8373--8383},
  year={2023}
}
