ViT-WSS3D: A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection [ICCV 2023 Paper]
This is the official repository for the ICCV 2023 paper: A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
Please follow the MMDetection3D installation instructions: https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc4/docs/en/getting_started.md
We use the KITTI and SUN RGB-D datasets in this project. Please follow the instructions below to prepare them.
- Follow the `Prepare Dataset` section in https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc4/docs/en/datasets/kitti_det.md#prepare-dataset
- Download the data split files:
  ```shell
  mkdir -p ./data/kitti/ImageSets

  # Download data split files
  wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/test.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/test.txt
  wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/train.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/train.txt
  wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/val.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/val.txt
  wget -c https://raw.githubusercontent.com/traveller59/second.pytorch/master/second/data/ImageSets/trainval.txt --no-check-certificate --content-disposition -O ./data/kitti/ImageSets/trainval.txt
  ```
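To verify the downloads, check the line counts of the split files; the standard KITTI 3D splits contain 3712 train, 3769 val, 7481 trainval, and 7518 test frame ids:

```shell
wc -l ./data/kitti/ImageSets/*.txt  # expect ~3712 / ~3769 / ~7481 / ~7518 lines
```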
- Create dataset infos:
  - For fully-supervised training:

    ```shell
    python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
    ```
  - For weakly semi-supervised training:
    - For training the point-to-box converter in Stage 1, we sample a small amount of fully annotated data. The extra processing below is needed because the GT database used for GT-sampling augmentation must be rebuilt from the sampled subset only, to prevent possible data leakage.
    - For generating the pseudo labels in Stage 2, we sample the remaining data, as shown below.
    ```shell
    # for 2% full + 98% weak
    python tools/create_data_partial_kitti.py --label_ratio 50
    python tools/create_data_partial_kitti.py --label_ratio 50 --generate_label_data
    # for 5% full + 95% weak
    python tools/create_data_partial_kitti.py --label_ratio 20 --generate_label_data
    python tools/create_data_partial_kitti.py --label_ratio 20
    # for 10% full + 90% weak
    python tools/create_data_partial_kitti.py --label_ratio 10 --generate_label_data
    python tools/create_data_partial_kitti.py --label_ratio 10
    ```
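Here `--label_ratio N` keeps 1/N of the training frames as fully annotated (N=50, 20, 10 give the 2%, 5%, and 10% splits), which matches the `1_50`/`49_50`-style suffixes in the generated file names below. If you need all three splits, the following loop (a convenience sketch, not part of the repo's tooling) is equivalent to running the commands above:

```shell
# Convenience sketch: generate both stage-1 and stage-2 infos for every split.
for ratio in 50 20 10; do
    python tools/create_data_partial_kitti.py --label_ratio ${ratio}
    python tools/create_data_partial_kitti.py --label_ratio ${ratio} --generate_label_data
done
```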
The folder structure after processing should look like this:

```
kitti/
├── ImageSets
├── kitti_dbinfos_train_1_10.pkl
├── kitti_dbinfos_train_1_20.pkl
├── kitti_dbinfos_train_1_50.pkl
├── kitti_dbinfos_train_19_20.pkl
├── kitti_dbinfos_train_49_50.pkl
├── kitti_dbinfos_train_9_10.pkl
├── kitti_dbinfos_train.pkl
├── kitti_gt_database
├── kitti_infos_test_mono3d.coco.json
├── kitti_infos_test.pkl
├── kitti_infos_train_1_10_mono3d.coco.json
├── kitti_infos_train_1_10.pkl
├── kitti_infos_train_1_20_mono3d.coco.json
├── kitti_infos_train_1_20.pkl
├── kitti_infos_train_1_50_mono3d.coco.json
├── kitti_infos_train_1_50.pkl
├── kitti_infos_train_19_20_mono3d.coco.json
├── kitti_infos_train_19_20.pkl
├── kitti_infos_train_49_50_mono3d.coco.json
├── kitti_infos_train_49_50.pkl
├── kitti_infos_train_9_10_mono3d.coco.json
├── kitti_infos_train_9_10.pkl
├── kitti_infos_train_mono3d.coco.json
├── kitti_infos_train.pkl
├── kitti_infos_trainval_mono3d.coco.json
├── kitti_infos_trainval.pkl
├── kitti_infos_val_mono3d.coco.json
├── kitti_infos_val.pkl
├── testing
└── training
```
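As a quick sanity check, you can count the frames in one of the generated splits (a minimal sketch; it assumes the v1.0.0rc4-style info `.pkl` stores one entry per frame):

```shell
python -c "import pickle; infos = pickle.load(open('./data/kitti/kitti_infos_train_1_10.pkl', 'rb')); print(len(infos), 'frames in the 10% fully-labeled split')"
```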
- Follow the instructions in https://github.com/open-mmlab/mmdetection3d/blob/v1.0.0rc4/docs/en/datasets/sunrgbd_det.md to prepare the SUN RGB-D dataset.
We use MAE-pretrained weights in this project. Please go to https://github.com/facebookresearch/mae/blob/main/FINETUNE.md, download the ViT-Base and ViT-Large weights, and put them in the root folder of this project (i.e., the same folder as this README).
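If you prefer the command line, the checkpoints can also be fetched directly; the URLs below are the pre-training checkpoint links from the MAE repository at the time of writing, so verify them in FINETUNE.md if a download fails:

```shell
# MAE pre-trained checkpoints (URLs taken from facebookresearch/mae;
# check FINETUNE.md if they have moved).
wget -c https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth
wget -c https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth
```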
Training on KITTI:

- Stage 1: train the point-to-box converter:

  ```shell
  ./tools/dist_train.sh configs/vitwss3d/vitwss3d_base_kitti_10p.py 4 0,1,2,3 --work-dir <to_your_work_path>
  ```
- Stage 2: run inference to get the pseudo GT boxes:

  ```shell
  ./tools/dist_test.sh configs/vitwss3d/vitwss3d_base_kitti_10p.py <stage_1_trained_pth> 2 \
      --out results/kitti-3class/results_eval.pkl --format-only --eval-options 'pklfile_prefix=results/kitti-3class/kitti_results_10' 'submission_prefix=results/kitti-3class/kitti_results_10'
  ```
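To eyeball the result, MMDetection3D's KITTI formatter should write one txt file per frame (named by the six-digit frame id) under the `submission_prefix` directory, so you can peek at one:

```shell
# Each line is one predicted box in KITTI label format
# (type, truncation, occlusion, alpha, 2D bbox, dimensions, location, rotation_y, score).
head -n 3 results/kitti-3class/kitti_results_10/000000.txt
```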
- Prepare the training data for Stage 3: change `label_ratio`, `save_path`, and `pseudo_root` according to your settings in `prepare_pseudo_label/conver_pred_to_pseudo.py`, then run it:

  ```shell
  cd prepare_pseudo_label
  python conver_pred_to_pseudo.py  # merge pseudo labels and real labels
  ```
- Create data infos for Stage 3 training:

  ```shell
  python tools/create_data_partial_kitti.py --label_ratio 10 --generate_full_pseudo_label_data  # generate the pkl
  ```
- Stage 3: train any detector; the following command takes PointRCNN as an example:

  ```shell
  ./tools/dist_train.sh ./configs/point_rcnn/point_rcnn_2x8_kitti-3d-3classes_pseudo.py 4 0,1,2,3 --work-dir <to_your_work_path>
  ```
Training on SUN RGB-D:

- Stage 1: train the point-to-box converter:

  ```shell
  ./tools/dist_train.sh configs/vitwss3d/vitwss3d_small_sunrgbd_20p.py 4 0,1,2,3 --work-dir <to_your_work_path>
  ```
- Stage 2: run inference to get the pseudo GT boxes:

  ```shell
  ./tools/dist_test.sh configs/vitwss3d/vitwss3d_small_sunrgbd_20p.py <stage_1_trained_pth> 4 \
      --out sunrgbd_20p_results/ --format-only --eval-options 'pklfile_prefix=sunrgbd_20p_results/' 'submission_prefix=sunrgbd_20p_results/'
  ```
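The exported predictions should appear as per-frame txt files under a `txt/` subfolder of the prefix (this is the folder soft-linked as `label` in the next step); a quick check:

```shell
ls sunrgbd_20p_results/txt/ | head  # one txt file of pseudo boxes per frame
```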
- Stage 3:
  - Prepare the raw data in a new folder:

    ```shell
    cd data
    mkdir sunrgbd_80pseudo && cd sunrgbd_80pseudo
    # soft link the point cloud data
    ln -s <path_to_sunrgbd>/points ./
    mkdir sunrgbd_trainval/ && cd sunrgbd_trainval/
    # soft link raw data
    ln -s <path_to_sunrgbd>/sunrgbd_trainval/calib ./
    ln -s <path_to_sunrgbd>/sunrgbd_trainval/depth ./
    ln -s <path_to_sunrgbd>/sunrgbd_trainval/image ./
    ln -s <path_to_sunrgbd>/sunrgbd_trainval/seg_label ./
    ln -s <path_to_sunrgbd>/sunrgbd_trainval/train_data_idx.txt ./
    ln -s <path_to_sunrgbd>/sunrgbd_trainval/val_data_idx.txt ./
    # soft link the pseudo labels
    ln -s ../../../sunrgbd_20p_results/txt/ ./label
    ```
  - Run:

    ```shell
    # merge pseudo labels and real labels
    python prepare_pseudo_label/convert_indoor_pred_to_pseudo.py --load_gt_interval 5 --gt_data_root data/sunrgbd --pseudo_txt_root data/sunrgbd_80pseudo/sunrgbd_trainval/label
    ```
  - Prepare data infos for Stage 3 training:

    ```shell
    python tools/create_data.py sunrgbd --root-path ./data/sunrgbd_80pseudo --out-dir ./data/sunrgbd_80pseudo --extra-tag sunrgbd
    ```
  - Train any detector; the following command takes VoteNet as an example:

    ```shell
    ./tools/dist_train.sh configs/votenet/votenet_16x8_sunrgbd_80pseudo-3d-10class.py 4 0,1,2,3 --work-dir <to_your_work_path>
    ```
If you find this work useful, please consider citing:

```bibtex
@inproceedings{zhang2023simple,
  title={A simple vision transformer for weakly semi-supervised 3d object detection},
  author={Zhang, Dingyuan and Liang, Dingkang and Zou, Zhikang and Li, Jingyu and Ye, Xiaoqing and Liu, Zhe and Tan, Xiao and Bai, Xiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={8373--8383},
  year={2023}
}
```