SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation (ECCV2020)

joakimjohnander/SipMask

 
 


SipMask

This is the official implementation of "SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation (ECCV2020)" built on the open-source mmdetection and maskrcnn-benchmark.

  • Single-stage method for both image and video instance segmentation.
  • Two different versions are provided: high-accuracy version and real-time (fast) version.
  • Image instance segmentation is built on both mmdetection and maskrcnn-benchmark.
  • Video instance segmentation is built on mmdetection.
  • Datasets: MS COCO for image instance segmentation and YouTube-VIS for video instance segmentation.

Introduction

Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but they still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding box. Our main contribution is a novel lightweight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection.
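
To make the idea of per-sub-region coefficients concrete, here is a minimal pure-Python sketch. It is not the code from this repository: it assumes the mask is formed as a linear combination of shared basis maps followed by a sigmoid, and that the box is split into a 2 × 2 grid. The function name `assemble_mask` and all argument names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def assemble_mask(basis, coeffs, box, grid=2):
    """Combine basis maps inside one box using per-sub-region coefficients.

    basis:  list of K maps, each H x W (lists of lists of floats)  [assumed]
    coeffs: coeffs[r][c] is a list of K floats for sub-region (r, c)
    box:    (x0, y0, x1, y1) pixel bounds, end-exclusive
    grid:   number of sub-regions per side (2 x 2 is an assumption)
    """
    x0, y0, x1, y1 = box
    height, width = len(basis[0]), len(basis[0][0])
    mask = [[0.0] * width for _ in range(height)]  # zero outside the box
    for y in range(y0, y1):
        r = (y - y0) * grid // (y1 - y0)      # sub-region row of this pixel
        for x in range(x0, x1):
            c = (x - x0) * grid // (x1 - x0)  # sub-region column
            logit = sum(w * b[y][x] for w, b in zip(coeffs[r][c], basis))
            mask[y][x] = sigmoid(logit)
    return mask


# Toy example: two 4 x 4 basis maps, one box covering the whole map.
basis = [
    [[1.0] * 4 for _ in range(4)],  # constant map
    [[0.0] * 4 for _ in range(4)],  # unused map
]
coeffs = [
    [[4.0, 0.0], [-4.0, 0.0]],   # top-left "on", top-right "off"
    [[-4.0, 0.0], [-4.0, 0.0]],  # bottom row "off"
]
mask = assemble_mask(basis, coeffs, box=(0, 0, 4, 4))
binary = [[int(v > 0.5) for v in row] for row in mask]
for row in binary:
    print(row)
```

Because every sub-region has its own coefficients, the same basis maps can produce foreground in one quadrant of the box and background in another, which a single coefficient set per box cannot do.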

SipMask-benchmark (image instance segmentation)

  • This project is built on the official implementation of FCOS, which is based on maskrcnn-benchmark.
  • High-quality version is provided.
  • Please use SipMask-benchmark and refer to INSTALL.md for installation.
  • Tested with PyTorch 1.1.0 and CUDA 9.0/10.0.
Train with multiple GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=$((RANDOM+10000)) tools/train_net.py --config-file ${CONFIG_FILE} DATALOADER.NUM_WORKERS 2 OUTPUT_DIR ${OUTPUT_PATH}
e.g.,
python -m torch.distributed.launch --nproc_per_node=4 --master_port=$((RANDOM+10000)) tools/train_net.py --config-file configs/sipmask/sipmask_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/sipmask_R_50_FPN_1x
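
If you prefer to launch training from a script, the command above can be assembled in Python. This is only a convenience sketch: `build_train_command` is a hypothetical helper, not part of this repository, and it uses only the flags shown above. The port range mirrors `$((RANDOM+10000))`, since Bash's `RANDOM` yields values from 0 to 32767.

```python
import random
import shlex
import sys


def build_train_command(config_file, output_dir, num_gpus=4, num_workers=2, port=None):
    """Return the argv list for a multi-GPU training run (hypothetical helper)."""
    if port is None:
        # Same range as $((RANDOM+10000)) in Bash.
        port = random.randint(10000, 42767)
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={port}",
        "tools/train_net.py",
        "--config-file", config_file,
        "DATALOADER.NUM_WORKERS", str(num_workers),
        "OUTPUT_DIR", output_dir,
    ]


cmd = build_train_command(
    "configs/sipmask/sipmask_R_50_FPN_1x.yaml",
    "training_dir/sipmask_R_50_FPN_1x",
    port=12345,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the SipMask-benchmark root directory.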
Test with a single GPU
python tools/test_net.py --config-file ${CONFIG_FILE} MODEL.WEIGHT ${CHECKPOINT_FILE} TEST.IMS_PER_BATCH 4
e.g.,
python tools/test_net.py --config-file configs/sipmask/sipmask_R_50_FPN_1x.yaml MODEL.WEIGHT training_dir/SipMask_R50_1x.pth TEST.IMS_PER_BATCH 4
Results
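| name | backbone | input size | iteration | ms-train | val. box AP | val. mask AP | download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SipMask | R50 | 800 × 1333 | 1x | no | 39.5 | 34.2 | model |
| SipMask | R101 | 800 × 1333 | 3x | yes | 44.1 | 37.8 | model |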

SipMask-mmdetection (image instance segmentation)

  • This project is built on mmdetection.
  • High-quality version and real-time version are both provided.
  • Please use SipMask-mmdetection and refer to INSTALL.md for installation.
  • Tested with PyTorch 1.1.0, CUDA 9.0/10.0, and mmcv 0.4.3.
Train with multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
e.g.,
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py 4 --validate
Test with a single GPU
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
e.g., 
python tools/test.py ./configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py ./work_dirs/sipmask_r50_caffe_1x.pth --out results.pkl --eval bbox segm
Results
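| name | backbone | input size | iteration | ms-train | GN | val. box AP | val. mask AP | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SipMask | R50 | 800 × 1333 | 1x | no | yes | 38.2 | 33.5 | model |
| SipMask | R50 | 800 × 1333 | 2x | yes | yes | 40.8 | 35.6 | model |
| SipMask | R101 | 800 × 1333 | 4x | yes | yes | 43.6 | 37.8 | model |
| SipMask | R50 | 544 × 544 | 6x | yes | no | 36.0 | 31.7 | model |
| SipMask | R50 | 544 × 544 | 10x | yes | yes | 37.1 | 32.4 | model |
| SipMask | R101 | 544 × 544 | 6x | yes | no | 38.4 | 33.6 | model |
| SipMask | R101 | 544 × 544 | 10x | yes | yes | 40.3 | 34.8 | model |
| SipMask++ | R101-D | 544 × 544 | 6x | yes | no | 40.1 | 35.2 | model |
| SipMask++ | R101-D | 544 × 544 | 10x | yes | yes | 41.3 | 36.1 | model |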
  • GN indicates that group normalization is used in the prediction branch.
  • Models with an input size of 800 × 1333 focus on high accuracy and are trained in RetinaNet style.
  • Models with an input size of 544 × 544 focus on speed and are trained in SSD style.
  • ++ indicates that deformable convolutions (at an interval of 3) are added to the backbone, together with a mask re-scoring module.

SipMask-VIS (video instance segmentation)

  • This project is an implementation of video instance segmentation based on mmdetection.
  • Please use SipMask-VIS and refer to INSTALL.md for installation.
  • Tested with PyTorch 1.1.0, CUDA 9.0/10.0, and mmcv 0.2.1.

Note: to run on the YouTube-VIS dataset, as in MaskTrackRCNN, install the YouTube-VIS version of cocoapi instead of the original COCO cocoapi, as follows.

pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"
or
cd SipMask-VIS/pycocotools/cocoapi/PythonAPI
python setup.py build_ext install
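
After installing, a quick check can confirm that the YouTube-VIS variant of pycocotools is the one Python picks up. The sketch below assumes the YouTube-VIS fork adds a `pycocotools.ytvos` submodule that the original COCO package lacks. That module name is an assumption, so verify it against the fork before relying on this check. `has_youtube_vis_api` is a hypothetical helper.

```python
import importlib.util


def has_youtube_vis_api():
    """Return True if the installed pycocotools exposes a `ytvos` submodule.

    The submodule name is an assumption about the YouTube-VIS fork.
    """
    try:
        return importlib.util.find_spec("pycocotools.ytvos") is not None
    except ImportError:
        # pycocotools itself is not installed (or cannot be imported).
        return False


print("YouTube-VIS pycocotools available:", has_youtube_vis_api())
```

If this reports `False` after installation, the original COCO cocoapi may still be shadowing the YouTube-VIS version in your environment.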
Train with multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
e.g.,
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh ./configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py 4
Test with a single GPU
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm
e.g.,
python ./tools/test_video.py configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py ./work_dirs/sipmask_r50_fpn_1x.pth --out results.pkl --eval segm

If you want to save the results of video instance segmentation, please use the following command:

python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm --show --save_path=${SAVE_PATH}
  • Config files for SipMask-VIS are located in SipMask-VIS/configs/sipmask.
  • The model pretrained on MS COCO dataset is used for weight initialization.
Results
| name | backbone | input size | iteration | ms-train | val. mask AP | download |
| --- | --- | --- | --- | --- | --- | --- |
| SipMask | R50 | 360 × 640 | 1x | no | 32.5 | model |
| SipMask | R50 | 360 × 640 | 1x | yes | 33.7 | model |
  • The generated results on YouTube-VIS should be uploaded to codalab for evaluation.
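
Evaluation servers of this kind commonly expect a zip archive rather than a raw results file. The sketch below packages a results file for upload. It assumes that the test script has produced a JSON results file and that the server accepts a zip with a single `results.json` at the archive root. Both are assumptions, so check the competition page for the exact format. `package_results` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def package_results(json_path, zip_path="submission.zip"):
    """Zip one results file for upload (archive layout is an assumption)."""
    json_path = Path(json_path)
    if not json_path.is_file():
        raise FileNotFoundError(f"results file not found: {json_path}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store the file at the archive root under a fixed name.
        archive.write(json_path, arcname="results.json")
    return Path(zip_path)
```

For example, `package_results("results.pkl.json")` would write `submission.zip` to the current directory. The input file name here is purely illustrative.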

Citation

If the project helps your research, please cite this paper.

@article{Cao_SipMask_ECCV_2020,
  author =       {Jiale Cao and Rao Muhammad Anwer and Hisham Cholakkal and Fahad Shahbaz Khan and Yanwei Pang and Ling Shao},
  title =        {SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation},
  journal =      {Proc. European Conference on Computer Vision},
  year =         {2020}
}

Acknowledgement

Many thanks to the open-source projects FCOS, mmdetection, YOLACT, and MaskTrack RCNN.
