Open-Vocabulary Universal Image Segmentation with MaskCLIP (ICML 2023)

Zheng Ding, Jieke Wang, Zhuowen Tu

Arxiv / Project / Video


Data preparation

For COCO and ADE20K data preparation, please refer to Preparing Datasets in Mask2Former.

Environment Setup

Run the following commands to set up the environment.

conda create -n maskclip python=3.9
conda activate maskclip
conda install pytorch=1.10 cudatoolkit=11.3 torchvision=0.11 -c pytorch -c conda-forge
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
pip install setuptools==59.5.0
pip install timm opencv-python scipy einops
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/cocodataset/panopticapi.git

cd mask2former/modeling/pixel_decoder/ops/
sh make.sh
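After installation, a quick check that every package can be found may save a failed training run later. The following is a minimal sketch, not part of the repository: the helper name `find_missing` is hypothetical, and the module list is inferred from the install commands above (note that `opencv-python` is imported as `cv2` and the OpenAI CLIP package as `clip`).

```python
import importlib.util

# Import names for the packages installed above (assumed from the pip/conda
# commands; opencv-python is imported as cv2, OpenAI CLIP as clip).
REQUIRED_MODULES = [
    "torch", "torchvision", "detectron2", "timm", "cv2",
    "scipy", "einops", "clip", "panopticapi",
]


def find_missing(modules):
    """Return the subset of `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only confirms that the packages are discoverable; it does not verify versions or that the compiled ops built by `make.sh` load correctly.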

Training

Training Class-Agnostic Mask Proposal Network

You can train a class-agnostic mask proposal network by removing the classification head from an existing segmentation model such as Mask2Former or Mask R-CNN. We provide our trained class-agnostic mask proposal network here.
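The README does not show how the classification head is removed. One common way to get a similar effect at the data level, which is an assumption here and not necessarily what the authors did, is to collapse every category into a single "object" label so that the model only learns to propose masks. The helper name `to_class_agnostic` and the toy annotations below are hypothetical.

```python
def to_class_agnostic(annotations, object_id=0):
    """Return copies of COCO-style annotations with every category_id set to object_id."""
    # Copy each annotation so the original list is left untouched.
    return [{**ann, "category_id": object_id} for ann in annotations]


# Toy COCO-style annotations (made up for illustration).
example = [
    {"id": 1, "category_id": 18, "bbox": [10, 20, 50, 40]},
    {"id": 2, "category_id": 3, "bbox": [0, 0, 30, 30]},
]

agnostic = to_class_agnostic(example)
print([ann["category_id"] for ann in agnostic])
```

With labels collapsed this way, a standard segmentation model can be trained unchanged and still produce class-agnostic proposals.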

Training MaskCLIP on COCO dataset

With the trained class-agnostic mask proposal network, you can train the MaskCLIP model with the following command. We train our model for 10,000 iterations with a batch size of 8.

python train_net.py --num-gpus 8 --config-file configs/coco/maskformer2_R50_bs16_50ep.yaml
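To put the schedule in perspective, the short calculation below estimates how much data the model sees. It is a rough sketch under two assumptions that the README does not state: that the batch size of 8 is the total across all GPUs, and that training uses the COCO train2017 split with 118,287 images.

```python
iterations = 10_000          # from the text above
batch_size = 8               # assumed to be the total across GPUs
num_gpus = 8                 # from --num-gpus in the command above
coco_train_images = 118_287  # assumption: COCO train2017 image count

images_seen = iterations * batch_size      # total images processed
images_per_gpu = batch_size // num_gpus    # images per GPU per iteration
epochs = images_seen / coco_train_images   # fraction of one pass over the data

print(images_seen, images_per_gpu, round(epochs, 2))
# prints: 80000 1 0.68
```

Under these assumptions the run covers less than one epoch, with one image per GPU per iteration.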

Testing MaskCLIP on ADE20K dataset

You can evaluate the trained model on the ADE20K dataset. We also provide our trained model here. Either set MODEL.WEIGHTS to the checkpoint path in the yaml file or override it on the command line:

python train_net.py --num-gpus 1 --config-file configs/ade20k/maskformer2_R50_bs16_160k.yaml --eval-only MODEL.WEIGHTS model_final.pth
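When evaluating several checkpoints, it can be convenient to assemble this command programmatically. The sketch below uses a hypothetical helper, `build_eval_command`, that is not part of the repository; it only reproduces the argument layout shown above, where trailing `KEY VALUE` pairs override entries in the yaml config.

```python
import shlex


def build_eval_command(config_file, weights, num_gpus=1):
    """Assemble the evaluation command as an argument list (hypothetical helper)."""
    return [
        "python", "train_net.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
        "--eval-only",
        # Trailing KEY VALUE pair overrides the value in the yaml config.
        "MODEL.WEIGHTS", weights,
    ]


cmd = build_eval_command(
    "configs/ade20k/maskformer2_R50_bs16_160k.yaml",
    "model_final.pth",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the evaluation.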

Citation

If you find this work helpful, please consider citing MaskCLIP using the following BibTeX entry.

@inproceedings{ding2023maskclip,
  author    = {Zheng Ding and Jieke Wang and Zhuowen Tu},
  title     = {Open-Vocabulary Universal Image Segmentation with MaskCLIP},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
}

Please also check out MasQCLIP for our latest work on open-vocabulary segmentation.

Acknowledgement

This codebase builds upon and draws inspiration from CLIP and Mask2Former. We thank the authors for making those repositories public.