This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation. It is based on Mask2Former.
- † denotes that the backbone was pre-trained on ImageNet-22k with 384x384 resolution images.
- ms+flip denotes multi-scale inference with horizontal flipping.
- Pre-trained models can be downloaded by following the instructions given under tools.
**ADE20K**

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint
---|---|---|---|---|---|---|---
SeMask-L Mask2Former | SeMask Swin-L† | 640x640 | 56.41 | 57.52 | 222M | config | checkpoint
**Cityscapes**

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | config | Checkpoint
---|---|---|---|---|---|---|---
SeMask-L Mask2Former | SeMask Swin-L† | 512x1024 | 83.97 | 84.98 | 222M | config | checkpoint
- We developed the codebase using PyTorch v1.9.0 and Python 3.8.
```bash
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
```
- See the installation instructions for the remaining setup steps; a quick sanity check of the environment is sketched below.
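After installing, a minimal sanity check (the exact version strings depend on your CUDA build; `+cu111` matches the pip command above) confirms that the expected PyTorch version and GPU support are in place:

```python
import torch
import torchvision

# Verify the pinned versions and that CUDA is visible to PyTorch.
print(torch.__version__)          # expected: 1.9.0+cu111
print(torchvision.__version__)    # expected: 0.10.0+cu111
print(torch.cuda.is_available())  # should print True on a CUDA machine
```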
See Preparing Datasets for Mask2Former.
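As a quick check that the data landed where detectron2 expects it, the sketch below walks the standard layout (assuming the `DETECTRON2_DATASETS` environment variable and the `ADEChallengeData2016` / `cityscapes` directory names used in Mask2Former's dataset guide; adjust the paths to the datasets you actually prepared):

```python
import os

# Root that detectron2 searches for datasets; defaults to ./datasets.
root = os.getenv("DETECTRON2_DATASETS", "datasets")

# Directories the Mask2Former dataset guide expects (assumed names).
expected = [
    "ADEChallengeData2016/images/validation",
    "ADEChallengeData2016/annotations_detectron2/validation",
    "cityscapes/leftImg8bit/val",
    "cityscapes/gtFine/val",
]

for sub in expected:
    path = os.path.join(root, sub)
    status = "ok" if os.path.isdir(path) else "MISSING"
    print(f"{status:7s} {path}")
```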
See Getting Started with Mask2Former.
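For reference, inference with a downloaded checkpoint generally follows detectron2's `DefaultPredictor` pattern. The sketch below assumes the Mask2Former-style config helpers (`add_deeplab_config`, `add_maskformer2_config`); the config and checkpoint paths are placeholders to be replaced with the files from the tables above:

```python
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config

from mask2former import add_maskformer2_config  # config helper shipped with Mask2Former

# Build the config: base detectron2 + DeepLab + Mask2Former options,
# then merge the model config file (path is a placeholder).
cfg = get_cfg()
add_deeplab_config(cfg)
add_maskformer2_config(cfg)
cfg.merge_from_file("path/to/semask_config.yaml")
cfg.MODEL.WEIGHTS = "path/to/downloaded_checkpoint.pkl"

predictor = DefaultPredictor(cfg)
image = cv2.imread("demo.jpg")  # BGR, as DefaultPredictor expects
outputs = predictor(image)

# Semantic-segmentation models return per-pixel class scores under "sem_seg".
sem_seg = outputs["sem_seg"].argmax(dim=0)  # HxW tensor of class ids
print(sem_seg.shape)
```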
If you find SeMask useful in your research, please consider citing:

```bibtex
@article{jain2021semask,
  title={SeMask: Semantically Masked Transformers for Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv},
  year={2021}
}
```