This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation. It is based on mmsegmentation.
- † denotes that the backbone was pretrained on ImageNet-22K at 384x384 resolution.
- Pre-trained models can be downloaded from Swin Transformer for ImageNet Classification.
- Access code for baidu is `swin`.

**ADE20K**

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | Config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 512x512 | 42.11 | 43.16 | 35M | config | TBD |
SeMask-S FPN | SeMask Swin-S | 512x512 | 45.92 | 47.63 | 56M | config | checkpoint |
SeMask-B FPN | SeMask Swin-B† | 512x512 | 49.35 | 50.98 | 96M | config | checkpoint |
SeMask-L FPN | SeMask Swin-L† | 640x640 | 51.89 | 53.52 | 211M | config | checkpoint |

**Cityscapes**

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | Config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 768x768 | 74.92 | 76.56 | 34M | config | checkpoint |
SeMask-S FPN | SeMask Swin-S | 768x768 | 77.13 | 79.14 | 56M | config | checkpoint |
SeMask-B FPN | SeMask Swin-B† | 768x768 | 77.70 | 79.73 | 96M | config | checkpoint |
SeMask-L FPN | SeMask Swin-L† | 768x768 | 78.53 | 80.39 | 211M | config | checkpoint |

**COCO-Stuff 10k**

Method | Backbone | Crop Size | mIoU | mIoU (ms+flip) | #params | Config | Checkpoint |
---|---|---|---|---|---|---|---|
SeMask-T FPN | SeMask Swin-T | 512x512 | 37.53 | 38.88 | 35M | config | checkpoint |
SeMask-S FPN | SeMask Swin-S | 512x512 | 40.72 | 42.27 | 56M | config | checkpoint |
SeMask-B FPN | SeMask Swin-B† | 512x512 | 44.63 | 46.30 | 96M | config | checkpoint |
SeMask-L FPN | SeMask Swin-L† | 640x640 | 47.47 | 48.54 | 211M | config | checkpoint |
- We developed the codebase using PyTorch v1.8.0 and Python 3.7.
- Please refer to get_started.md for installation and dataset_prepare.md for dataset preparation.
- Note: Change the paths in the dataset config files to match your dataset location, as in the sketch below.
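In a standard mmsegmentation dataset config, the dataset location is set via `data_root`. A minimal sketch of the kind of edit, assuming the usual ADE20K config layout (the file path and field values here are illustrative, not copied from this repo):

```python
# configs/_base_/datasets/ade20k.py (illustrative excerpt)
dataset_type = 'ADE20KDataset'
data_root = '/path/to/ADEChallengeData2016'  # <- point this at your local dataset copy

data = dict(
    train=dict(
        type=dataset_type,
        data_root=data_root,            # image/annotation dirs resolve relative to data_root
        img_dir='images/training',
        ann_dir='annotations/training'),
)
```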
To evaluate a trained model, run:

```bash
# single-gpu testing
python tools/test.py <CONFIG_FILE> <SEG_CHECKPOINT_FILE> --eval mIoU

# multi-gpu testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --eval mIoU

# multi-gpu, multi-scale testing
tools/dist_test.sh <CONFIG_FILE> <SEG_CHECKPOINT_FILE> <GPU_NUM> --aug-test --eval mIoU
```
To train with pre-trained models, run:
```bash
# single-gpu training
python tools/train.py <CONFIG_FILE> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]

# multi-gpu training
tools/dist_train.sh <CONFIG_FILE> <GPU_NUM> --options model.pretrained=<PRETRAIN_MODEL> [model.backbone.use_checkpoint=True] [other optional arguments]
```
For example, to train a Semantic-FPN model with a SeMask Swin-T backbone on 8 GPUs, run:

```bash
tools/dist_train.sh configs/semask_swin/cityscapes/semfpn_semask_swin_tiny_patch4_window7_768x768_80k_cityscapes.py 8 --options model.pretrained=<PRETRAIN_MODEL>
```
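`--options` merges dotted keys into the loaded config (standard mmsegmentation behavior), so the overrides above amount to editing the model config directly. A minimal sketch of the equivalent in-config fields, with the checkpoint path left as a placeholder:

```python
# Equivalent in-config edit (illustrative; keys follow mmsegmentation conventions)
model = dict(
    pretrained='<PRETRAIN_MODEL>',       # path to the pre-trained backbone checkpoint
    backbone=dict(use_checkpoint=True),  # activation checkpointing: saves GPU memory at extra compute cost
)
```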
Notes:
- `use_checkpoint` is used to save GPU memory. Please refer to this page for more details.
- The default learning rate and training schedule are as follows (see the config sketch after this list for how imgs/gpu maps onto the config):
  - ADE20K: 2 GPUs and 8 imgs/gpu. For the Large variant, we use 4 GPUs with 4 imgs/gpu.
  - Cityscapes: 2 GPUs and 4 imgs/gpu. For the Large variant, we use 4 GPUs with 2 imgs/gpu.
  - COCO-Stuff 10k: 4 GPUs and 4 imgs/gpu. For the Base and Large variants, we use 8 GPUs with 2 imgs/gpu.
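In mmsegmentation configs, imgs/gpu is controlled by `samples_per_gpu`, so the effective batch size is GPUs × imgs/gpu. A minimal sketch for the ADE20K default above (`workers_per_gpu` is an illustrative value, not quoted from the paper):

```python
# ADE20K default: 2 GPUs x 8 imgs/gpu = effective batch size 16
data = dict(
    samples_per_gpu=8,  # imgs/gpu
    workers_per_gpu=4,  # dataloader workers per GPU (illustrative)
)
```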
To save the predictions, run the following command:
```bash
python tools/test.py <CONFIG_FILE> <SEG_CHECKPOINT_FILE> --eval mIoU --show-dir visuals
```
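For a quick single-image check, the model can also be driven from Python. A minimal sketch, assuming this repo exposes the standard mmsegmentation v0.x inference API; `demo.png` and the output path are hypothetical:

```python
from mmseg.apis import inference_segmentor, init_segmentor

# Build the segmentor from a config and checkpoint (same placeholders as the commands above).
model = init_segmentor('<CONFIG_FILE>', '<SEG_CHECKPOINT_FILE>', device='cuda:0')

# Predict on one image and save the prediction blended over the input.
result = inference_segmentor(model, 'demo.png')  # hypothetical input image
model.show_result('demo.png', result, out_file='visuals/demo.png', opacity=0.5)
```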
If you find SeMask useful in your research, please consider citing:

```BibTeX
@article{jain2021semask,
  title={SeMask: Semantically Masked Transformers for Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv},
  year={2021}
}
```