LiVOS: Light Video Object Segmentation with Gated Linear Matching

PyTorch implementation for the paper LiVOS: Light Video Object Segmentation with Gated Linear Matching, arXiv 2024.

Qin Liu1, Jianfeng Wang2, Zhengyuan Yang2, Linjie Li2, Kevin Lin2, Marc Niethammer1, Lijuan Wang2
1UNC-Chapel Hill, 2Microsoft


Installation

The code is tested with python=3.10, torch=2.4.0, torchvision=0.19.0.

git clone https://github.com/uncbiag/LiVOS
cd LiVOS

Create a new conda environment and install the required packages:

conda create -n livos python=3.10
conda activate livos
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
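
A quick way to confirm the environment matches the versions above (a minimal sketch, not part of the codebase):

```python
# check_env.py -- sanity-check the installed environment
import torch
import torchvision

print("torch:", torch.__version__)               # expected 2.4.0
print("torchvision:", torchvision.__version__)   # expected 0.19.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```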

Weights

Download the model weights and store them in the ./weights directory. The directory will be automatically created if it does not already exist.

python ./download.py
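
To confirm a download succeeded, a checkpoint can be opened directly with torch.load (a minimal sketch; the file name livos-nomose-480p.pth is the model referenced in the Evaluation section below):

```python
# verify_weights.py -- check that a downloaded checkpoint is readable
from pathlib import Path

import torch

ckpt_path = Path("./weights/livos-nomose-480p.pth")  # assumed file name, see Evaluation section
assert ckpt_path.exists(), f"missing checkpoint: {ckpt_path}"

state = torch.load(ckpt_path, map_location="cpu")
# checkpoints are typically a flat state_dict or a dict wrapping one
num_entries = len(state) if isinstance(state, dict) else 1
print(f"loaded {ckpt_path.name}: {num_entries} top-level entries")
```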

Datasets

| Dataset | Description | Download Link |
| --- | --- | --- |
| DAVIS 2017 | 60 videos (train); 30 videos (val); 30 videos (test) | official site |
| YouTube VOS 2019 | 3471 videos (train); 507 videos (val) | official site |
| MOSE | 1507 videos (train); 311 videos (val) | official site |
| LVOS (v1)* | 50 videos (val); 50 videos (test) | official site |

(*) To prepare LVOS, extract the first-frame-only annotations for its validation set:

python scripts/data/preprocess_lvos.py ../LVOS/valid/Annotations ../LVOS/valid/Annotations_first_only
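
For reference, this step amounts to keeping only the first (lowest-numbered) annotation frame of each validation video; the provided scripts/data/preprocess_lvos.py is the authoritative version. A minimal sketch under that assumption:

```python
# first_frame_only.py -- illustrative sketch of the LVOS preprocessing step
# (keep only the first annotation PNG per video); use scripts/data/preprocess_lvos.py in practice
import shutil
import sys
from pathlib import Path

src = Path(sys.argv[1])   # e.g. ../LVOS/valid/Annotations
dst = Path(sys.argv[2])   # e.g. ../LVOS/valid/Annotations_first_only

for video_dir in sorted(p for p in src.iterdir() if p.is_dir()):
    frames = sorted(video_dir.glob("*.png"))
    if not frames:
        continue
    out_dir = dst / video_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(frames[0], out_dir / frames[0].name)
```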

Prepare the datasets in the following structure:

├── LiVOS (codebase)
├── DAVIS
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── YouTube
│   ├── all_frames
│   │   └── valid_all_frames
│   ├── train
│   └── valid
├── LVOS
│   ├── valid
│   │   ├──Annotations
│   │   └── ...
│   └── test
│       ├──Annotations
│       └── ...
└── MOSE
    ├── JPEGImages
    └── Annotations
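
Before running evaluation or training, the layout can be verified with a small script (a minimal sketch; it assumes the datasets sit next to the LiVOS codebase, as in the tree above):

```python
# check_datasets.py -- verify the expected dataset layout shown above
from pathlib import Path

ROOT = Path("..")  # datasets are siblings of the LiVOS codebase
EXPECTED = [
    "DAVIS/2017/trainval/Annotations",
    "DAVIS/2017/test-dev/Annotations",
    "YouTube/all_frames/valid_all_frames",
    "YouTube/train",
    "YouTube/valid",
    "LVOS/valid/Annotations",
    "LVOS/test/Annotations",
    "MOSE/JPEGImages",
    "MOSE/Annotations",
]

for rel in EXPECTED:
    status = "ok" if (ROOT / rel).is_dir() else "MISSING"
    print(f"[{status:7}] {ROOT / rel}")
```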

Evaluation

You should get the following results (J&F) using our provided models:

| Training Dataset | Model | MOSE | DAVIS-17 val | DAVIS-17 test | YTVOS-19 val | LVOS val | LVOS test |
| --- | --- | --- | --- | --- | --- | --- | --- |
| D17+YT19 | livos-nomose-480p (135 MB) | 59.2 | 84.4 | 78.2 | 79.9 | 50.6 | 44.6 |
| D17+YT19 | livos-nomose-ft-480p (135 MB) | 58.4 | 85.1 | 81.0 | 81.3 | 51.2 | 50.9 |
| D17+YT19+MOSE | livos-wmose-480p (135 MB) | 64.8 | 84.0 | 79.6 | 82.6 | 51.2 | 47.0 |
1. To run the evaluation:
python livos/eval.py dataset=[dataset] weights=[path to model file]

Example for DAVIS 2017 validation set (more dataset options in livos/config/eval_config.yaml):

python livos/eval.py dataset=d17-val weights=./weights/livos-nomose-480p.pth
2. To get quantitative results for DAVIS 2017 validation:
GT_DIR=../DAVIS/2017/trainval/Annotations/480p
Seg_DIR=./results/d17-val/Annotations
python ./vos-benchmark/benchmark.py -g ${GT_DIR} -m ${Seg_DIR}
3. For results on other datasets, follow the same procedure with the corresponding ground-truth and segmentation folders; a batch-evaluation sketch is shown below.
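
To evaluate one checkpoint on several datasets in sequence, the single-dataset command from step 1 can be looped (a minimal sketch; the dataset keys other than d17-val are assumptions, check livos/config/eval_config.yaml for the actual names):

```python
# eval_all.py -- run livos/eval.py for one checkpoint on several datasets
import subprocess

WEIGHTS = "./weights/livos-nomose-480p.pth"
# dataset keys other than d17-val are assumed; see livos/config/eval_config.yaml
DATASETS = ["d17-val", "d17-test", "y19-val", "mose-val", "lvos-val", "lvos-test"]

for ds in DATASETS:
    cmd = ["python", "livos/eval.py", f"dataset={ds}", f"weights={WEIGHTS}"]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```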

Training

We conducted the training on four A6000 48GB GPUs. Without MOSE, the process required approximately 90 hours to complete 125,000 iterations.

OMP_NUM_THREADS=4 torchrun --master_port 25350 --nproc_per_node=4 livos/train.py exp_id=first_try model=base data=base
  • The training configuration is located in livos/config/train_config.yaml (see the sketch below for inspecting it).
  • By default, the output folder is set to ./model_mmdd_yyyy/${exp_id}. If needed, this can be modified in the training configuration file.
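
To inspect the defaults before launching a run, the configuration file can be loaded with OmegaConf, which the command-line override syntax above suggests is in use (a minimal sketch; adjust to the actual schema in livos/config/train_config.yaml):

```python
# show_train_config.py -- print the training configuration before launching a run
from omegaconf import OmegaConf

cfg = OmegaConf.load("livos/config/train_config.yaml")
print(OmegaConf.to_yaml(cfg))

# Overrides such as exp_id=first_try model=base data=base are passed on the
# torchrun command line shown above rather than edited here.
```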

Citation

@article{liu2024livos,
  title={LiVOS: Light Video Object Segmentation with Gated Linear Matching},
  author={Liu, Qin and Wang, Jianfeng and Yang, Zhengyuan and Li, Linjie and Lin, Kevin and Niethammer, Marc and Wang, Lijuan},
  journal={arXiv preprint arXiv:2411.02818},
  year={2024}
}

Acknowledgement

Our project is developed based on Cutie. We appreciate the well-maintained codebase.