LiVOS: Light Video Object Segmentation with Gated Linear Matching

PyTorch implementation for the paper LiVOS: Light Video Object Segmentation with Gated Linear Matching, arXiv 2024.

Qin Liu1, Jianfeng Wang2, Zhengyuan Yang2, Linjie Li2, Kevin Lin2, Marc Niethammer1, Lijuan Wang2
1UNC-Chapel Hill, 2Microsoft


Installation

The code is tested with python=3.10, torch=2.4.0, torchvision=0.19.0.

git clone https://github.com/uncbiag/LiVOS
cd LiVOS

Create a new conda environment and install the required packages:

conda create -n livos python=3.10
conda activate livos
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
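
A quick way to confirm the environment matches the versions above (a minimal sketch, not part of the codebase):

```python
# check_env.py -- sanity-check the installed environment
import torch
import torchvision

print("torch:", torch.__version__)               # expected 2.4.0
print("torchvision:", torchvision.__version__)   # expected 0.19.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```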

Weights

Download the model weights and store them in the ./weights directory. The directory will be automatically created if it does not already exist.

python ./download.py
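
To confirm a download succeeded, a checkpoint can be opened directly with torch.load (a minimal sketch; the file name livos-nomose-480p.pth is the model referenced in the Evaluation section below):

```python
# verify_weights.py -- check that a downloaded checkpoint is readable
from pathlib import Path

import torch

ckpt_path = Path("./weights/livos-nomose-480p.pth")  # assumed file name, see Evaluation section
assert ckpt_path.exists(), f"missing checkpoint: {ckpt_path}"

state = torch.load(ckpt_path, map_location="cpu")
# checkpoints are typically a flat state_dict or a dict wrapping one
num_entries = len(state) if isinstance(state, dict) else 1
print(f"loaded {ckpt_path.name}: {num_entries} top-level entries")
```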

Datasets

| Dataset | Description | Download Link |
| --- | --- | --- |
| DAVIS 2017 | 60 videos (train); 30 videos (val); 30 videos (test) | official site |
| YouTube VOS 2019 | 3471 videos (train); 507 videos (val) | official site |
| MOSE | 1507 videos (train); 311 videos (val) | official site |
| LVOS (v1)* | 50 videos (val); 50 videos (test) | official site |

(*) To prepare LVOS, extract the first-frame-only annotations for its validation set:

python scripts/data/preprocess_lvos.py ../LVOS/valid/Annotations ../LVOS/valid/Annotations_first_only
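
For reference, this step amounts to keeping only the first (lowest-numbered) annotation frame of each validation video; the provided scripts/data/preprocess_lvos.py is the authoritative version. A minimal sketch under that assumption:

```python
# first_frame_only.py -- illustrative sketch of the LVOS preprocessing step
# (keep only the first annotation PNG per video); use scripts/data/preprocess_lvos.py in practice
import shutil
import sys
from pathlib import Path

src = Path(sys.argv[1])   # e.g. ../LVOS/valid/Annotations
dst = Path(sys.argv[2])   # e.g. ../LVOS/valid/Annotations_first_only

for video_dir in sorted(p for p in src.iterdir() if p.is_dir()):
    frames = sorted(video_dir.glob("*.png"))
    if not frames:
        continue
    out_dir = dst / video_dir.name
    out_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(frames[0], out_dir / frames[0].name)
```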

Prepare the datasets in the following structure:

├── LiVOS (codebase)
├── DAVIS
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── YouTube
│   ├── all_frames
│   │   └── valid_all_frames
│   ├── train
│   └── valid
├── LVOS
│   ├── valid
│   │   ├──Annotations
│   │   └── ...
│   └── test
│       ├──Annotations
│       └── ...
└── MOSE
    ├── JPEGImages
    └── Annotations
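
Before running evaluation or training, the layout can be verified with a small script (a minimal sketch; it assumes the datasets sit next to the LiVOS codebase, as in the tree above):

```python
# check_datasets.py -- verify the expected dataset layout shown above
from pathlib import Path

ROOT = Path("..")  # datasets are siblings of the LiVOS codebase
EXPECTED = [
    "DAVIS/2017/trainval/Annotations",
    "DAVIS/2017/test-dev/Annotations",
    "YouTube/all_frames/valid_all_frames",
    "YouTube/train",
    "YouTube/valid",
    "LVOS/valid/Annotations",
    "LVOS/test/Annotations",
    "MOSE/JPEGImages",
    "MOSE/Annotations",
]

for rel in EXPECTED:
    status = "ok" if (ROOT / rel).is_dir() else "MISSING"
    print(f"[{status:7}] {ROOT / rel}")
```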

Evaluation

You should get the following results (J&F) using our provided models:

| Training Dataset | Model | MOSE | DAVIS-17 val | DAVIS-17 test | YTVOS-19 val | LVOS val | LVOS test |
| --- | --- | --- | --- | --- | --- | --- | --- |
| D17+YT19 | livos-nomose-480p (135 MB) | 59.2 | 84.4 | 78.2 | 79.9 | 50.6 | 44.6 |
| D17+YT19 | livos-nomose-ft-480p (135 MB) | 58.4 | 85.1 | 81.0 | 81.3 | 51.2 | 50.9 |
| D17+YT19+MOSE | livos-wmose-480p (135 MB) | 64.8 | 84.0 | 79.6 | 82.6 | 51.2 | 47.0 |
1. To run the evaluation:
python livos/eval.py dataset=[dataset] weights=[path to model file]

Example for DAVIS 2017 validation set (more dataset options in livos/config/eval_config.yaml):

python livos/eval.py dataset=d17-val weights=./weights/livos-nomose-480p.pth
2. To get quantitative results for DAVIS 2017 validation:
GT_DIR=../DAVIS/2017/trainval/Annotations/480p
Seg_DIR=./results/d17-val/Annotations
python ./vos-benchmark/benchmark.py -g ${GT_DIR} -m ${Seg_DIR}
3. For results on other datasets, follow the same procedure with the corresponding ground-truth and segmentation folders; a batch-evaluation sketch is shown below.
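
To evaluate one checkpoint on several datasets in sequence, the single-dataset command from step 1 can be looped (a minimal sketch; the dataset keys other than d17-val are assumptions, check livos/config/eval_config.yaml for the actual names):

```python
# eval_all.py -- run livos/eval.py for one checkpoint on several datasets
import subprocess

WEIGHTS = "./weights/livos-nomose-480p.pth"
# dataset keys other than d17-val are assumed; see livos/config/eval_config.yaml
DATASETS = ["d17-val", "d17-test", "y19-val", "mose-val", "lvos-val", "lvos-test"]

for ds in DATASETS:
    cmd = ["python", "livos/eval.py", f"dataset={ds}", f"weights={WEIGHTS}"]
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```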

Training

We conducted the training on four A6000 48GB GPUs. Without MOSE, the process required approximately 90 hours to complete 125,000 iterations.

OMP_NUM_THREADS=4 torchrun --master_port 25350 --nproc_per_node=4 livos/train.py exp_id=first_try model=base data=base
  • The training configuration is located in livos/config/train_config.yaml (see the sketch below for inspecting it).
  • By default, the output folder is set to ./model_mmdd_yyyy/${exp_id}. If needed, this can be modified in the training configuration file.
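
To inspect the defaults before launching a run, the configuration file can be loaded with OmegaConf, which the command-line override syntax above suggests is in use (a minimal sketch; adjust to the actual schema in livos/config/train_config.yaml):

```python
# show_train_config.py -- print the training configuration before launching a run
from omegaconf import OmegaConf

cfg = OmegaConf.load("livos/config/train_config.yaml")
print(OmegaConf.to_yaml(cfg))

# Overrides such as exp_id=first_try model=base data=base are passed on the
# torchrun command line shown above rather than edited here.
```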

Citation

@article{liu2024livos,
  title={LiVOS: Light Video Object Segmentation with Gated Linear Matching},
  author={Liu, Qin and Wang, Jianfeng and Yang, Zhengyuan and Li, Linjie and Lin, Kevin and Niethammer, Marc and Wang, Lijuan},
  journal={arXiv preprint arXiv:2411.02818},
  year={2024}
}

Acknowledgement

Our project is developed based on Cutie. We appreciate the well-maintained codebase.