Code release for "SegLLM: Multi-round Reasoning Segmentation"

berkeley-hipie/segllm

SegLLM: Multi-round Reasoning Segmentation

We present SegLLM, a novel multi-round interactive segmentation model that leverages conversational memory of both visual and textual outputs to reason over previously segmented objects and past interactions, effectively interpreting complex user intentions.

SegLLM: Multi-round Reasoning Segmentation
XuDong Wang*, Shaolun Zhang*, Shufan Li*, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
UC Berkeley, UCLA, Panasonic AI Research, Stanford
ICLR 2025

[project page] [arxiv] [bibtex] [Huggingface]

Updates

  • 01/22/2025 SegLLM was accepted to ICLR 2025!
  • 12/29/2024 Released model training code and datasets.
  • 11/05/2024 Released model evaluation code.
  • 11/03/2024 Initial commit: released model inference code and Gradio demo.

Installation and Dataset

See installation instructions and dataset setup instructions.

Inference

Launch the Gradio demo:

CUDA_VISIBLE_DEVICES=0 ./scripts/inference/launch_gradio_demo.sh

Launch inference via command line:

CUDA_VISIBLE_DEVICES=0 ./scripts/inference/launch_cli_demo.sh

Consider trying the example images and conversations in inference_images.

Evaluation

To evaluate on multi-round RefCOCO, single-round RefCOCO, single-round RefCOCO with different question templates, multi-round PACO, and ReasonSeg, run the corresponding script (listed in that order):

LOCAL_HOST=0 ./scripts/eval/eval_mr_refcoco.sh
LOCAL_HOST=0 ./scripts/eval/eval_refcoco.sh
LOCAL_HOST=0 ./scripts/eval/eval_refcoco_templates.sh
LOCAL_HOST=0 ./scripts/eval/eval_mr_paco.sh
LOCAL_HOST=0 ./scripts/eval/eval_reason_seg.sh
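
To run all five evaluations back to back, the commands above can be wrapped in a small helper. This is only a sketch: `run_all_evals` and the `DRY_RUN` variable are hypothetical names introduced here and are not part of the repository. The helper simply forwards its argument as the `LOCAL_HOST` value, as the commands above do.

```shell
# Sketch: run the five evaluation scripts listed above, in order.
# run_all_evals and DRY_RUN are illustrative names, not repository options.
run_all_evals() {
  gpus="${1:-0}"   # value forwarded as LOCAL_HOST, e.g. 0
  for script in \
    ./scripts/eval/eval_mr_refcoco.sh \
    ./scripts/eval/eval_refcoco.sh \
    ./scripts/eval/eval_refcoco_templates.sh \
    ./scripts/eval/eval_mr_paco.sh \
    ./scripts/eval/eval_reason_seg.sh
  do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it.
      echo "LOCAL_HOST=${gpus} ${script}"
    else
      # Stop at the first failing evaluation.
      LOCAL_HOST="${gpus}" "${script}" || { echo "failed: ${script}" >&2; return 1; }
    fi
  done
}
```

After sourcing the snippet from the repository root, `DRY_RUN=1 run_all_evals 0` prints the five commands without executing them, and `run_all_evals 0` runs them sequentially.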

Training

To reproduce our MR-RefCOCO checkpoint, MR-PACO checkpoint, and all-datasets checkpoint, respectively, run the following commands:

LOCAL_HOST=0,1,2,3 ./scripts/train/train_mr_refcoco.sh
LOCAL_HOST=0,1,2,3 ./scripts/train/train_mr_paco.sh
LOCAL_HOST=0,1,2,3 ./scripts/train/train_all_data_mix.sh

Checkpoints

The model checkpoints are available on Hugging Face.

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@article{wang2024segllm,
  title={SegLLM: Multi-round Reasoning Segmentation},
  author={Wang, XuDong and Zhang, Shaolun and Li, Shufan and Kallidromitis, Konstantinos and Li, Kehan and Kato, Yusuke and Kozuka, Kazuki and Darrell, Trevor},
  journal={arXiv preprint arXiv:2410.18923},
  year={2024}
}
