SegLLM: Multi-round Reasoning Segmentation

We present SegLLM, a novel multi-round interactive segmentation model that leverages conversational memory of both visual and textual outputs to reason over previously segmented objects and past interactions, effectively interpreting complex user intentions.

XuDong Wang*, Shaolun Zhang*, Shufan Li*, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
UC Berkeley, UCLA, Panasonic AI Research, Stanford
ICLR 2025

[project page] [arxiv] [bibtex] [Huggingface]

Updates

01/22/2025 SegLLM was accepted to ICLR 2025!
12/29/2024 Released model training code and datasets.
11/05/2024 Released model evaluation code.
11/03/2024 Initial commit: released model inference code and Gradio demo.

Installation and Dataset

See INSTALL.md for installation instructions and DATASET.md for dataset setup instructions.

Inference

Launch the Gradio demo:

CUDA_VISIBLE_DEVICES=0 ./scripts/inference/launch_gradio_demo.sh

Launch inference via command line:

CUDA_VISIBLE_DEVICES=0 ./scripts/inference/launch_cli_demo.sh

Consider trying the example images and conversations in inference_images.

Evaluation

Run the following scripts to evaluate on, respectively, multi-round RefCOCO, single-round RefCOCO, single-round RefCOCO with different question templates, multi-round PACO, and ReasonSeg:

LOCAL_HOST=0 ./scripts/eval/eval_mr_refcoco.sh
LOCAL_HOST=0 ./scripts/eval/eval_refcoco.sh
LOCAL_HOST=0 ./scripts/eval/eval_refcoco_templates.sh
LOCAL_HOST=0 ./scripts/eval/eval_mr_paco.sh
LOCAL_HOST=0 ./scripts/eval/eval_reason_seg.sh
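
To run all five evaluations back to back, a small wrapper along these lines may help. This is only a sketch: the script names are the ones listed above, while `run_all_evals`, `EVAL_SCRIPTS`, and the `DRY_RUN` switch are hypothetical names introduced here and are not part of the repository. By default it only prints the commands; set `DRY_RUN=0` to actually execute them from the repository root.

```shell
#!/bin/sh
# Sketch: run every evaluation script listed above, in order.
# run_all_evals, EVAL_SCRIPTS and DRY_RUN are illustrative names (assumptions),
# not something the repository provides.

EVAL_SCRIPTS="eval_mr_refcoco.sh eval_refcoco.sh eval_refcoco_templates.sh eval_mr_paco.sh eval_reason_seg.sh"

run_all_evals() {
  for script in $EVAL_SCRIPTS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Dry run: print the command instead of executing it.
      echo "LOCAL_HOST=${LOCAL_HOST:-0} ./scripts/eval/${script}"
    else
      # Real run: stop at the first evaluation that fails.
      LOCAL_HOST="${LOCAL_HOST:-0}" "./scripts/eval/${script}" || return 1
    fi
  done
}

run_all_evals
```

Stopping at the first failure keeps a broken dataset setup from being masked by later runs; drop the `|| return 1` if you would rather continue through all five.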

Training

To reproduce our MR-RefCOCO checkpoint, MR-PACO checkpoint, and all-datasets checkpoint, respectively, run the following commands:

LOCAL_HOST=0,1,2,3 ./scripts/train/train_mr_refcoco.sh
LOCAL_HOST=0,1,2,3 ./scripts/train/train_mr_paco.sh
LOCAL_HOST=0,1,2,3 ./scripts/train/train_all_data_mix.sh

Checkpoints

The model checkpoints are available on Hugging Face.

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@article{wang2024segllm,
  title={SegLLM: Multi-round Reasoning Segmentation},
  author={Wang, XuDong and Zhang, Shaolun and Li, Shufan and Kallidromitis, Konstantinos and Li, Kehan and Kato, Yusuke and Kozuka, Kazuki and Darrell, Trevor},
  journal={arXiv preprint arXiv:2410.18923},
  year={2024}
}
