We present SegLLM, a novel multi-round interactive segmentation model that leverages conversational memory of both visual and textual outputs to reason over previously segmented objects and past interactions, effectively interpreting complex user intentions.
SegLLM: Multi-round Reasoning Segmentation
XuDong Wang*, Shaolun Zhang*, Shufan Li*, Konstantinos Kallidromitis, Kehan Li, Yusuke Kato, Kazuki Kozuka, Trevor Darrell
UC Berkeley, UCLA, Panasonic AI Research, Stanford
ICLR 2025
[project page] [arxiv] [bibtex] [Huggingface]
- 01/22/2025 SegLLM was accepted by ICLR 2025!!!
- 12/29/2024 Released model training code and datasets.
- 11/05/2024 Released model evaluation code.
- 11/03/2024 Initial commit: released model inference code and Gradio demo.
See installation instructions and dataset setup instructions.
Launch the Gradio demo:
CUDA_VISIBLE_DEVICES=0 ./scripts/inference/launch_gradio_demo.sh
Launch inference via command line:
CUDA_VISIBLE_DEVICES=0 ./scripts/inference/launch_cli_demo.sh
Consider trying the example images and conversations in inference_images.
To evaluate on multi-round RefCOCO, single-round RefCOCO, single-round RefCOCO with different question templates, multi-round PACO, and ReasonSeg, run the following commands, respectively:
LOCAL_HOST=0 ./scripts/eval/eval_mr_refcoco.sh
LOCAL_HOST=0 ./scripts/eval/eval_refcoco.sh
LOCAL_HOST=0 ./scripts/eval/eval_refcoco_templates.sh
LOCAL_HOST=0 ./scripts/eval/eval_mr_paco.sh
LOCAL_HOST=0 ./scripts/eval/eval_reason_seg.sh
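The five evaluation commands above can also be chained so that one failing benchmark does not stop the rest. The following is a minimal sketch, not part of the repository: the function name `run_all_evals` and the `DRY_RUN` switch are hypothetical, and it assumes the scripts live at the paths shown above and are run from the repository root. The first argument is passed through unchanged as `LOCAL_HOST`, exactly as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SegLLM): run every evaluation
# script listed above in sequence and count how many of them fail.
# Set DRY_RUN=1 to print the commands instead of executing them.

run_all_evals() {
  local devices="${1:-0}"   # forwarded as LOCAL_HOST, as in the README commands
  local failed=0
  local name
  local script
  for name in mr_refcoco refcoco refcoco_templates mr_paco reason_seg; do
    script="./scripts/eval/eval_${name}.sh"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Only show what would be run.
      echo "LOCAL_HOST=${devices} ${script}"
      continue
    fi
    # Keep going after a failure so the remaining benchmarks still run.
    if ! LOCAL_HOST="${devices}" "${script}"; then
      echo "FAILED: ${script}" >&2
      failed=$((failed + 1))
    fi
  done
  # Exit status is the number of failed scripts (0 means all succeeded).
  return "${failed}"
}
```

After sourcing the snippet, `DRY_RUN=1 run_all_evals 0` prints the five commands without running them, and `run_all_evals 0` executes them in order.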
To reproduce our MR-RefCOCO checkpoint, MR-PACO checkpoint, and all-datasets checkpoint, respectively, run the following commands:
LOCAL_HOST=0,1,2,3 ./scripts/train/train_mr_refcoco.sh
LOCAL_HOST=0,1,2,3 ./scripts/train/train_mr_paco.sh
LOCAL_HOST=0,1,2,3 ./scripts/train/train_all_data_mix.sh
The model checkpoints are available on Huggingface.
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.
@article{wang2024segllm,
title={SegLLM: Multi-round Reasoning Segmentation},
author={Wang, XuDong and Zhang, Shaolun and Li, Shufan and Kallidromitis, Konstantinos and Li, Kehan and Kato, Yusuke and Kozuka, Kazuki and Darrell, Trevor},
journal={arXiv preprint arXiv:2410.18923},
year={2024}
}