Bin Wang, Armstrong Aboah, Zheyuan Zhang, Ulas Bagci
[Paper] [Demo (YouTube)] [Demo (bilibili)] [BibTeX]
GazeSAM is a human-computer interaction system that combines eye-tracking technology with the Segment Anything Model (SAM), enabling users to segment the object they are looking at in real time. The system is designed specifically for radiologists to record segmentation masks during image reading simply by looking at the desired regions, which can speed up the daily clinical workflow. In addition, eye-gaze data can easily be recorded for further eye-tracking research. The system supports both 2D and 3D images.
A user interface is provided, as shown in the left image below, and the experimental setup is illustrated in the right image below.
This code requires `python=3.8.0`, as well as `pytorch>=1.7` and `torchvision>=0.8`. For your convenience, the segment-anything package is already included in this repo, but you still need to install some dependencies:
pip install opencv-python pycocotools matplotlib onnxruntime onnx
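To verify that the installed versions satisfy the requirements above, a quick check like the following may help:

```python
# Sanity check for the environment described above.
import torch
import torchvision

print("pytorch:", torch.__version__)            # expect >= 1.7
print("torchvision:", torchvision.__version__)  # expect >= 0.8
print("CUDA available:", torch.cuda.is_available())
```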
Besides, you need to download a model checkpoint provided by SAM:

- `default` or `vit_h`: ViT-H SAM model.
- `vit_l`: ViT-L SAM model.
- `vit_b`: ViT-B SAM model.

Then put the checkpoint under `./model/`.
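If you prefer to fetch the checkpoint programmatically, a minimal sketch is shown below. The URL and filename follow the official segment-anything release for the ViT-H model; verify them against the SAM README before running.

```python
# Minimal sketch: download the ViT-H checkpoint into ./model/.
# URL/filename are taken from the official SAM release; verify before use.
import os
import urllib.request

CKPT_URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth"
CKPT_PATH = os.path.join("model", "sam_vit_h_4b8939.pth")

os.makedirs("model", exist_ok=True)
if not os.path.exists(CKPT_PATH):
    urllib.request.urlretrieve(CKPT_URL, CKPT_PATH)
print("checkpoint ready at", CKPT_PATH)
```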
In this work, we use the Tobii Pro Nano as the eye tracker. If you have the device and want to reproduce our results, here are the steps you need to follow.

First, download Tobii Pro Eye Tracker Manager, then open it to install the Tobii Pro Nano device on your PC. After that, complete the calibration procedure to make sure eye movements are recorded accurately.
Second, install some dependencies:
pip install tobii-research PyQt5 SimpleITK
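For reference, here is a minimal sketch of reading gaze data with the tobii-research SDK, assuming a calibrated Tobii Pro Nano is connected:

```python
# Minimal sketch: subscribe to gaze data from a connected Tobii tracker.
# Function and constant names follow the tobii-research SDK.
import time
import tobii_research as tr

def on_gaze(gaze_data):
    # Normalized (0..1) gaze points on the display area, one per eye.
    left = gaze_data["left_gaze_point_on_display_area"]
    right = gaze_data["right_gaze_point_on_display_area"]
    print("left:", left, "right:", right)

trackers = tr.find_all_eyetrackers()
if not trackers:
    raise RuntimeError("No Tobii eye tracker found")
tracker = trackers[0]

tracker.subscribe_to(tr.EYETRACKER_GAZE_DATA, on_gaze, as_dictionary=True)
time.sleep(2)  # collect gaze samples for two seconds
tracker.unsubscribe_from(tr.EYETRACKER_GAZE_DATA, on_gaze)
```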
Run the user interface with:
python ui.py
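For an illustrative sketch of the core idea behind GazeSAM (not the actual `ui.py` internals), the current gaze point can be passed to SAM as a single foreground point prompt. The checkpoint path, model type, image filename, and gaze values below are placeholders; adjust them to your setup.

```python
# Illustrative sketch: turn a gaze point into a SAM point prompt.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load SAM with the checkpoint placed under ./model/ (path is an assumption).
sam = sam_model_registry["vit_h"](checkpoint="./model/sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB image.
image = cv2.cvtColor(cv2.imread("example.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Map a normalized gaze point (from the eye tracker) to pixel coordinates.
gaze_x, gaze_y = 0.5, 0.5  # placeholder gaze sample
h, w = image.shape[:2]
point = np.array([[gaze_x * w, gaze_y * h]])

masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=False,
)
print("mask shape:", masks.shape, "score:", scores[0])
```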
Here are a few examples of GazeSAM.
@article{wang2023gazesam,
  title={GazeSAM: What You See is What You Segment},
  author={Wang, Bin and Aboah, Armstrong and Zhang, Zheyuan and Bagci, Ulas},
  journal={arXiv preprint arXiv:2304.13844},
  year={2023}
}