ml-research/deictic-segment-anything


*Refactoring is underway.

DeiSAM: Segment Anything with Deictic Prompting (NeurIPS 2024)

Hikaru Shindo, Manuel Brack, Gopika Sudhakaran, Devendra Singh Dhami, Patrick Schramowski, Kristian Kersting

AI/ML Lab @ TU Darmstadt

We propose DeiSAM, which integrates large pre-trained neural networks with differentiable logic reasoners. Given a complex, textual segmentation description, DeiSAM leverages Large Language Models (LLMs) to generate first-order logic rules and performs differentiable forward reasoning on generated scene graphs.
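To make the reasoning step more concrete, here is a minimal sketch of soft forward reasoning over a scene graph. It is not the reasoner shipped in neumann/ and not DeiSAM's actual pipeline. The scene graph, the type scores, the rule `target(X) :- on(X, Y), type(Y, boat)`, and all function names are hypothetical. Conjunction is approximated by a product and the existential over `Y` by a noisy-or, both chosen here only because they are simple and differentiable.

```python
from collections import defaultdict

# Hypothetical toy scene graph: (subject, relation, object, confidence).
SCENE_GRAPH = [
    ("person_1", "on", "boat_1", 0.9),
    ("person_2", "on", "bench_1", 0.8),
    ("bag_1", "on", "boat_1", 0.6),
]

# Hypothetical object-type confidences, e.g. as a detector might produce.
TYPES = {
    ("boat_1", "boat"): 0.95,
    ("bench_1", "boat"): 0.05,
}


def noisy_or(values):
    """Soft disjunction: 1 - prod(1 - v), differentiable in every v."""
    result = 1.0
    for v in values:
        result *= 1.0 - v
    return 1.0 - result


def score_targets(scene_graph, types, relation, type_name):
    """Score target(X) :- relation(X, Y), type(Y, type_name) for every X."""
    groundings = defaultdict(list)
    for subj, rel, obj, conf in scene_graph:
        if rel != relation:
            continue
        # Soft conjunction of the two body atoms: product of confidences.
        groundings[subj].append(conf * types.get((obj, type_name), 0.0))
    # Soft existential over Y: combine all groundings found for the same X.
    return {subj: noisy_or(vals) for subj, vals in groundings.items()}


if __name__ == "__main__":
    scores = score_targets(SCENE_GRAPH, TYPES, "on", "boat")
    for obj, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{obj}: {score:.3f}")
```

In this toy setting the highest-scoring object would be the one handed to the segmentation model. Because every operation is a product or a sum, the scores remain differentiable with respect to the input confidences.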

Install

A Dockerfile is available in the .devcontainer folder.

To install further dependencies, clone Grounded-Segment-Anything and then:

cd neumann/
pip install -e .
cd ../Grounded-Segment-Anything/
cd segment_anything
pip install -e .
cd ../GroundingDINO
pip install -e .

If an error appears regarding OpenCV (circular import), try:

pip uninstall opencv-python
pip uninstall opencv-contrib-python
pip uninstall opencv-contrib-python-headless
pip3 install opencv-contrib-python==4.5.5.62

Download the SAM ViT-H checkpoint:

wget https://huggingface.co/spaces/abhishek/StableSAM/resolve/main/sam_vit_h_4b8939.pth

Dataset

The DeiVG datasets can be downloaded with the commands below. Place the downloaded files in data/ as follows (make sure you are in the root folder of this project):

mkdir data/
cd data
wget https://hessenbox.tu-darmstadt.de/dl/fiJwsDNjdY9HDrUMf3btjoHG/.dir -O deivg.zip
unzip deivg.zip
cd visual_genome
unzip by-id.zip

Download the Visual Genome images and place them in data/visual_genome/ as follows:

cd data/visual_genome
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip
wget https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip
unzip images.zip
unzip images2.zip
mv VG_100K_2/* VG_100K/
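After the download steps, a quick sanity check of the directory layout can save a failed run later. The sketch below only checks the directories named in the commands above. The `EXPECTED` list and the `missing_paths` helper are illustrative and not part of the repository.

```python
from pathlib import Path

# Directories implied by the download steps, relative to the project root.
EXPECTED = [
    "data",
    "data/visual_genome",
    "data/visual_genome/VG_100K",
]


def missing_paths(project_root="."):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED if not (root / p).is_dir()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        image_dir = Path("data/visual_genome/VG_100K")
        n_files = sum(1 for entry in image_dir.iterdir() if entry.is_file())
        print(f"Layout looks complete; {n_files} files in {image_dir}")
```

Run it from the project root so the relative paths resolve as in the commands above.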

Experiments

To solve DeiVG using DeiSAM:

python src/solve_deivg.py --api-key YOUR_OPENAI_API_KEY -c 1
python src/solve_deivg.py --api-key YOUR_OPENAI_API_KEY -c 2
python src/solve_deivg.py --api-key YOUR_OPENAI_API_KEY -c 3
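To avoid retyping the key for each run, the three invocations can be generated from one place. This is a minimal sketch: it uses only the `--api-key` and `-c` flags shown above, and reading the key from an `OPENAI_API_KEY` environment variable is an assumption of this example, not something the script is known to do. The `build_commands` helper is hypothetical.

```python
import os
import shlex


def build_commands(api_key, c_values=(1, 2, 3)):
    """Return one solve_deivg.py invocation (as an argv list) per -c value."""
    return [
        ["python", "src/solve_deivg.py", "--api-key", api_key, "-c", str(c)]
        for c in c_values
    ]


if __name__ == "__main__":
    # Assumption: the key is exported as OPENAI_API_KEY in the shell.
    # Falls back to the placeholder used in the commands above.
    key = os.environ.get("OPENAI_API_KEY", "YOUR_OPENAI_API_KEY")
    for cmd in build_commands(key):
        # Dry run: commands are printed, not executed.
        # To execute, pass each list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping each command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.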

To run the learning demonstration:

python src/learning_demo.py --api-key YOUR_OPENAI_API_KEY -c 1 -sgg VETO -su
python src/learning_demo.py --api-key YOUR_OPENAI_API_KEY -c 2 -sgg VETO -su

Note that DeiSAM is essentially a training-free model. The learning here demonstrates that the model can be trained by gradients. The best performance is always achieved with ground-truth scene graphs, which corresponds to solve_deivg.py. In other words, DeiSAM does not need to be trained when scene graphs are available. Future work is to handle the case where scene graphs are not available.

BibTeX

@inproceedings{shindo24deisam,
  author       = {Hikaru Shindo and
                  Manuel Brack and
                  Gopika Sudhakaran and
                  Devendra Singh Dhami and
                  Patrick Schramowski and
                  Kristian Kersting},
  title        = {DeiSAM: Segment Anything with Deictic Prompting},
  booktitle    = {Proceedings of the Conference on Advances in Neural Information Processing Systems (NeurIPS)},
  year         = {2024},
}

LICENSE

See LICENSE.
