A toolkit for robotic tasks, compiled by Jishnu Jaykumar P. It currently provides:
- Zero-shot classification using OpenAI CLIP (see the sketch after this list).
- Zero-shot text-to-bbox approach for object detection using GroundingDINO.
- Zero-shot bbox-to-mask approach for object segmentation using Segment Anything (MobileSAM).
- Zero-shot image-to-depth approach for depth estimation using Depth Anything.
- Zero-shot feature upsampling using FeatUp.
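For example, the CLIP-based zero-shot classification above can be reproduced in a few lines against OpenAI CLIP directly. The sketch below is independent of this toolkit's own API; the image path and label set are placeholders:

```python
# Minimal zero-shot classification with OpenAI CLIP (standalone sketch,
# not this toolkit's API). Image path and labels are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["a mug", "a drill", "a cracker box"]           # hypothetical label set
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(dict(zip(labels, probs[0])))                       # per-label probabilities
```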
- Python 3.9
- torch (tested with 2.0)
- torchvision
Before installing, set CUDA_HOME, replacing the path below with the location of your CUDA installation.
export CUDA_HOME=/usr/local/cuda
pip install -r requirements.txt
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
python setup.py install
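After these steps, a quick sanity check (generic Python, not specific to this toolkit) confirms that the expected torch/torchvision versions and CUDA are visible before moving on to the test scripts:

```python
# Post-install sanity check (generic sketch, not part of this toolkit).
import sys
import torch
import torchvision

print("python      :", sys.version.split()[0])          # expected: 3.9.x
print("torch       :", torch.__version__)               # tested with 2.0
print("torchvision :", torchvision.__version__)
print("cuda ok     :", torch.cuda.is_available())       # needed for GroundingDINO's CUDA extension
```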
Note: If you see the following error, re-check your GroundingDINO installation; it usually means the CUDA extension failed to build (for example, because CUDA_HOME was not set at install time):
NameError: name '_C' is not defined
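One way to confirm whether this is the problem is to import GroundingDINO's compiled extension directly, as in the sketch below (this assumes GroundingDINO installs as the `groundingdino` Python package); if the import fails, reinstall GroundingDINO with CUDA_HOME exported:

```python
# Check whether GroundingDINO's compiled CUDA extension is present
# (assumes the package is installed as `groundingdino`).
try:
    from groundingdino import _C  # noqa: F401
    print("GroundingDINO CUDA extension found.")
except ImportError:
    print("CUDA extension missing: reinstall GroundingDINO with CUDA_HOME exported.")
```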
- Note: All test scripts are located in the `test` directory. Place the respective test script in the root directory to run it.
- SAM: `test_sam.py` (a minimal sketch along these lines appears below)
- GroundingDINO + SAM: `test_gdino_sam.py`
- GroundingDINO + SAM + CLIP: `test_gdino_sam_clip.py`
- Depth Anything: `test_depth_anything.py`
- FeatUp: `test_featup.py`
- Test Datasets: `test_dataset.py`, run as:
python test_dataset.py --gpu 0 --dataset <ocid_object_test/osd_object_test>
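For orientation, a bbox-to-mask test in the spirit of `test_sam.py` could look roughly like the following sketch. It calls the MobileSAM library directly rather than any wrapper this toolkit may provide, and the checkpoint path, image path, and box coordinates are placeholders:

```python
# Rough sketch of a bbox-to-mask test using MobileSAM directly
# (not this toolkit's API; checkpoint, image, and box are placeholders).
import numpy as np
import torch
from PIL import Image
from mobile_sam import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")   # hypothetical checkpoint path
sam.to(device).eval()

predictor = SamPredictor(sam)
image = np.array(Image.open("example.jpg").convert("RGB"))      # placeholder image
predictor.set_image(image)

box = np.array([100, 100, 300, 300])                            # placeholder XYXY box prompt
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)                                      # (1, H, W) boolean mask and its score
```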
Example results show each input image alongside its segmented image.
Future goals for this project include:
- Add a config to set the pretrained checkpoints dynamically
- More: TODO
This project builds on the following repositories; checking their licenses before use is mandatory:
This project is licensed under the MIT License. However, before using this toolkit, please check the respective upstream works for their specific licenses.