Video:
Prompt:
object_names = "rubber duck. blue box. wooden bowl"
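Prompts in this style (as used by Grounding DINO / Grounded-SAM-2) separate object categories with periods. As a quick illustration — the variable comes from the prompt above, the splitting logic is just a sketch — the individual names can be recovered like this:

```python
# Illustrative sketch: split a period-separated object prompt back into
# individual category names ("." is the category separator in
# Grounding-DINO-style text prompts).
object_names = "rubber duck. blue box. wooden bowl"
names = [part.strip() for part in object_names.split(".") if part.strip()]
print(names)  # -> ['rubber duck', 'blue box', 'wooden bowl']
```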
This method requires CUDA 12.1.
- Clone the repository with submodules:
git clone --recurse-submodules https://github.com/kallol-saha/video_to_transforms.git
- Create and activate the conda environment:
conda create -n vid2trans python=3.10
conda activate vid2trans
- Install PyTorch 2.5.0 built for CUDA 12.1:
conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.1 -c pytorch -c nvidia
- Install CoTracker3 and download its checkpoints (from https://github.com/facebookresearch/co-tracker):
cd cotracker3
pip install -e .
cd ..
cd assets/weights
# download the online (multi window) model
wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_online.pth
# download the offline (single window) model
wget https://huggingface.co/facebook/cotracker3/resolve/main/scaled_offline.pth
cd ../..
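After the two downloads above, a quick sanity check (a sketch; the filenames are taken from the wget commands) confirms both CoTracker3 checkpoints landed in assets/weights:

```python
# Sketch: verify the two CoTracker3 checkpoints exist where they were
# downloaded (filenames taken from the wget commands above).
from pathlib import Path

weights = Path("assets/weights")
status = {name: (weights / name).is_file()
          for name in ("scaled_online.pth", "scaled_offline.pth")}
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```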
- Install Grounded-SAM-2 and download its checkpoints (from https://github.com/IDEA-Research/Grounded-SAM-2):
cd gsam2
pip install -e .
pip install --no-build-isolation -e grounding_dino # Install grounding dino
cd ..
cd assets/weights
bash download_sam_ckpts.sh
bash download_gdino_ckpts.sh
cd ../..
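Once all the download steps have run, listing assets/weights is a quick way to confirm every checkpoint arrived (a sketch; the exact SAM 2 and Grounding DINO filenames depend on the download scripts, so none are hard-coded here):

```python
# Sketch: list checkpoint files under assets/weights after all downloads.
# "*.pt*" matches both the .pth (CoTracker3) and .pt checkpoint extensions.
from pathlib import Path

weights = Path("assets/weights")
checkpoints = sorted(p.name for p in weights.glob("*.pt*")) if weights.is_dir() else []
print(checkpoints or "no checkpoints found - re-run the download steps above")
```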
- Install the remaining requirements:
pip install -r requirements.txt
- Run the generate_data.py file:
python generate_data.py