
ZeroForge: Feedforward Text-to-Shape Without 3D Supervision

by Kelly Marshall, Minh Pham, Ameya Joshi, Anushrut Jignasu, Aditya Balu, Adarsh Krishnamurthy and Chinmay Hegde


Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations.
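At a high level, the training signal described above can be sketched as a CLIP similarity term plus a contrastive term over a batch of prompts. This is a conceptual illustration only, not the repo's actual implementation; the function name, loss combination, and temperature value are assumptions.

```python
import torch
import torch.nn.functional as F

# Conceptual sketch only (not the repo's actual implementation): ZeroForge-style
# training scores CLIP embeddings of rendered shape views against text-prompt
# embeddings, and adds a contrastive term so different prompts don't collapse
# onto the same shape. Names and the temperature value are illustrative.

def zeroforge_style_loss(image_features, text_features, temperature=0.1):
    """image_features, text_features: (B, D) CLIP embeddings, one row per prompt."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Data-free CLIP loss: maximize similarity between each rendered shape
    # and its own text prompt (no 3D ground truth involved).
    clip_loss = -(image_features * text_features).sum(dim=-1).mean()

    # Contrastive loss: each rendering should match its own prompt better than
    # the other prompts in the batch, which helps avoid mode collapse.
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.shape[0], device=logits.device)
    contrastive_loss = F.cross_entropy(logits, targets)

    return clip_loss + contrastive_loss
```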

Paper Link: Paper

Project Website: Website

If you find our code or paper useful, please cite:

  @misc{marshall2023zeroforge,  
    title={ZeroForge: Feedforward Text-to-Shape Without 3D Supervision},  
    author={Kelly O. Marshall and Minh Pham and Ameya Joshi and Anushrut Jignasu  
    and Aditya Balu and Adarsh Krishnamurthy and Chinmay Hegde},  
    year={2023},  
    eprint={2306.08183},  
    archivePrefix={arXiv},  
    primaryClass={cs.CV}  
  }

Installation

Our code is an extension of the CLIP-Forge repo, as our method uses their trained model as an initialization. After cloning this repo (https://github.com/Km3888/ZeroForge), you can set up your environment as follows.

First, create an anaconda environment called clip_forge using

conda env create -f environment.yaml
conda activate clip_forge

Then, install PyTorch 1.7.1 (or later) and torchvision. Please change the CUDA version based on your requirements.

conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install git+https://github.com/openai/CLIP.git
pip install scikit-learn

You can download the CLIP-Forge initialization weights published by Sanghi et al. by running:

wget https://clip-forge-pretrained.s3.us-west-2.amazonaws.com/exps.zip
unzip exps.zip

This downloads a folder of their experimental results; the only part we need is the models subfolder. Alternatively, you can obtain these initialization weights by training CLIP-Forge according to their instructions.

Neural Voxel Renderer

We use the Neural Voxel Renderer+ model described here. For compatibility with our other components, we wrote a PyTorch implementation using the exact same architecture and weights as the official TensorFlow implementation. To get the weights for the NVR+ model, download them from our Hugging Face space to a location NVR_WEIGHTS.
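As a quick sanity check before training, the downloaded checkpoint can be inspected with torch.load. This is a minimal sketch that assumes the weights are saved as an ordinary PyTorch state dict; the path below is a placeholder for wherever you saved NVR_WEIGHTS.

```python
import torch

# Minimal sanity check (assumption: the NVR+ weights are an ordinary PyTorch
# checkpoint). The path below is a placeholder for your NVR_WEIGHTS location.
nvr_weights_path = "path/to/nvr_plus_weights.pt"

state_dict = torch.load(nvr_weights_path, map_location="cpu")
print(f"Loaded checkpoint with {len(state_dict)} entries")
print(list(state_dict)[:5])  # peek at the first few keys
```

The same path is what gets passed to the --nvr_renderer_checkpoint flag when training (see below).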

Running ZeroForge

The main file for training is zf_training.py, which performs training on an array of text queries. Results are logged with TensorBoard in the specified log directory.

python zf_training.py --query_array [QUERY] --log_dir [LOGDIR] --nvr_renderer_checkpoint [NVR_WEIGHTS]

The query_array argument selects, by name, a uniform distribution over a set of text queries. For instance, the query array "three" corresponds to a simple distribution over cutlery prompts. The query arrays we used to train ZeroForge are stored by name in query_arrays.json, but any set of text queries can be used by adding it to that .json file, as sketched below.
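For illustration, the sketch below adds a hypothetical query array named "furniture"; the name, the prompts, and the name-to-prompt-list schema are assumptions, so check the existing entries in query_arrays.json for the actual format.

```python
import json

# Sketch only: the schema (array name -> list of text prompts) is an
# assumption here; compare with the existing entries in query_arrays.json.
with open("query_arrays.json") as f:
    query_arrays = json.load(f)

# Hypothetical custom set of prompts.
query_arrays["furniture"] = ["a chair", "a round table", "a floor lamp"]

with open("query_arrays.json", "w") as f:
    json.dump(query_arrays, f, indent=2)
```

Training can then point at the new array, e.g.

python zf_training.py --query_array furniture --log_dir [LOGDIR] --nvr_renderer_checkpoint [NVR_WEIGHTS]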