
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

by Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

We provide pretrained UViM models from the original paper, as well as instructions for reproducing the core paper experiments.

Pretrained models

The table below contains UViM models (stage I and II) trained for three different tasks: panoptic segmentation, colorization and depth prediction.

task	model	dataset	metric	download link
Panoptic segmentation	UViM Stage I model	COCO(2017)	75.8 PQ	link
Panoptic segmentation	UViM Stage II model	COCO(2017)	43.1 PQ	link
Colorization	UViM Stage I model	ILSVRC-2012	15.59 FID	link
Colorization	UViM Stage II model	ILSVRC-2012	16.99 FID	link
Depth	UViM Stage I model	NYU Depth V2	0.155 RMSE	link
Depth	UViM Stage II model	NYU Depth V2	0.463 RMSE	link

All of these models can be explored interactively in our colabs.

Running on a single-host TPU machine

Below we provide instructions for running UViM training (stage I and stage II) on a single TPU host with 8 TPU accelerators. These instructions can be easily adapted to a GPU host or a multi-host TPU setup; see the main big_vision README file.

We assume that the user has already created the TPU host machine and SSH-ed into it. The next step is to clone the big_vision repository: git clone https://github.com/google-research/big_vision.git.

The next steps are to create a Python virtual environment and install the Python dependencies:

virtualenv bv
source bv/bin/activate
cd big_vision/
pip3 install --upgrade pip
pip3 install -r big_vision/requirements.txt
pip install "jax[tpu]>=0.2.16" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html

After this, invoke the helper tool to download and prepare the data: python3 -m big_vision.tools.download_tfds_datasets coco/2017_panoptic nyu_depth_v2. To prepare the ImageNet dataset, consult the main codebase README.

⚠️ TPU machines have 100 GB of disk space. This may not be enough to store all training data (though the panoptic data alone or the depth data alone may fit). Consider preparing the data on a separate machine and then copying it to the TPU machine's extra persistent disk or to a Google Cloud Bucket. See instructions for creating an extra persistent disk. Remember to set the correct data home directory, e.g. export DISK=/mnt/disk/persist; export TFDS_DATA_DIR=$DISK/tensorflow_datasets.

Our panoptic evaluator uses the raw variant of the COCO data, so we move it into a separate folder. Note that tfds has already downloaded the panoptic data, except for one small JSON file that we fetch manually:
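Before starting a long download, it can help to check how much room is left on the disk that will hold the data. The snippet below is a minimal sketch using only the Python standard library; the helper names (nearest_existing, has_room) and the required_gb argument are hypothetical and not part of big_vision, and the space actually needed depends on which datasets you prepare.

```python
import os
import shutil


def nearest_existing(path):
    """Walk up from `path` until a directory that already exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def has_room(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` GB free.

    `required_gb` is supplied by the caller; this sketch does not know
    how large any particular dataset is.
    """
    free_bytes = shutil.disk_usage(nearest_existing(path)).free
    return free_bytes >= required_gb * 10**9
```

For example, has_room(os.environ["TFDS_DATA_DIR"], 50) reports whether 50 GB are free on the disk behind TFDS_DATA_DIR, even if that directory has not been created yet.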

mkdir $DISK/coco_data
cd $DISK/coco_data
mv $TFDS_DATA_DIR/downloads/extracted/ZIP.image.cocod.org_annot_panop_annot_train<REPLACE_ME_WITH_THE_HASH_CODE>.zip/annotations/* .
wget https://raw.githubusercontent.com/cocodataset/panopticapi/master/panoptic_coco_categories.json
export COCO_DATA_DIR=$DISK/coco_data
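The hash in the extracted directory name differs between downloads, so the placeholder above has to be filled in by hand. As a hedged alternative, the following sketch lists candidate directories by globbing for the fixed prefix shown in the mv command. The function name find_panoptic_dirs is hypothetical, and the sketch assumes the hash is the only varying part of the directory name, as the placeholder suggests.

```python
import glob
import os

# Fixed part of the directory name, copied from the mv command above.
PREFIX = "ZIP.image.cocod.org_annot_panop_annot_train"


def find_panoptic_dirs(tfds_data_dir):
    """Return sorted `annotations` directories whose parent matches PREFIX*.zip."""
    pattern = os.path.join(
        glob.escape(tfds_data_dir),  # treat the base path literally
        "downloads",
        "extracted",
        PREFIX + "*.zip",
        "annotations",
    )
    return sorted(p for p in glob.glob(pattern) if os.path.isdir(p))
```

If exactly one directory is returned, its contents are what the mv command is meant to move; if several or none are returned, inspect $TFDS_DATA_DIR/downloads/extracted manually.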

For the FID evaluator, which is used for the colorization model, set the path to the directory with image id files, e.g. export FID_DATA_DIR=<ROOT>/big_vision/evaluators/proj/uvim/coltran_fid_data.

As an example, stage I panoptic training can be invoked as follows (note the :singlehost config parameter, which selects a lightweight configuration suitable for a single host):
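Since the steps above rely on several exported variables, a quick sanity check before launching training can catch a missing export. This is a minimal sketch, not part of big_vision: the function name missing_data_dirs is hypothetical, and the variable names are the ones used in this README. Only the variables relevant to your task need to be checked (COCO_DATA_DIR for panoptic, FID_DATA_DIR for colorization).

```python
import os

# Environment variables mentioned in this README.
DATA_VARS = ("TFDS_DATA_DIR", "COCO_DATA_DIR", "FID_DATA_DIR")


def missing_data_dirs(names=DATA_VARS, env=None):
    """Map each problematic variable name to a short description of the problem."""
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name)
        if not value:
            problems[name] = "unset"
        elif not os.path.isdir(value):
            problems[name] = "not a directory"
    return problems
```

An empty dictionary means every checked variable points to an existing directory; for panoptic training one might call missing_data_dirs(("TFDS_DATA_DIR", "COCO_DATA_DIR")).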

python3 -m big_vision.trainers.proj.uvim.vqvae --config big_vision/configs/proj/uvim/vqvae_coco_panoptic.py:singlehost --workdir workdirs/`date '+%m-%d_%H%M'`

or stage II training:

python3 -m big_vision.trainers.proj.uvim.train --config big_vision/configs/proj/uvim/train_coco_panoptic_pretrained.py:singlehost --workdir workdirs/`date '+%m-%d_%H%M'`
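The two commands above differ only in the trainer module and the config file, so they can be assembled programmatically, for instance when launching several tasks from a script. The sketch below only builds the argument list and does not run anything; build_command and its parameters are hypothetical helpers, while the module paths, the config directory, the :singlehost suffix and the date format are copied from the commands above.

```python
import datetime
import shlex

# Trainer modules for the two stages, as used in the commands above.
TRAINERS = {
    "I": "big_vision.trainers.proj.uvim.vqvae",
    "II": "big_vision.trainers.proj.uvim.train",
}


def build_command(stage, config_file, now=None, workdir_root="workdirs"):
    """Return the training command as a list of arguments."""
    now = now or datetime.datetime.now()
    # Same format as `date '+%m-%d_%H%M'` in the shell commands.
    workdir = f"{workdir_root}/{now.strftime('%m-%d_%H%M')}"
    return [
        "python3", "-m", TRAINERS[stage],
        "--config", f"big_vision/configs/proj/uvim/{config_file}:singlehost",
        "--workdir", workdir,
    ]


def as_shell(command):
    """Render the argument list as a single shell-quoted string."""
    return " ".join(shlex.quote(arg) for arg in command)
```

For example, as_shell(build_command("II", "train_coco_panoptic_pretrained.py")) yields a line equivalent to the stage II command, and the list form can be passed to subprocess.run directly.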

Acknowledgments

The sampling code in the models/proj/uvim/decode.py module is based on contributions from Anselm Levskaya, Ilya Tolstikhin and Maxim Neumann.
