# EMMA: Policy

*Badges: Python 3.9 · PyTorch Lightning · Poetry · Config: Hydra · pre-commit · code style: black · wemake-python-styleguide · Continuous Integration: Tests, Build and push images*

## Quick start

Assuming you have pyenv and Poetry, clone the repository and run:

```bash
# Use Python 3.9.13 in the project
pyenv local 3.9.13

# Tell Poetry to use pyenv
poetry env use $(pyenv which python)

# Install dependencies
poetry install

# Activate the virtual environment
poetry shell

# Install pre-commit hooks
pre-commit install
```

Check out `CONTRIBUTING.md` for more detailed information on getting started.

### Installing optional dependencies

We've separated specific groups of dependencies so that you only need to install what you need.

- For running demos with Gradio, run `poetry install --with demo`

## Project structure

This project is organised very similarly to the Lightning-Hydra-Template structure to facilitate reproducible research code.

- `scripts` — sh scripts to run experiments
- `configs` — configuration files using the Hydra framework
- `docker` — Dockerfiles to ease deployment
- `notebooks` — Jupyter notebooks for analysis and exploration
- `storage` — data for training/inference (you may use symlinks to point to other parts of the filesystem)
- `tests` — pytest scripts to verify the code
- `src` — where the main code lives

## Downloading data

### Checkpoints

All checkpoints are available on Hugging Face.

These checkpoints include:

| Model name | Description |
| --- | --- |
| `emma_base_pretrain.ckpt` | The EMMA base pretrained checkpoint |
| `unified_emma_base_finetune_arena.ckpt` | The EMMA-unified variant fine-tuned on the DTC task |
| `modular_action_emma_base_finetune_arena.ckpt` | The EMMA-modular variant fine-tuned on the DTC task that performs action execution and visual grounding |
| `vinvl_finetune_arena.ckpt` | The fine-tuned VinVL checkpoint |
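As a minimal sketch (the repository id and target directory below are placeholders, not the actual values), a checkpoint can be fetched with the `huggingface-cli` tool that ships with the `huggingface_hub` package; the same pattern applies to the DB and feature downloads described below:

```bash
# Hypothetical example: replace <org>/<checkpoints-repo> with the actual
# Hugging Face repository id hosting the checkpoints.
pip install huggingface_hub  # provides the huggingface-cli entry point
huggingface-cli download <org>/<checkpoints-repo> emma_base_pretrain.ckpt --local-dir storage/model
```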

### DBs

The DBs are required for pretraining and fine-tuning and are available on Hugging Face.

We provide DBs for:

1. Pretraining on image-based tasks (one db per task)
2. Fine-tuning on image-based tasks (one db per task)
3. Fine-tuning on the DTC tasks (one db for action execution / visual grounding and one db for the contextual routing task)

Make sure that these are placed under the `storage/db` folder, or alternatively set the path to the DBs within each experiment config.
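If the DBs live elsewhere, the path can also be overridden from the command line thanks to Hydra. A minimal sketch, assuming a hypothetical `datamodule.db_dir` key (check the experiment configs for the actual key path):

```bash
# Hypothetical override: datamodule.db_dir is an assumed key path; look up
# the real DB path setting in the relevant experiment config first.
python run.py experiment=pretrain.yaml datamodule.db_dir=/mnt/shared/emma/db
```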

### Features

The image features for all image-based tasks and the DTC benchmark are available on Hugging Face.

The image features were extracted using the pretrained VinVL checkpoint. For the DTC benchmark, we fine-tuned the checkpoint on the Alexa Arena data.

## Pretraining

First, make sure that you have downloaded the pretraining DBs and the corresponding features.

```bash
python run.py experiment=pretrain.yaml
```
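Because the experiments are Hydra configs, run-specific settings can be overridden on the command line. A sketch assuming Lightning-Hydra-Template-style keys (`trainer.devices` and `trainer.max_epochs` are assumptions; verify them against the files in `configs`):

```bash
# Hypothetical overrides in Lightning-Hydra-Template style; the exact
# trainer keys used by this repository's configs may differ.
python run.py experiment=pretrain.yaml trainer.devices=4 trainer.max_epochs=10
```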

## Downstream

### COCO

```bash
python run.py experiment=coco_downstream.yaml
```

### VQAv2

```bash
python run.py experiment=vqa_v2_downstream.yaml
```

### RefCOCOg

```bash
python run.py experiment=refcoco_downstream.yaml
```

### NLVR^2

```bash
python run.py experiment=nlvr2_downstream.yaml
```

### DTC - Unified model

When initializing from the pretrained model, which doesn't include the special tokens for the downstream CR and action prediction tasks, you will need to manually edit the vocabulary size in the model config. For initialization from the pretrained emma-base, set `vocab_size` to 10252.

```bash
python run.py experiment=simbot_combined.yaml
```
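Instead of editing the config file by hand, the same change could be applied as a Hydra command-line override. A sketch assuming the setting is exposed under a hypothetical `model.vocab_size` key:

```bash
# Hypothetical: model.vocab_size is an assumed key path; locate the actual
# vocabulary-size setting in the model config before relying on this.
python run.py experiment=simbot_combined.yaml model.vocab_size=10252
```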