Sudeep Dasari, Oier Mees, Sebastian Zhao, Mohan Kumar Srirama, Sergey Levine
This repository offers an implementation of our improved Diffusion Transformer Policy (DiT-Block Policy), which achieves state-of-the-art manipulation results on long-horizon bi-manual ALOHA robots and single-arm DROID Franka robots. The repo also makes it easy to use the advanced pre-trained representations from our prior work. We've successfully deployed policies from this code on Franka robots (w/ DROID and MaNiMo), ALOHA robots, and LEAP hands; check out our eval scripts for more information. These policies can also be tested in simulation (see the Sim README).
Our repository is easy to install using miniconda or anaconda:
conda env create -f env.yml
conda activate data4robotics
pip install git+https://github.com/AGI-Labs/robobuf.git
pip install git+https://github.com/facebookresearch/r3m.git
pip install -e ./
pre-commit install # required if you want to contribute changes back to the repo
First, you'll need to convert your training trajectories into our robobuf format (pseudo-code below). Check out example ALOHA and DROID conversion code here.
import os
import pickle as pkl

import cv2
import numpy as np
from tqdm import tqdm


def _resize_and_encode(bgr_img, size=(256, 256)):
    # resize and jpeg-compress the image (OpenCV expects BGR channel order)
    bgr_img = cv2.resize(bgr_img, size, interpolation=cv2.INTER_AREA)
    _, encoded = cv2.imencode(".jpg", bgr_img)
    return encoded


def convert_trajectories(input_trajs, out_path):
    out_buffer = []
    for traj in tqdm(input_trajs):
        out_traj = []
        for in_obs, in_ac, in_reward in traj:
            out_obs = dict(state=np.array(in_obs['state']).astype(np.float32),
                           enc_cam_0=_resize_and_encode(in_obs['image']))
            out_action = np.array(in_ac).astype(np.float32)
            out_reward = float(in_reward)
            out_traj.append((out_obs, out_action, out_reward))
        out_buffer.append(out_traj)
    with open(os.path.join(out_path, 'buf.pkl'), 'wb') as f:
        pkl.dump(out_buffer, f)
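After conversion, it's worth sanity-checking the buffer by loading it back. Below is a minimal sketch that builds a tiny synthetic buffer in the same layout the converter writes (a list of trajectories, each a list of `(obs_dict, action, reward)` tuples) and verifies the structure; the shapes and contents here are purely illustrative:

```python
import pickle as pkl
import numpy as np

# build a tiny synthetic buffer in the converter's layout:
# a list of trajectories, each a list of (obs_dict, action, reward) tuples
obs = dict(state=np.zeros(7, dtype=np.float32),
           enc_cam_0=np.frombuffer(b"\xff\xd8\xff", dtype=np.uint8))
traj = [(obs, np.zeros(7, dtype=np.float32), 0.0) for _ in range(3)]
with open("buf.pkl", "wb") as f:
    pkl.dump([traj], f)

# load it back and verify the structure before kicking off training
with open("buf.pkl", "rb") as f:
    buffer = pkl.load(f)

assert isinstance(buffer, list) and isinstance(buffer[0], list)
first_obs, first_ac, first_rew = buffer[0][0]
print(len(buffer), len(buffer[0]), first_ac.dtype)  # 1 3 float32
```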
Once the conversion is complete, you can train our models using the example commands below:
# Training DiT Policy (Diffusion Transformer w/ adaLN + ResNet Tokenizer)
python finetune.py exp_name=test agent=diffusion task=end_effector_r6 agent/features=resnet_gn agent.features.restore_path=/path/to/resnet18/IN_1M_resnet18.pth buffer_path=/data/path/buffer.pkl trainer=bc_cos_sched ac_chunk=100
## SOME EXAMPLE BASELINES
# Gaussian Mixture Model bc-policy with SOUP representations
python finetune.py exp_name=test agent.features.restore_path=/path/to/SOUP_1M_DH.pth buffer_path=/data/path/buffer.pkl
# Diffusion Policy (U-Net head) w/ HRP representations
python finetune.py exp_name=test agent=diffusion_unet task=end_effector_r6 agent/features=vit_base agent.features.restore_path=/path/to/IN_hrp.pth buffer_path=/data/path/buffer.pkl trainer=bc_cos_sched ac_chunk=16
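The `ac_chunk` flag sets how many future actions the policy predicts per observation (action chunking). Conceptually, the dataloader pairs each observation with the next `ac_chunk` actions from the trajectory — a hedged numpy sketch of that pairing, not the repo's actual implementation:

```python
import numpy as np

def chunk_actions(actions, ac_chunk):
    """Pair each timestep with its next `ac_chunk` actions.

    actions: (T, ac_dim) array; returns (T - ac_chunk + 1, ac_chunk, ac_dim).
    """
    T = len(actions)
    chunks = [actions[t:t + ac_chunk] for t in range(T - ac_chunk + 1)]
    return np.stack(chunks)

# toy trajectory: T=10 steps of a 1-D action
actions = np.arange(10, dtype=np.float32).reshape(10, 1)
chunks = chunk_actions(actions, ac_chunk=4)
print(chunks.shape)  # (7, 4, 1)
```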
This will result in a policy checkpoint saved in the bc_finetune/<exp_name> folder.
We also provide an open-source dataset, named BiPlay, with over 7000 diverse, text-annotated, bi-manual expert demonstrations collected on an ALOHA robot. You can download it from the linked HuggingFace dataset, and it loads out of the box with the dataloader from Octo.
You can easily download our pre-trained representations using the provided script: ./download_features.sh. You may also download the features individually from our release website.
The features are modular and easy to use in your own codebase! Please refer to the example code if you're interested.
Detailed instructions and eval scripts for real-world deployment are provided here. Similarly, you can reproduce our sim results using the commands/code provided here.
If you find this codebase or the diffusion transformer useful, please cite:
@article{dasari2024ditpi,
title={The Ingredients for Robotic Diffusion Transformers},
author = {Sudeep Dasari and Oier Mees and Sebastian Zhao and Mohan Kumar Srirama and Sergey Levine},
journal = {arXiv preprint arXiv:2410.10088},
year={2024},
}
And if you use the representations, please cite:
@inproceedings{dasari2023datasets,
title={An Unbiased Look at Datasets for Visuo-Motor Pre-Training},
author={Dasari, Sudeep and Srirama, Mohan Kumar and Jain, Unnat and Gupta, Abhinav},
booktitle={Conference on Robot Learning},
year={2023},
organization={PMLR}
}
@inproceedings{kumar2024hrp,
title={HRP: Human Affordances for Robotic Pre-Training},
author = {Mohan Kumar Srirama and Sudeep Dasari and Shikhar Bahl and Abhinav Gupta},
booktitle = {Proceedings of Robotics: Science and Systems},
address = {Delft, Netherlands},
year = {2024},
}