place holder Cogito, ergo sum
ZXP-S-works committed Jun 30, 2024
0 parents commit d14213a
Showing 89 changed files with 15,924 additions and 0 deletions.
34 changes: 34 additions & 0 deletions 3dda_env.yaml
@@ -0,0 +1,34 @@
name: equ_act
channels:
- pytorch
- nvidia
- conda-forge
- defaults
dependencies:
  - pip
  - pip:
    - healpy
    - git+https://github.com/openai/CLIP.git
    - pillow
    - typed-argument-parser
    - tqdm
    - transformers
    - absl-py
    - matplotlib
    - scipy
    - tensorboard
    - opencv-python
    - blosc
    - setuptools==57.5.0
    - beautifulsoup4
    - bleach>=6.0.0
    - defusedxml
    - jinja2>=3.0
    - jupyter-core>=4.7
    - jupyterlab-pygments
    - mistune==2.0.5
    - nbclient>=0.5.0
    - nbformat>=5.7
    - pandocfilters>=1.4.1
    - tinycss2
    - traitlets>=5.1
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Tsung-Wei Ke, Nikolaos Gkanatsios and Katerina Fragkiadaki

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
157 changes: 157 additions & 0 deletions README.md
@@ -0,0 +1,157 @@
[//]: # ([![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-diffuser-actor-policy-diffusion-with-3d/zero-shot-generalization-on-calvin)](https://paperswithcode.com/sota/zero-shot-generalization-on-calvin?p=3d-diffuser-actor-policy-diffusion-with-3d))

[//]: # ([![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/3d-diffuser-actor-policy-diffusion-with-3d/robot-manipulation-on-rlbench)](https://paperswithcode.com/sota/robot-manipulation-on-rlbench?p=3d-diffuser-actor-policy-diffusion-with-3d))


# Placeholder for EquAct, >= 3D Diffuser Actor, Cogito, ergo sum

[//]: # (By [Tsung-Wei Ke*](https://twke18.github.io/), [Nikolaos Gkanatsios*](https://nickgkan.github.io/) and [Katerina Fragkiadaki](https://www.cs.cmu.edu/~katef/))

[//]: # (Official implementation of ["3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"](https://arxiv.org/abs/2402.10885).)

[//]: # (This code base also includes our re-implementation of ["Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation"](https://arxiv.org/abs/2306.17817). We provide trained model weights for both methods.)

[//]: # (<!-- ![teaser]&#40;https://3d-diffuser-actor.github.io/static/videos/3d_scene.mp4&#41; -->)
[//]: # (![teaser]&#40;fig/teaser.gif&#41;)

We marry diffusion policies and 3D scene representations for robot manipulation. Diffusion policies learn the action distribution conditioned on the robot and environment state using conditional diffusion models. They have recently been shown to outperform both deterministic and alternative state-conditioned action distribution learning methods. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have been shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy architecture that, given a language instruction, builds a 3D representation of the visual scene and conditions on it to iteratively denoise 3D rotations and translations for the robot’s end-effector. At each denoising iteration, our model represents end-effector pose estimates as 3D scene tokens and predicts the 3D translation and rotation error for each of them, by featurizing them using 3D relative attention to other 3D visual and language tokens. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 16.3% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it outperforms the current SOTA in the setting of zero-shot unseen scene generalization by being able to successfully run 0.2 more tasks, a 7% relative increase. It also works in the real world from a handful of demonstrations. We ablate our model’s architectural design choices, such as 3D scene featurization and 3D relative attentions, and show they all help generalization. Our results suggest that 3D scene representations and powerful generative modeling are keys to efficient robot learning from demonstrations.


# Model overview and stand-alone usage
To facilitate fast development on top of our model, we provide here an [overview of our implementation of 3D Diffuser Actor](./docs/OVERVIEW.md).

The model can be independently installed and used as a stand-alone package.
```
> pip install -e .
# import the model
> from diffuser_actor import DiffuserActor, Act3D
> model = DiffuserActor(...)
```

# Installation
We recommend Mambaforge instead of the standard Anaconda distribution for faster installation:
https://github.com/conda-forge/miniforge#mambaforge

Create the conda environment with the following commands:

```
# initiate conda env
> conda update conda
> mamba env create -f equiformerv2_env.yaml
> mamba env update -f 3dda_env.yaml
> conda activate equ_act
# install diffuser
#> pip install diffusers["torch"]
# install dgl (https://www.dgl.ai/pages/start.html)
> pip install dgl==1.1.3+cu116 -f https://data.dgl.ai/wheels/cu116/dgl-1.1.3%2Bcu116-cp38-cp38-manylinux1_x86_64.whl
# install flash attention (https://github.com/Dao-AILab/flash-attention#installation-and-features)
#> pip install packaging
#> pip install ninja
#???> pip install flash-attn --no-build-isolation
```
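
After the environment is created, a quick import check can catch missing packages early. The snippet below is a minimal sketch and not part of this repository: `missing_modules` is a hypothetical helper, and the module names in `EXPECTED` are assumptions about the top-level import names of the packages installed above (for example, the CLIP package is imported as `clip`, and `torch` is assumed to come from `equiformerv2_env.yaml`).

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level import names for the packages installed above.
EXPECTED = ("torch", "dgl", "transformers", "healpy", "clip")


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED)
    print("missing:", missing if missing else "none")
```

Run it inside the activated `equ_act` environment; an empty result only means the modules can be located, not that their CUDA builds match your driver.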

### Install CALVIN locally

Remember to use the latest `calvin_env` module, which fixes a bug in `turn_off_led`. See this [post](https://github.com/mees/calvin/issues/32#issuecomment-1363352121) for details.
```
> git clone --recurse-submodules https://github.com/mees/calvin.git
> export CALVIN_ROOT=$(pwd)/calvin
> cd calvin
> cd calvin_env; git checkout main
> cd ..
> ./install.sh; cd ..
```

### Install RLBench locally
```
# Install open3D
> pip install open3d
> mkdir CoppeliaSim;
> cd CoppeliaSim/
> wget https://www.coppeliarobotics.com/files/V4_1_0/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz
> tar -xf CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz;
> echo "export COPPELIASIM_ROOT=$(pwd)/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04" >> $HOME/.bashrc;
> echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$COPPELIASIM_ROOT" >> $HOME/.bashrc;
> echo "export QT_QPA_PLATFORM_PLUGIN_PATH=\$COPPELIASIM_ROOT" >> $HOME/.bashrc;
> source $HOME/.bashrc;
# Install PyRep (https://github.com/stepjam/PyRep?tab=readme-ov-file#install)
> git clone https://github.com/stepjam/PyRep.git;
> cd PyRep; pip install -r requirements.txt; pip install -e .; cd ..
# Install RLBench (Note: there are different forks of RLBench)
# PerAct setup
> git clone https://github.com/MohitShridhar/RLBench.git
> cd RLBench; git checkout -b peract --track origin/peract; pip install -r requirements.txt; pip install -e .; cd ..;
```
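
The three environment variables exported above must agree with each other before PyRep can be used. The following is a minimal sketch of a consistency check under that assumption; `check_coppeliasim_env` is a hypothetical helper, not part of PyRep or RLBench, and it compares paths as plain strings.

```python
import os
from typing import List, Mapping


def check_coppeliasim_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the CoppeliaSim-related variables."""
    root = env.get("COPPELIASIM_ROOT", "")
    if not root:
        return ["COPPELIASIM_ROOT is not set"]

    problems = []
    if not os.path.isdir(root):
        problems.append("COPPELIASIM_ROOT does not point to a directory")
    # Exact string match: a trailing slash on one side counts as a mismatch.
    if root not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append("LD_LIBRARY_PATH does not contain COPPELIASIM_ROOT")
    if env.get("QT_QPA_PLATFORM_PLUGIN_PATH") != root:
        problems.append("QT_QPA_PLATFORM_PLUGIN_PATH differs from COPPELIASIM_ROOT")
    return problems


if __name__ == "__main__":
    for problem in check_coppeliasim_env(os.environ) or ["environment looks consistent"]:
        print(problem)
```

Open a new shell (or `source $HOME/.bashrc`) before running it so that the exported values are visible.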

Remember to modify the success condition of the `close_jar` task in RLBench, as the original condition is incorrect. See this [pull request](https://github.com/MohitShridhar/RLBench/pull/1) for more details.

# Data Preparation

See [Preparing RLBench dataset](./docs/DATA_PREPARATION_RLBENCH.md) and [Preparing CALVIN dataset](./docs/DATA_PREPARATION_CALVIN.md).


### (Optional) Encode language instructions

We provide our scripts for encoding language instructions with the CLIP text encoder on CALVIN. Alternatively, you can download the encoded instructions for CALVIN and RLBench ([Link](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/instructions.zip)). Put the encoded instructions in the repository root directory.
```
> python data_preprocessing/preprocess_calvin_instructions.py --output instructions/calvin_task_ABC_D/validation.pkl --model_max_length 16 --annotation_path ./calvin/dataset/task_ABC_D/validation/lang_annotations/auto_lang_ann.npy
> python data_preprocessing/preprocess_calvin_instructions.py --output instructions/calvin_task_ABC_D/training.pkl --model_max_length 16 --annotation_path ./calvin/dataset/task_ABC_D/training/lang_annotations/auto_lang_ann.npy
```

**Note:** We have updated our scripts for encoding language instructions on RLBench.
```
> python data_preprocessing/preprocess_rlbench_instructions.py --tasks place_cups close_jar insert_onto_square_peg light_bulb_in meat_off_grill open_drawer place_shape_in_shape_sorter place_wine_at_rack_location push_buttons put_groceries_in_cupboard put_item_in_drawer put_money_in_safe reach_and_drag slide_block_to_color_target stack_blocks stack_cups sweep_to_dustpan_of_size turn_tap --output instructions.pkl
```

# Model Zoo

We host the model weights on Hugging Face.

|| RLBench (PerAct) | RLBench (GNFactor) | CALVIN |
|--------|--------|--------|--------|
| 3D Diffuser Actor | [Weights](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_peract.pth) | [Weights](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_gnfactor.pth) | [Weights](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/diffuser_actor_calvin.pth) |
| Act3D | [Weights](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/act3d_peract.pth) | [Weights](https://huggingface.co/katefgroup/3d_diffuser_actor/blob/main/act3d_gnfactor.pth) | N/A |

<div class="column">
<img src="fig/sota_calvin.png" alt="input image" width="33%"/>
&nbsp;&nbsp;&nbsp;
<img src="fig/sota_rlbench.png" alt="input image" width="33%"/>
</div>

### Evaluate the pre-trained weights
First, download the weights and put them under `train_logs/`.

* For RLBench, run the bash scripts to test the policy. See [Getting started with RLBench](./docs/GETTING_STARTED_RLBENCH.md#step-3-test-the-policy) for details.
* For CALVIN, you can run [this bash script](./scripts/test_trajectory_calvin.sh).

**Important note:** Our released model weights of 3D Diffuser Actor assume input quaternions are in `wxyz` format. However, we initially did not notice that the CALVIN and RLBench simulators use different quaternion formats (`wxyz` and `xyzw`). We have updated our code base with an additional argument `quaternion_format` to switch between these two formats. We have verified the change by re-training and testing 3D Diffuser Actor on GNFactor with `xyzw` quaternions. The model achieves performance similar to the released checkpoint. Please see this [post](https://github.com/nickgkan/3d_diffuser_actor/issues/3#issue-2164855979) for more details.

For users who train 3D Diffuser Actor from scratch, we have updated the training scripts to use the correct `xyzw` quaternion format. For users who test our released models, we keep the `wxyz` quaternion format in the testing scripts ([PerAct](./online_evaluation_rlbench/eval_peract.sh), [GNFactor](./online_evaluation_rlbench/eval_gnfactor.sh)).
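
To make the difference between the two conventions concrete, here is a minimal sketch of converting between `wxyz` and `xyzw` orderings. The helper names are hypothetical and are not the code base's implementation of `quaternion_format`; they only reorder the four components and do no normalization.

```python
from typing import List, Sequence


def wxyz_to_xyzw(q: Sequence[float]) -> List[float]:
    """Move the scalar part from the first position to the last."""
    w, x, y, z = q
    return [x, y, z, w]


def xyzw_to_wxyz(q: Sequence[float]) -> List[float]:
    """Move the scalar part from the last position to the first."""
    x, y, z, w = q
    return [w, x, y, z]


# The identity rotation has scalar part 1 and a zero vector part.
identity_wxyz = [1.0, 0.0, 0.0, 0.0]
identity_xyzw = wxyz_to_xyzw(identity_wxyz)  # [0.0, 0.0, 0.0, 1.0]
```

Feeding a quaternion in the wrong ordering does not raise an error; it is silently interpreted as a different rotation, which is why the format has to be chosen explicitly.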


# Getting started

See [Getting started with RLBench](./docs/GETTING_STARTED_RLBENCH.md) and [Getting started with CALVIN](./docs/GETTING_STARTED_CALVIN.md).


# Citation
If you find this code useful for your research, please consider citing our paper ["3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"](https://arxiv.org/abs/2402.10885).
```
@article{3d_diffuser_actor,
author = {Ke, Tsung-Wei and Gkanatsios, Nikolaos and Fragkiadaki, Katerina},
title = {3D Diffuser Actor: Policy Diffusion with 3D Scene Representations},
journal = {Arxiv},
year = {2024}
}
```

# License
This code base is released under the MIT License (refer to the LICENSE file for details).

# Acknowledgement
Parts of this codebase have been adapted from [Act3D](https://github.com/zhouxian/act3d-chained-diffuser) and [CALVIN](https://github.com/mees/calvin).
153 changes: 153 additions & 0 deletions data_preprocessing/episodes.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,153 @@
{
"max_episode_length": {
"basketball_in_hoop": 4,
"beat_the_buzz": 4,
"block_pyramid": 35,
"change_channel": 8,
"close_drawer": 2,
"close_box": 5,
"close_jar": 7,
"close_grill": 4,
"change_clock": 4,
"close_microwave": 2,
"close_laptop_lid": 4,
"close_door": 3,
"close_fridge": 2,
"empty_dishwasher": 13,
"get_ice_from_fridge": 5,
"hang_frame_on_hanger": 4,
"hit_ball_with_queue": 7,
"hockey": 7,
"insert_onto_square_peg": 5,
"insert_usb_in_computer": 5,
"lamp_off": 2,
"lamp_on": 4,
"light_bulb_out": 5,
"light_bulb_in": 7,
"lift_numbered_block": 3,
"move_hanger": 5,
"meat_off_grill": 5,
"meat_on_grill": 5,
"open_door": 4,
"open_box": 3,
"open_drawer": 3,
"open_fridge": 3,
"open_jar": 7,
"open_grill": 3,
"open_microwave": 3,
"open_oven": 5,
"open_window": 4,
"open_wine_bottle": 3,
"reach_target": 1,
"reach_and_drag": 6,
"remove_cups": 6,
"phone_on_base": 5,
"pour_from_cup_to_cup": 6,
"pick_and_lift": 4,
"pick_and_lift_small": 4,
"pick_up_cup": 3,
"place_shape_in_shape_sorter": 7,
"place_hanger_on_rack": 6,
"place_cups": 23,
"play_jenga": 3,
"plug_charger_in_power_supply": 6,
"press_switch": 2,
"put_books_on_bookshelf": 5,
"put_bottle_in_fridge": 9,
"put_knife_on_chopping_board": 4,
"put_groceries_in_cupboard": 6,
"put_knife_in_knife_block": 5,
"put_item_in_drawer": 12,
"put_money_in_safe": 5,
"put_plate_in_colored_dish_rack": 5,
"put_rubbish_in_bin": 4,
"put_tray_in_oven": 12,
"put_toilet_roll_on_stand": 5,
"put_shoes_in_box": 13,
"put_umbrella_in_umbrella_stand": 4,
"stack_wine": 5,
"stack_blocks": 23,
"stack_chairs": 11,
"stack_cups": 10,
"straighten_rope": 7,
"scoop_with_spatula": 4,
"screw_nail": 8,
"setup_checkers": 6,
"setup_chess": 5,
"slide_block_to_target": 2,
"slide_block_to_color_target": 5,
"slide_cabinet_open_and_place_cups": 9,
"solve_puzzle": 7,
"sweep_to_dustpan": 5,
"sweep_to_dustpan_of_size": 5,
"push_button": 2,
"push_buttons": 6,
"push_repeated_buttons": 8,
"take_money_out_safe": 4,
"take_umbrella_out_of_umbrella_stand": 3,
"take_cup_out_from_cabinet": 7,
"take_frame_off_hanger": 4,
"take_item_out_of_drawer": 9,
"take_lid_off_saucepan": 3,
"take_off_weighing_scales": 7,
"take_plate_off_colored_dish_rack": 5,
"take_shoes_out_of_box": 15,
"take_toilet_roll_off_stand": 4,
"take_tray_out_of_oven": 10,
"take_usb_out_of_computer": 2,
"toilet_seat_up": 3,
"toilet_seat_down": 4,
"tower": 29,
"tower2": 17,
"tower3": 6,
"tower4": 11,
"tower_sim2real": 12,
"turn_oven_on": 3,
"turn_tap": 2,
"tv_on": 8,
"unplug_charger": 2,
"water_plants": 5,
"wipe_desk": 8,
"place_wine_at_rack_location": 5
},
"variable_length": [
"push_buttons",
"push_repeated_buttons",
"close_jar",
"open_jar",
"hockey",
"hit_ball_with_queue",
"lamp_on",
"put_tray_in_oven",
"solve_puzzle",
"sweep_to_dustpan",
"sweep_to_dustpan_of_size",
"take_off_weighing_scales",
"take_tray_out_of_oven",
"tower",
"tower2",
"tower3",
"tower4",
"stack_blocks",
"slide_block_to_color_target",
"place_cups",
"place_shape_in_shape_sorter",
"put_groceries_in_cupboard",
"slide_cabinet_open_and_place_cups",
"wipe_desk",
"setup_checkers",
"water_plants",
"screw_nail",
"plug_charger_in_power_supply",
"place_hanger_on_rack",
"open_oven",
"take_shoes_out_of_box"
],
"broken": [
"empty_container",
"put_all_groceries_in_cupboard",
"set_the_table",
"slide_cabinet_open",
"weighing_scales"
]
}
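
A minimal sketch of how a file with this layout could be read, assuming (from the key names only) that `max_episode_length` maps each task to a maximum episode length, `variable_length` lists tasks whose episodes vary in length, and `broken` lists tasks to skip. The helper names are hypothetical, and `SAMPLE` is a three-task excerpt of the values above rather than the full file.

```python
import json
from typing import Dict, List

# Small excerpt with the same layout as data_preprocessing/episodes.json.
SAMPLE = """
{
  "max_episode_length": {"close_jar": 7, "open_drawer": 3, "turn_tap": 2},
  "variable_length": ["close_jar"],
  "broken": ["empty_container"]
}
"""


def summarize_tasks(config: dict) -> Dict[str, dict]:
    """Map each non-broken task to its max length and variable-length flag."""
    variable = set(config["variable_length"])
    broken = set(config["broken"])
    return {
        task: {"max_len": length, "variable": task in variable}
        for task, length in config["max_episode_length"].items()
        if task not in broken
    }


def undeclared_variable_tasks(config: dict) -> List[str]:
    """List variable-length tasks that have no max_episode_length entry."""
    return sorted(set(config["variable_length"]) - set(config["max_episode_length"]))


config = json.loads(SAMPLE)
summary = summarize_tasks(config)
```

To check the real file, replace `json.loads(SAMPLE)` with `json.load(open("data_preprocessing/episodes.json"))`.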