Official data and code for "Visually Grounding Language Instruction for History-Dependent Manipulation" (ICRA 2022).
- After cloning this repository, you will see the `data` folder, which contains everything needed for training and testing.
- It consists of 300 history-dependent manipulation tasks.
- Each `scene_xxxx` folder has four sub-folders: `bbox`, `heatmap`, `image`, and `meta` (a minimal loading sketch follows this list).
  - `image` contains images observing the workspace from a bird's-eye view and from the front side.
  - `bbox` contains information about the cube objects present in each image.
  - `meta` contains (1) the language annotation, (2) the bounding-box information for pick-and-place, and (3) whether the task has an explicit or implicit dependency.
    - `0_1.json` holds the manipulation information between image 0 and image 1.
  - `heatmap` contains the ground-truth heatmaps for pick and place.
    - `pick/place_0_1.json` holds the heatmaps for the manipulation between image 0 and image 1.
- Use `./train.ipynb` to train your own model.
  - There is a parameter named `temp`, a temperature used as the denominator before the softmax function of the attention module (see the sketch after this list).
  - The models described in our paper use `temp=2`.
  - We found that this temperature value makes the attention too flat.
  - Therefore, we recommend trying a smaller value, such as `temp=0.1`.
- Our pretrained models with `temp=2`: Google Drive
  - Save these models as `best_models/*.pth`.
- More details about training will be updated based on the Issues.
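To make the effect of `temp` concrete, here is a standalone, toy sketch of temperature-scaled attention (an illustration, not the repository's actual module): logits are divided by `temp` before the softmax, so a large value such as 2 flattens the attention weights while a small value such as 0.1 sharpens them.

```python
import torch
import torch.nn.functional as F

def attention_weights(query, key, temp):
    """Toy temperature-scaled attention (illustration only, not the repo's module).

    Logits are divided by `temp` before the softmax, so larger temperatures
    give flatter attention distributions and smaller ones give sharper ones.
    """
    logits = query @ key.transpose(-2, -1)   # (batch, n_query, n_key)
    return F.softmax(logits / temp, dim=-1)

q = torch.randn(1, 1, 32)
k = torch.randn(1, 10, 32)
print("temp=2.0 max weight:", attention_weights(q, k, temp=2.0).max().item())  # flatter
print("temp=0.1 max weight:", attention_weights(q, k, temp=0.1).max().item())  # sharper
```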
- Use `./test_qualitative_saver.ipynb` to check how results (heatmaps) are generated for a randomly sampled scenario.
- Use `./test_quantitative_saver.ipynb` to save generated results as `.json` files.
  - In this notebook, inputs come from the `./test_tasks` folder, which contains sampled test history-dependent manipulation scenarios.
  - Outputs are saved to the `./performance` folder. In the code, the variable `result_dir` defines where your test results are saved (a minimal loading sketch follows this list).
- Saved outputs can be analyzed with `./test_quantitative_plotter.ipynb`.
- More details about validation will be updated based on the Issues.
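A hedged sketch of how the saved results might be listed and loaded; the directory mirrors the notebook's `result_dir` variable, and the JSON schema is not documented here, so the snippet only reports file names and top-level keys.

```python
import json
from pathlib import Path

# Hypothetical helper: inspect the .json results written by
# test_quantitative_saver.ipynb into ./performance (i.e. `result_dir`).
result_dir = Path("./performance")

for result_path in sorted(result_dir.glob("**/*.json")):
    with open(result_path) as f:
        result = json.load(f)
    summary = sorted(result.keys()) if isinstance(result, dict) else type(result).__name__
    print(result_path.relative_to(result_dir), "->", summary)
```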
- Requirements:
  - Blender (tested on v2.78)
  - Python 3.7
- Clone this repository:
  ```
  git clone https://github.com/cotton-ahn/history-dependent-manipulation
  cd history-dependent-manipulation
  ```
- Clone the CLEVR dataset generation code for data generation:
  ```
  git clone https://github.com/facebookresearch/clevr-dataset-gen
  ```
- Follow the instructions from CLEVR and make sure your system is able to generate images.
- Run `./generation_setup.sh` to copy the files needed for proper image generation.
- Go to `clevr-dataset-gen/image_generation`.
- Run one of the commands below to generate images:
  ```
  # With a GPU, generate episodes that always stack at least one block.
  blender --background --python render_images_with_stack.py -- --use_gpu 1

  # With a GPU, generate episodes without the stacking constraint above.
  blender --background --python render_images_wo_stack.py -- --use_gpu 1

  # With a GPU, generate scenes with only rubber blocks, moving a block 5 times.
  blender --background --python render_images_wo_stack.py -- --use_gpu 1 --materials rubber --num_moves 5
  ```
- Image data will be saved to `clevr-dataset-gen/output` (a small sanity-check sketch follows this list).
- To see how to annotate bounding boxes from the generated files, refer to `{this repository}/find_bbox_info.ipynb`.
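A small, hedged sanity check after generation; the `images/` and `scenes/` sub-folder layout of CLEVR's default output is an assumption for the modified scripts, so the snippet simply searches recursively.

```python
from pathlib import Path

# Hypothetical post-generation check: count rendered images and JSON
# scene/metadata files under the CLEVR output directory.
output_dir = Path("clevr-dataset-gen/output")
print("rendered images:", len(list(output_dir.rglob("*.png"))))
print("json files:", len(list(output_dir.rglob("*.json"))))
```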