We present a Datasheet for documentation and responsible usage of our training dataset.
We created this dataset to learn general robot manipulation with multimodal prompts.
Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?
This dataset was created by Yunfan Jiang (NVIDIA, Stanford), Agrim Gupta (Stanford), Zichen "Charles" Zhang (Macalester College), Guanzhi Wang (NVIDIA, Caltech), Yongqiang Dou (Tsinghua), Yanjun Chen (Stanford), Li Fei-Fei (Stanford), Anima Anandkumar (NVIDIA, Caltech), Yuke Zhu (NVIDIA, UT Austin), and Linxi "Jim" Fan (NVIDIA).
Will the dataset be distributed to third parties outside of the entity (e.g., company, institution, organization) on behalf of which the dataset was created?
Yes, the dataset is publicly available on the internet and can be downloaded from Zenodo.
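For convenience, the following is a minimal download sketch against Zenodo's public REST API. The record ID below is a placeholder, not the actual identifier of this dataset, and the file-listing schema is assumed to follow Zenodo's standard record format.

    import requests

    # Placeholder record ID: substitute the actual Zenodo record of this dataset.
    RECORD_ID = "1234567"

    # Zenodo's public REST API returns record metadata, including file links.
    record = requests.get(f"https://zenodo.org/api/records/{RECORD_ID}").json()

    for f in record["files"]:
        name, url = f["key"], f["links"]["self"]
        print(f"Downloading {name} ...")
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(name, "wb") as out:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    out.write(chunk)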
Have any third parties imposed IP-based or other restrictions on the data associated with the instances?
No.
Do any export controls or other regulatory restrictions apply to the dataset or to individual instances?
No.
Who will be supporting/hosting/maintaining the dataset?
The authors will be supporting, hosting, and maintaining the dataset.
How can the owner/curator/manager of the dataset be contacted (e.g., email address)?
Please contact Yunfan Jiang ([email protected]) and Linxi Fan ([email protected]).
Is there an erratum? If so, please provide a link or other access point.
No. We will make an announcement if any erratum arises.
Will the dataset be updated (e.g., to correct labeling errors, add new instances, delete instances)?
Yes. Updates will be posted at https://vimalabs.github.io/.
If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were the individuals in question told that their data would be retained for a fixed period of time and then deleted)?
N/A.
Will older versions of the dataset continue to be supported/hosted/maintained?
Yes, older versions will remain permanently accessible on zenodo.org.
If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so?
Yes, please refer to https://vimalabs.github.io/.
What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
Each instance is a successful demonstration of a robot manipulation task, paired with a multimodal prompt. Data modalities include RGB images, arrays (e.g., for actions), and structured data (e.g., for task metadata).
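As a rough illustration of how such an instance might be read, here is a hypothetical loading sketch; the per-trajectory file names (obs.pkl, action.pkl, trajectory.pkl) and the directory layout are illustrative assumptions, not a specification of the actual on-disk format.

    import pickle
    from pathlib import Path

    # Hypothetical layout: one directory per trajectory holding pickled records.
    # Actual file names and structure may differ; consult the data repository.
    def load_trajectory(traj_dir: Path) -> dict:
        with open(traj_dir / "obs.pkl", "rb") as f:
            obs = pickle.load(f)        # e.g., RGB frames and other observations
        with open(traj_dir / "action.pkl", "rb") as f:
            actions = pickle.load(f)    # arrays of expert actions
        with open(traj_dir / "trajectory.pkl", "rb") as f:
            meta = pickle.load(f)       # structured task metadata and the prompt
        return {"obs": obs, "actions": actions, "meta": meta}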
How many instances are there in total (of each type, if appropriate)?
We provide 650K successful trajectories in total.
Does the dataset contain all possible instances or is it a sample (not necessarily random) of instances from a larger set?
We provide all instances in our Zenodo data repositories.
Is there a label or target associated with each instance?
Yes, we provide optimal action labels to train behavior cloning models.
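To make the intended supervision concrete, below is a minimal behavior cloning step; the policy signature, action representation, and MSE objective are illustrative assumptions rather than the VIMA training recipe.

    import torch.nn.functional as F

    # Minimal behavior cloning sketch. `policy` is a hypothetical stand-in for
    # any model mapping (multimodal prompt, observation) to predicted actions;
    # a continuous action space with an MSE loss is assumed for illustration.
    def bc_step(policy, optimizer, prompt, obs, expert_actions):
        pred_actions = policy(prompt, obs)               # (batch, action_dim)
        loss = F.mse_loss(pred_actions, expert_actions)  # match optimal labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()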
Is any information missing from individual instances?
No.
Are relationships between individual instances made explicit (e.g., users’ movie ratings, social network links)?
N/A.
Are there recommended data splits (e.g., training, development/validation, testing)?
Yes, we use 600K trajectories for training and 50K for validation.
Are there any errors, sources of noise, or redundancies in the dataset?
No.
Is the dataset self-contained, or does it link to or otherwise rely on external resources (e.g., websites, tweets, other datasets)?
Yes, it is self-contained.
Does the dataset contain data that might be considered confidential (e.g., data that is protected by legal privilege or by doctor-patient confidentiality, data that includes the content of individuals' non-public communications)?
No.
Does the dataset contain data that, if viewed directly, might be offensive, insulting, threatening, or might otherwise cause anxiety?
No.
Who was involved in the data collection process (e.g., students, crowdworkers, contractors) and how were they compensated (e.g., how much were crowdworkers paid)?
All data collection, curation, and filtering were done by the VIMA coauthors.
Over what timeframe was the data collected?
The data were collected primarily during the summer of 2022.
Has the dataset been used for any tasks already?
Yes, we have used it to train our VIMA models for general robot manipulation.
What (other) tasks could the dataset be used for?
This dataset can also be used to learn generalist agents.
Is there anything about the composition of the dataset or the way it was collected and preprocessed/cleaned/labeled that might impact future uses?
No.
Are there tasks for which the dataset should not be used?
No.