Dataset Preparation

We use seven datasets: Google Conceptual Captions (GCC), Stony Brook University Captions (SBU), Visual Genome (VG), COCO Captions (COCO), Flickr 30K Captions (F30K), Visual Question Answering v2 (VQAv2), and Natural Language for Visual Reasoning 2 (NLVR2).

We do not distribute the datasets because of licensing restrictions; please download them yourself. We use pyarrow to serialize the datasets, and the conversion scripts are located in vilt/utils/write_*.py. Organize each dataset as shown below, then run the corresponding make_arrow function to convert it to a pyarrow binary file.

GCC

https://ai.google.com/research/ConceptualCaptions/download

GCC provides tuples of image URL and caption. Note that a sizable portion of the URLs are no longer accessible. Write your own download script and organize the dataset in the following structure.

root
├── images_train            
│   ├── 0000                # First four letters of image name
│   │   ├── 0000000         # Image Binary
│   │   ├── 0000001      
│   │   └── ...
│   ├── 0001              
│   │   ├── 0001000      
│   │   ├── 0001001      
│   │   └── ...          
│   └── ...          
├── images_val          
│   ├── 0000              
│   │   └── ...
│   └── ...          
├── train_annot.json        # List of (image_file_path, caption) tuples
└── val_annot.json          # List of (image_file_path, caption) tuples
from vilt.utils.write_conceptual_caption import make_arrow
make_arrow(root, arrows_root)
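The layout above can be produced by a small helper along the following lines. This is only a sketch: `shard_path` and `save_split` are hypothetical names, the seven-digit image name is inferred from the tree, and storing paths relative to `root` in the annotation file is an assumption, so check it against what `write_conceptual_caption.py` expects before relying on it.

```python
import json
from pathlib import Path


def shard_path(split, index):
    """Return the relative path for image number `index`,
    e.g. images_train/0001/0001000 (seven-digit name is an assumption)."""
    name = f"{index:07d}"
    # The directory is the first four characters of the image name.
    return f"images_{split}/{name[:4]}/{name}"


def save_split(root, split, items):
    """Write (image_bytes, caption) pairs under `root` and dump
    {split}_annot.json as a list of [image_file_path, caption] pairs."""
    annotations = []
    for index, (image_bytes, caption) in enumerate(items):
        rel = shard_path(split, index)
        target = Path(root) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(image_bytes)
        # JSON has no tuple type, so each pair is stored as a 2-element list.
        annotations.append([rel, caption])
    with open(Path(root) / f"{split}_annot.json", "w") as fp:
        json.dump(annotations, fp)
    return annotations
```

Calling `save_split(root, "train", items)` and `save_split(root, "val", items)` would yield the `images_train`, `images_val`, `train_annot.json`, and `val_annot.json` entries shown in the tree.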

SBU

http://www.cs.virginia.edu/~vicente/sbucaptions/

Like GCC, SBU provides tuples of image URL and caption, and a sizable portion of its URLs are also no longer accessible. Write your own download script and organize the dataset in the following structure.

root
├── images_train            
│   ├── 0000                # First four letters of image name
│   │   ├── 0000000         # Image Binary
│   │   ├── 0000001      
│   │   └── ...
│   ├── 0001              
│   │   ├── 0001000      
│   │   ├── 0001001      
│   │   └── ...          
│   └── ...          
└── annot.json              # List of (image_file_path, caption) tuples
from vilt.utils.write_sbu import make_arrow
make_arrow(root, arrows_root)
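Because many of the URLs are dead, a download script needs to skip failures rather than stop. A minimal sketch of that pattern follows; `fetch_url` and `download_available` are hypothetical names, not part of the repository, and the timeout value is arbitrary.

```python
import urllib.request


def fetch_url(url, timeout=10.0):
    """Default fetcher: return the raw bytes behind a URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_available(pairs, fetch=fetch_url):
    """Given (url, caption) pairs, return the (image_bytes, caption) pairs
    that could be fetched, plus the number of URLs that were skipped."""
    kept, skipped = [], 0
    for url, caption in pairs:
        try:
            kept.append((fetch(url), caption))
        except (OSError, ValueError):
            # URLError, HTTPError and socket timeouts are OSError subclasses;
            # ValueError covers malformed URLs.
            skipped += 1
    return kept, skipped
```

The `fetch` parameter is there so the loop can be tried out with a stub before pointing it at the real URL list. Only the captions whose images were actually saved should end up in annot.json.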

VG

http://visualgenome.org/api/v0/api_home.html

Download image part 1, image part 2, and the region descriptions.

root
├── images            
│   ├── VG_100K                  
│   │   ├── 10.jpg        
│   │   ├── 107899.jpg      
│   │   └── ...
│   ├── VG_100K_2              
│   │   ├── 1.jpg      
│   │   ├── 100.jpg      
│   │   └── ...          
│   └── ...          
└── annotations         
    └── region_descriptions.json
from vilt.utils.write_vg import make_arrow
make_arrow(root, arrows_root)
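Before converting, it can help to confirm that region_descriptions.json parses as expected. The sketch below assumes the file is a list of entries, each with a `regions` list whose items carry `image_id` and `phrase` fields; that schema is an assumption about the Visual Genome download, and `phrases_by_image` is a hypothetical helper.

```python
import json
from collections import defaultdict


def phrases_by_image(region_file):
    """Group region phrases by image id (schema assumed, see the note above)."""
    with open(region_file) as fp:
        entries = json.load(fp)
    grouped = defaultdict(list)
    for entry in entries:
        for region in entry["regions"]:
            grouped[region["image_id"]].append(region["phrase"])
    return dict(grouped)
```

For example, `len(phrases_by_image("root/annotations/region_descriptions.json"))` gives the number of images that have at least one region description.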

COCO

https://cocodataset.org/#download

Download the 2014 train images, the 2014 val images, and the Karpathy split.

root
├── train2014            
│   ├── COCO_train2014_000000000009.jpg                
│   └── ...
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   └── ...
└── karpathy
    └── dataset_coco.json
from vilt.utils.write_coco_karpathy import make_arrow
make_arrow(root, arrows_root)
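A quick sanity check on the Karpathy split file is to count the images per split. The sketch assumes dataset_coco.json has a top-level `images` list whose entries each carry a `split` field; `count_splits` is a hypothetical helper, not something the repository provides.

```python
import json
from collections import Counter


def count_splits(karpathy_file):
    """Count images per split label in a Karpathy split file
    (assumes a top-level "images" list with a "split" field per entry)."""
    with open(karpathy_file) as fp:
        images = json.load(fp)["images"]
    return Counter(image["split"] for image in images)
```

Running `count_splits("root/karpathy/dataset_coco.json")` before `make_arrow` shows at a glance whether the file was downloaded completely.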

F30K

http://bryanplummer.com/Flickr30kEntities/

Sign the Flickr images request form and download the Karpathy split.

root
├── flickr30k-images            
│   ├── 1000092795.jpg
│   └── ...
└── karpathy
    └── dataset_flickr30k.json
from vilt.utils.write_f30k_karpathy import make_arrow
make_arrow(root, arrows_root)

VQAv2

https://visualqa.org/download.html

Download the COCO 2014 train images, 2014 val images, and 2015 test images, along with the annotations (train, val) and the questions (train, val, test).

root
├── train2014            
│   ├── COCO_train2014_000000000009.jpg                
│   └── ...
├── val2014
│   ├── COCO_val2014_000000000042.jpg
│   └── ...
├── test2015
│   ├── COCO_test2015_000000000001.jpg
│   └── ...
├── v2_OpenEnded_mscoco_train2014_questions.json
├── v2_OpenEnded_mscoco_val2014_questions.json
├── v2_OpenEnded_mscoco_test2015_questions.json
├── v2_OpenEnded_mscoco_test-dev2015_questions.json
├── v2_mscoco_train2014_annotations.json
└── v2_mscoco_val2014_annotations.json
from vilt.utils.write_vqa import make_arrow
make_arrow(root, arrows_root)
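With this many files, a missing download is easy to overlook. The sketch below checks the top-level entries listed in the tree above; `VQAV2_LAYOUT` and `missing_entries` are hypothetical names, and the check only tests that each entry exists, not that its contents are complete.

```python
from pathlib import Path

# Top-level entries copied from the directory tree above.
VQAV2_LAYOUT = [
    "train2014",
    "val2014",
    "test2015",
    "v2_OpenEnded_mscoco_train2014_questions.json",
    "v2_OpenEnded_mscoco_val2014_questions.json",
    "v2_OpenEnded_mscoco_test2015_questions.json",
    "v2_OpenEnded_mscoco_test-dev2015_questions.json",
    "v2_mscoco_train2014_annotations.json",
    "v2_mscoco_val2014_annotations.json",
]


def missing_entries(root, expected=VQAV2_LAYOUT):
    """Return the expected entries that do not exist under `root`."""
    return [name for name in expected if not (Path(root) / name).exists()]
```

An empty list from `missing_entries(root)` suggests the layout is ready for `make_arrow`. The same function can be reused for the other datasets by passing a different `expected` list.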

NLVR2

https://lil.nlp.cornell.edu/nlvr/

Clone the NLVR repository (https://github.com/lil-lab/nlvr) and sign the request form to download the images.

root
├── images/train           
│   ├── 0                  
│   │   ├── train-10108-0-img0.png   
│   │   └── ...
│   ├── 1                  
│   │   ├── train-10056-0-img0.png       
│   │   └── ...
│   └── ...
├── dev       
│   ├── dev-0-0-img0.png
│   └── ...
├── test1
│   ├── test1-0-0-img0.png
│   └── ...
├── nlvr
├── nlvr2
└── README.md
from vilt.utils.write_nlvr2 import make_arrow
make_arrow(root, arrows_root)