This page details how to prepare all the datasets used in the training and testing stages of GLEE.
GLEE uses the following 16 datasets for joint training and performs zero-shot evaluation on 6 additional datasets. Among them, Objects365, the RefCOCO series, YouTubeVOS, Ref-YouTubeVOS, and BDD are preprocessed following UNINEXT; please refer to UNINEXT for the preprocessing of these datasets. Users who only want to test, or to continue fine-tuning on a subset of the datasets, do not need to download all of them.
Please download COCO from the official website. We use train2017.zip, train2014.zip, val2017.zip, test2017.zip, annotations_trainval2017.zip, and image_info_test2017.zip. We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- coco
            -- annotations
            -- train2017
            -- train2014
            -- val2017
            -- test2017
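After downloading, you can optionally verify the layout by loading one of the annotation files with pycocotools. This is only a minimal sanity-check sketch; it assumes pycocotools is installed and that instances_train2017.json from annotations_trainval2017.zip has been extracted into the annotations folder.

# Minimal sanity check for the COCO layout above.
import os
from pycocotools.coco import COCO

coco_root = "datasets/coco"
ann_file = os.path.join(coco_root, "annotations", "instances_train2017.json")

coco = COCO(ann_file)
print("images:", len(coco.getImgIds()))
print("categories:", len(coco.getCatIds()))

# Check that the first image file actually exists on disk.
img_info = coco.loadImgs(coco.getImgIds()[:1])[0]
img_path = os.path.join(coco_root, "train2017", img_info["file_name"])
print(img_path, "exists:", os.path.isfile(img_path))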
Please download LVISv1 from the official website. LVIS uses the COCO 2017 train, validation, and test image sets, so only the annotations need to be downloaded: lvis_v1_train.json.zip, lvis_v1_val.json.zip, and lvis_v1_minival_inserted_image_name.json. We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- lvis
            -- lvis_v1_train.json
            -- lvis_v1_val.json
            -- lvis_v1_minival_inserted_image_name.json
Please download Visual Genome images from the official website: part 1 (9.2 GB) and part 2 (5.47 GB), and download our preprocessed annotation files: train.json and train_from_objects.json. We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- visual_genome
            -- images
                -- *.jpg
                -- ...
            -- annotations
                -- train_from_objects.json
                -- train.json
Please download OpenImages v6 images from the official website. All detection annotations need to be preprocessed into COCO format (an illustrative sketch of such a conversion is given after the directory tree below). We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- openimages
            -- detection
            -- openimages_v6_train_bbox.json
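The sketch below illustrates one way to convert the official OpenImages box CSVs into a single COCO-style JSON. It is NOT the exact script used to produce openimages_v6_train_bbox.json: the CSV file names (oidv6-train-annotations-bbox.csv, class-descriptions-boxable.csv) refer to the official OpenImages downloads, the image directory and output path are assumptions, and mapping IsGroupOf to iscrowd is a simplification.

# Illustrative sketch: OpenImages box CSVs -> COCO-style JSON (assumptions noted above).
import csv
import json
import os
from PIL import Image

root = "datasets/openimages"
img_dir = os.path.join(root, "detection")  # assumed location of the downloaded images
bbox_csv = os.path.join(root, "oidv6-train-annotations-bbox.csv")
cls_csv = os.path.join(root, "class-descriptions-boxable.csv")

# Map OpenImages label IDs (e.g. "/m/0cmf2") to contiguous COCO category ids.
with open(cls_csv, newline="") as f:
    label_names = list(csv.reader(f))
categories = [{"id": i + 1, "name": name} for i, (_, name) in enumerate(label_names)]
cat_id = {mid: i + 1 for i, (mid, _) in enumerate(label_names)}

images, annotations, ann_id = {}, [], 1
with open(bbox_csv, newline="") as f:
    for row in csv.DictReader(f):
        image_id = row["ImageID"]
        if image_id not in images:
            path = os.path.join(img_dir, image_id + ".jpg")
            if not os.path.isfile(path):
                continue  # skip boxes whose image was not downloaded
            width, height = Image.open(path).size
            images[image_id] = {"id": len(images) + 1, "file_name": image_id + ".jpg",
                                "width": width, "height": height}
        info = images[image_id]
        # OpenImages boxes are normalized to [0, 1]; COCO expects absolute [x, y, w, h].
        x0, x1 = float(row["XMin"]) * info["width"], float(row["XMax"]) * info["width"]
        y0, y1 = float(row["YMin"]) * info["height"], float(row["YMax"]) * info["height"]
        annotations.append({"id": ann_id, "image_id": info["id"],
                            "category_id": cat_id[row["LabelName"]],
                            "bbox": [x0, y0, x1 - x0, y1 - y0],
                            "area": (x1 - x0) * (y1 - y0),
                            "iscrowd": int(row["IsGroupOf"])})
        ann_id += 1

with open(os.path.join(root, "openimages_v6_train_bbox.json"), "w") as f:
    json.dump({"images": list(images.values()), "annotations": annotations,
               "categories": categories}, f)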
Download the YouTube-VIS 2019, YouTube-VIS 2021, and OVIS datasets for the video instance segmentation task. It is necessary to convert their video annotations into COCO format in advance for image-level joint training by running:
python3 conversion/conver_vis2coco.py
We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- ytvis_2019
            -- train
            -- val
            -- annotations
                -- instances_train_sub.json
                -- instances_val_sub.json
                -- ytvis19_cocofmt.json
        -- ytvis_2021
            -- train
            -- val
            -- annotations
                -- instances_train_sub.json
                -- instances_val_sub.json
                -- ytvis21_cocofmt.json
        -- ovis
            -- train
            -- val
            -- annotations_train.json
            -- annotations_valid.json
            -- ovis_cocofmt.json
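At a high level, this VIS-to-COCO conversion flattens each video annotation file: every frame becomes a COCO image, and every visible per-frame box/mask of a video instance becomes a COCO annotation. The following is a simplified, illustrative sketch of that idea, not the exact conversion/conver_vis2coco.py script; the field handling in the real script may differ.

# Simplified sketch: flatten a YTVIS-style video annotation file into a COCO-style file.
import json

def vis_to_coco(vis_json, out_json):
    with open(vis_json) as f:
        vis = json.load(f)

    images, annotations, ann_id = [], [], 1
    frame_image_id = {}  # (video_id, frame_index) -> new image id

    # Every frame of every video becomes one COCO image.
    for video in vis["videos"]:
        for frame_idx, file_name in enumerate(video["file_names"]):
            image_id = len(images) + 1
            frame_image_id[(video["id"], frame_idx)] = image_id
            images.append({"id": image_id, "file_name": file_name,
                           "width": video["width"], "height": video["height"]})

    # Every visible per-frame box/mask of a video instance becomes one COCO annotation.
    for ann in vis["annotations"]:
        for frame_idx, bbox in enumerate(ann["bboxes"]):
            if bbox is None:
                continue  # the object is not visible in this frame
            annotations.append({
                "id": ann_id,
                "image_id": frame_image_id[(ann["video_id"], frame_idx)],
                "category_id": ann["category_id"],
                "bbox": bbox,
                "segmentation": ann["segmentations"][frame_idx],
                "area": ann["areas"][frame_idx],
                "iscrowd": ann.get("iscrowd", 0),
            })
            ann_id += 1

    with open(out_json, "w") as f:
        json.dump({"images": images, "annotations": annotations,
                   "categories": vis["categories"]}, f)

# e.g. vis_to_coco("datasets/ytvis_2019/annotations/instances_train_sub.json",
#                  "datasets/ytvis_2019/annotations/ytvis19_cocofmt.json")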
We download data from the SA1B official website and only use [sa_000000.tar ~ sa_000050.tar], which we preprocess into the required format for training the model. First, perform an NMS operation on each sa_n directory to keep the larger, object-level masks by running:
python3 convert_sam2coco_rewritresa1b.py --src sa_000000
python3 convert_sam2coco_rewritresa1b.py --src sa_000001
python3 convert_sam2coco_rewritresa1b.py --src sa_000002
python3 convert_sam2coco_rewritresa1b.py --src sa_000003
...
python3 convert_sam2coco_rewritresa1b.py --src sa_000050
Then merge all the annotations by running:
python3 merge_sa1b.py
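The rough idea of the NMS step is to sort the SAM masks of each image by area and drop smaller masks that heavily overlap an already-kept larger one. The sketch below illustrates this on a single SA-1B per-image JSON file; it is not the exact logic of convert_sam2coco_rewritresa1b.py, and the IoU threshold and the use of pycocotools RLE utilities are assumptions.

# Illustrative sketch of mask-level NMS that keeps the larger, object-level masks.
import json
from pycocotools import mask as mask_utils

def keep_larger_masks(sa1b_json_path, iou_thresh=0.8):
    with open(sa1b_json_path) as f:
        record = json.load(f)

    # SA-1B stores one JSON per image with an "annotations" list of RLE masks.
    anns = sorted(record["annotations"], key=lambda a: a["area"], reverse=True)
    kept = []
    for ann in anns:
        rle = ann["segmentation"]
        suppressed = False
        for kept_ann in kept:
            iou = mask_utils.iou([rle], [kept_ann["segmentation"]], [0])[0][0]
            if iou > iou_thresh:
                suppressed = True  # a larger mask already covers this region
                break
        if not suppressed:
            kept.append(ann)
    return kept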
We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- SA1B
            -- images
                -- sa_000000
                    -- sa_1.jpg
                    -- sa_1.json
                    -- ...
                -- sa_000001
                -- ...
            -- sa1b_subtrain_500k.json
            -- sa1b_subtrain_1m.json
            -- sa1b_subtrain_2m.json
Please download UVO from the official website, and download our preprocessed annotation files:
We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- UVO
            -- uvo_videos_dense_frames_jpg
            -- uvo_videos_sparse_frames_jpg
            -- uvo_videos_frames
            -- annotations
                -- FrameSet
                    -- UVO_frame_train_onecate.json
                    -- UVO_frame_val_onecate.json
                -- VideoDenseSet
                    -- UVO_video_train_dense_objectlabel.json
                    -- UVO_video_val_dense_objectlabel.json
Following UNINEXT, we prepare Objects365, RefCOCO series, YouTubeVOS, Ref-YouTubeVOS, and BDD data, and we expect that they are organized as below:
${GLEE_ROOT}
    -- datasets
        -- Objects365v2
            -- annotations
                -- zhiyuan_objv2_train_new.json
                -- zhiyuan_objv2_val_new.json
            -- images
        -- annotations
            -- refcoco-unc
            -- refcocog-umd
            -- refcocoplus-unc
        -- ytbvos18
            -- train
            -- val
        -- ref-youtube-vos
            -- meta_expressions
            -- train
            -- valid
            -- train.json
            -- valid.json
            -- RVOS_refcocofmt.json
        -- bdd
            -- images
                -- 10k
                -- 100k
                -- seg_track_20
                -- track
            -- labels
                -- box_track_20
                -- det_20
                -- ins_seg
                -- seg_track_20
RVOS_refcocofmt.json is the result of converting the ref-youtube-vos annotations into the RefCOCO format, which is used for image-level training. It can be generated by running:
python3 conversion/ref-ytbvos-conversion.py
The following datasets are only used for zero-shot evaluation, and are not used in joint-training.
Please download OmniLabel from the official website, and download our converted annotations in COCO format: omnilabel. We expect that the data is organized as below.
${GLEE_ROOT}
    -- datasets
        -- omnilabel
            -- images
                -- coco
                -- object365
                -- openimagesv5
            -- omnilabel_coco.json
            -- omnilabel_obj365.json
            -- omnilabel_openimages.json
            -- omnilabel_cocofmt.json
We follow GLIP to prepare the ODinW 35 dataset. Run
python3 download.py
to download it and organize it as below.
${GLEE_ROOT}
    -- datasets
        -- odinw
            -- dataset
                -- AerialMaritimeDrone
                -- CottontailRabbits
                -- NorthAmericaMushrooms
                -- ...
TAO and BURST share the same video frames.
First, download the validation set zip files (2-TAO_VAL.zip, 2_AVA_HACS_VAL_e49d8f78098a8ffb3769617570a20903.zip) from https://motchallenge.net/tao_download.php and unzip them.
Then, download our preprocessed YTVIS format (COCO-like) annotation files from huggingface:
https://huggingface.co/spaces/Junfeng5/GLEE_demo/tree/main/annotations/TAO
And organize them as below:
${GLEE_ROOT}
    -- datasets
        -- TAO
            -- burst_annotations
                -- TAO_val_withlabel_ytvisformat.json
                -- val
                    -- all_classes.json
                    -- ...
            -- TAO_annotations
                -- validation_ytvisfmt.json
                -- validation.json
            -- frames
                -- val
                    -- ArgoVerse
                    -- ava
                    -- ...
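As an optional sanity check, the downloaded annotation files can be loaded and cross-checked against the extracted frames. The sketch below assumes standard YTVIS-style keys ("videos", "annotations", "categories") and that the "file_names" entries are relative to the frames directory; both are assumptions, not guarantees about the released files.

# Minimal sanity check for the TAO/BURST layout above (assumptions noted in the text).
import json
import os

root = "datasets/TAO"
with open(os.path.join(root, "TAO_annotations", "validation_ytvisfmt.json")) as f:
    tao = json.load(f)

print("videos:", len(tao["videos"]))
print("annotations:", len(tao["annotations"]))
print("categories:", len(tao["categories"]))

# Check that the first frame of the first video exists under frames/ (assumed path root).
first_frame = tao["videos"][0]["file_names"][0]
print(first_frame, "exists:",
      os.path.isfile(os.path.join(root, "frames", first_frame)))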