For pre-training Mamba-YOLO-World, we adopt the datasets listed in the table below:
| Data | Samples | Type | Boxes |
| :-- | :--: | :--: | :--: |
| Objects365v1 | 609k | detection | 9,621k |
| GQA | 621k | grounding | 3,681k |
| Flickr | 149k | grounding | 641k |
We put all the data into the `data` directory, organized as follows:
```
├── coco
│   ├── annotations
│   │   ├── instances_val2017.json
│   │   └── instances_train2017.json
│   ├── lvis
│   │   └── lvis_v1_minival_inserted_image_name.json
│   ├── train2017
│   └── val2017
├── flickr
│   ├── final_flickr_separateGT_train.json
│   └── images
├── mixed_grounding
│   ├── final_mixed_train_no_coco.json
│   └── images
├── objects365v1
│   ├── objects365_train.json
│   └── train
└── texts
```
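As a quick sanity check after arranging the files, a short script along the lines below can confirm that the annotation files sit where the tree above expects them. The paths simply mirror that tree; adjust them if your layout differs.

```python
from pathlib import Path

# Annotation files expected by the pre-training setup,
# relative to the `data` directory shown in the tree above.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/lvis/lvis_v1_minival_inserted_image_name.json",
    "flickr/final_flickr_separateGT_train.json",
    "mixed_grounding/final_mixed_train_no_coco.json",
    "objects365v1/objects365_train.json",
]

data_root = Path("data")
missing = [p for p in EXPECTED if not (data_root / p).exists()]
if missing:
    print("Missing annotation files:")
    for p in missing:
        print(f"  {data_root / p}")
else:
    print("All expected annotation files are in place.")
```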
NOTE: We strongly suggest that you check the directories or paths in the dataset part of the config file, especially the values of `ann_file`, `data_root`, and `data_prefix`.
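For reference, these keys appear inside the MMDetection-style dataset definitions in the config. A minimal sketch of an Objects365v1 training entry might look like the following; the wrapper type, dataset type, and texts path are placeholders, so check them against the actual config.

```python
# Illustrative sketch only (not the repo's actual config): field names follow
# the MMDetection-style configs used by YOLO-World-like pipelines. The wrapper
# type, dataset type, and texts path below are placeholders.
obj365v1_train_dataset = dict(
    type='MultiModalDataset',                      # placeholder wrapper type
    dataset=dict(
        type='Objects365V1Dataset',                # placeholder dataset type
        data_root='data/objects365v1/',            # <- data_root
        ann_file='objects365_train.json',          # <- ann_file
        data_prefix=dict(img='train/'),            # <- data_prefix
    ),
    class_text_path='data/texts/obj365v1_class_texts.json',  # placeholder
)
```

Typically only the path-related fields (`data_root`, `ann_file`, `data_prefix`) need editing when your directory layout differs from the tree above.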
We provide the annotations of the pre-training data in the table below:
| Data | Images | Annotation File |
| :-- | :--: | :--: |
| Objects365v1 | Objects365 train | `objects365_train.json` |
| MixedGrounding | GQA | `final_mixed_train_no_coco.json` |
| Flickr30k | Flickr30k | `final_flickr_separateGT_train.json` |
| LVIS-minival | COCO val2017 | `lvis_v1_minival_inserted_image_name.json` |
Acknowledgement: We sincerely thank GLIP and mdetr for providing the annotation files for pre-training.