## 1. Image tasks:

- [AI2D](https://arxiv.org/abs/1603.07396) (ai2d)
- [ChartQA](https://github.com/vis-nlp/ChartQA) (chartqa)
- [COCO Caption](https://github.com/tylin/coco-caption) (coco_cap)
  - COCO 2014 Caption (coco2014_cap)
  - COCO 2014 Caption Validation (coco2014_cap_val)
  - COCO 2014 Caption Test (coco2014_cap_test)
  - COCO 2017 Caption (coco2017_cap)
  - COCO 2017 Caption MiniVal (coco2017_cap_val)
  - COCO 2017 Caption MiniTest (coco2017_cap_test)
- [ConBench](https://github.com/foundation-multimodal-models/ConBench) (conbench)
- [DetailCaps-4870](https://github.com/foundation-multimodal-models/CAPTURE) (detailcaps)
- [DOCVQA](https://github.com/anisha2102/docvqa) (docvqa)
  - DOCVQA Validation (docvqa_val)
  - DOCVQA Test (docvqa_test)
- [Ferret](https://github.com/apple/ml-ferret) (ferret)
- [Flickr30K](https://github.com/BryanPlummer/flickr30k_entities) (flickr30k)
  - Flickr30K Test (flickr30k_test)
- [GQA](https://cs.stanford.edu/people/dorarad/gqa/index.html) (gqa)
- [GQA-ru](https://huggingface.co/datasets/deepvk/GQA-ru) (gqa_ru)
- [II-Bench](https://github.com/II-Bench/II-Bench) (ii_bench)
- [Infographic VQA](https://www.docvqa.org/datasets/infographicvqa) (infovqa)
  - Infographic VQA Validation (infovqa_val)
  - Infographic VQA Test (infovqa_test)
- [LiveBench](https://huggingface.co/datasets/lmms-lab/LiveBench) (live_bench)
  - LiveBench 06/2024 (live_bench_2406)
  - LiveBench 07/2024 (live_bench_2407)
- [LLaVA-Bench-Wilder](https://huggingface.co/datasets/lmms-lab/LLaVA-Bench-Wilder) (llava_wilder_small)
- [LLaVA-Bench-COCO](https://llava-vl.github.io/) (llava_bench_coco)
- [LLaVA-Bench](https://llava-vl.github.io/) (llava_in_the_wild)
- [MathVerse](https://github.com/ZrrSkywalker/MathVerse) (mathverse)
  - MathVerse Text Dominant (mathverse_testmini_text_dominant)
  - MathVerse Text Only (mathverse_testmini_text_only)
  - MathVerse Text Lite (mathverse_testmini_text_lite)
  - MathVerse Vision Dominant (mathverse_testmini_vision_dominant)
  - MathVerse Vision Intensive (mathverse_testmini_vision_intensive)
  - MathVerse Vision Only (mathverse_testmini_vision_only)
- [MathVista](https://mathvista.github.io/) (mathvista)
  - MathVista Validation (mathvista_testmini)
  - MathVista Test (mathvista_test)
- [MMBench](https://github.com/open-compass/MMBench) (mmbench)
  - MMBench English (mmbench_en)
  - MMBench English Dev (mmbench_en_dev)
  - MMBench English Test (mmbench_en_test)
  - MMBench Chinese (mmbench_cn)
  - MMBench Chinese Dev (mmbench_cn_dev)
  - MMBench Chinese Test (mmbench_cn_test)
- [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) (mme)
- [MMStar](https://github.com/MMStar-Benchmark/MMStar) (mmstar)
- [MMUPD](https://huggingface.co/datasets/MM-UPD/MM-UPD) (mmupd)
  - MMUPD Base (mmupd_base)
  - MMAAD Base (mmaad_base)
  - MMIASD Base (mmiasd_base)
  - MMIVQD Base (mmivqd_base)
  - MMUPD Option (mmupd_option)
  - MMAAD Option (mmaad_option)
  - MMIASD Option (mmiasd_option)
  - MMIVQD Option (mmivqd_option)
  - MMUPD Instruction (mmupd_instruction)
  - MMAAD Instruction (mmaad_instruction)
  - MMIASD Instruction (mmiasd_instruction)
  - MMIVQD Instruction (mmivqd_instruction)
- [MMVet](https://github.com/yuweihao/MM-Vet) (mmvet)
- [Multilingual LLaVA Bench](https://huggingface.co/datasets/gagan3012/multilingual-llava-bench)
  - llava_in_the_wild_arabic
  - llava_in_the_wild_bengali
  - llava_in_the_wild_chinese
  - llava_in_the_wild_french
  - llava_in_the_wild_hindi
  - llava_in_the_wild_japanese
  - llava_in_the_wild_russian
  - llava_in_the_wild_spanish
  - llava_in_the_wild_urdu
- [NoCaps](https://nocaps.org/) (nocaps)
  - NoCaps Validation (nocaps_val)
  - NoCaps Test (nocaps_test)
- [OCRBench](https://github.com/Yuliang-Liu/MultimodalOCR) (ocrbench)
- [OKVQA](https://okvqa.allenai.org/) (ok_vqa)
  - OKVQA Validation 2014 (ok_vqa_val2014)
- [POPE](https://github.com/RUCAIBox/POPE) (pope)
- [RefCOCO](https://github.com/lichengunc/refer) (refcoco)
  - refcoco_seg_test
  - refcoco_seg_val
  - refcoco_seg_testA
  - refcoco_seg_testB
  - refcoco_bbox_test
  - refcoco_bbox_val
  - refcoco_bbox_testA
  - refcoco_bbox_testB
- [RefCOCO+](https://github.com/lichengunc/refer) (refcoco+)
  - refcoco+\_seg
  - refcoco+\_seg_val
  - refcoco+\_seg_testA
  - refcoco+\_seg_testB
  - refcoco+\_bbox
  - refcoco+\_bbox_val
  - refcoco+\_bbox_testA
  - refcoco+\_bbox_testB
- [RefCOCOg](https://github.com/lichengunc/refer) (refcocog)
  - refcocog_seg_test
  - refcocog_seg_val
  - refcocog_bbox_test
  - refcocog_bbox_val
- [ScienceQA](https://scienceqa.github.io/) (scienceqa_full)
  - ScienceQA Full (scienceqa)
  - ScienceQA IMG (scienceqa_img)
- [ScreenSpot](https://github.com/njucckevin/SeeClick) (screenspot)
  - ScreenSpot REC / Grounding (screenspot_rec)
  - ScreenSpot REG / Instruction Generation (screenspot_reg)
- [ST-VQA](https://rrc.cvc.uab.es/?ch=11) (stvqa)
- [synthdog](https://github.com/clovaai/donut) (synthdog)
  - synthdog English (synthdog_en)
  - synthdog Chinese (synthdog_zh)
- [TextCaps](https://textvqa.org/textcaps/) (textcaps)
  - TextCaps Validation (textcaps_val)
  - TextCaps Test (textcaps_test)
- [TextVQA](https://textvqa.org/) (textvqa)
  - TextVQA Validation (textvqa_val)
  - TextVQA Test (textvqa_test)
- [VCR-Wiki](https://github.com/tianyu-z/VCR)
  - VCR-Wiki English
    - VCR-Wiki English easy 100 (vcr_wiki_en_easy_100)
    - VCR-Wiki English easy 500 (vcr_wiki_en_easy_500)
    - VCR-Wiki English easy (vcr_wiki_en_easy)
    - VCR-Wiki English hard 100 (vcr_wiki_en_hard_100)
    - VCR-Wiki English hard 500 (vcr_wiki_en_hard_500)
    - VCR-Wiki English hard (vcr_wiki_en_hard)
  - VCR-Wiki Chinese
    - VCR-Wiki Chinese easy 100 (vcr_wiki_zh_easy_100)
    - VCR-Wiki Chinese easy 500 (vcr_wiki_zh_easy_500)
    - VCR-Wiki Chinese easy (vcr_wiki_zh_easy)
    - VCR-Wiki Chinese hard 100 (vcr_wiki_zh_hard_100)
    - VCR-Wiki Chinese hard 500 (vcr_wiki_zh_hard_500)
    - VCR-Wiki Chinese hard (vcr_wiki_zh_hard)
- [VibeEval](https://github.com/reka-ai/reka-vibe-eval) (vibe_eval)
- [VizWizVQA](https://vizwiz.org/tasks-and-datasets/vqa/) (vizwiz_vqa)
  - VizWizVQA Validation (vizwiz_vqa_val)
  - VizWizVQA Test (vizwiz_vqa_test)
- [VQAv2](https://visualqa.org/) (vqav2)
  - VQAv2 Validation (vqav2_val)
  - VQAv2 Test (vqav2_test)
- [WebSRC](https://x-lance.github.io/WebSRC/) (websrc)
  - WebSRC Validation (websrc_val)
  - WebSRC Test (websrc_test)
- [WildVision-Bench](https://github.com/WildVision-AI/WildVision-Bench) (wildvision)
  - WildVision 0617 (wildvision_0617)
  - WildVision 0630 (wildvision_0630)
- [SeedBench 2 Plus](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-2-plus) (seedbench_2_plus)
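
The name in parentheses after each entry is the task identifier the evaluation harness consumes. As a minimal sketch of how these identifiers are typically passed to the lmms-eval CLI — the `llava` adapter name, flag spellings, and defaults below are assumptions drawn from the project README and may differ between versions:

```python
# Minimal sketch, assuming lmms-eval is installed and exposes the CLI described
# in its README; check `python -m lmms_eval --help` for the installed version.
import subprocess

subprocess.run(
    [
        "python", "-m", "lmms_eval",
        "--model", "llava",               # model adapter to evaluate (assumed name)
        "--tasks", "mme,mmbench_en_dev",  # comma-separated identifiers from the list above
        "--batch_size", "1",
        "--log_samples",                  # keep per-sample outputs alongside aggregate scores
        "--output_path", "./logs/",
    ],
    check=True,
)
```

Identifiers that share a prefix (for example the `mmbench_*` variants) are typically also registered as a group task, so passing the bare group name runs every variant.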

## 2. Multi-image tasks:

- [CMMMU](https://cmmmu-benchmark.github.io/) (cmmmu)
  - CMMMU Validation (cmmmu_val)
  - CMMMU Test (cmmmu_test)
- [HallusionBench](https://github.com/tianyi-lab/HallusionBench) (hallusion_bench_image)
- [ICON-QA](https://iconqa.github.io/) (iconqa)
  - ICON-QA Validation (iconqa_val)
  - ICON-QA Test (iconqa_test)
- [LLaVA-NeXT-Interleave-Bench](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Interleave-Bench) (llava_interleave_bench)
  - llava_interleave_bench_in_domain
  - llava_interleave_bench_out_domain
  - llava_interleave_bench_multi_view
- [MIRB](https://github.com/ys-zong/MIRB) (mirb)
- [MMMU](https://mmmu-benchmark.github.io/) (mmmu)
  - MMMU Validation (mmmu_val)
  - MMMU Test (mmmu_test)
- [MMMU_Pro](https://huggingface.co/datasets/MMMU/MMMU_Pro)
  - MMMU Pro (mmmu_pro)
  - MMMU Pro Original (mmmu_pro_original)
  - MMMU Pro Vision (mmmu_pro_vision)
  - MMMU Pro COT (mmmu_pro_cot)
  - MMMU Pro Original COT (mmmu_pro_original_cot)
  - MMMU Pro Vision COT (mmmu_pro_vision_cot)
  - MMMU Pro Composite COT (mmmu_pro_composite_cot)
- [MMT Multiple Image](https://mmt-bench.github.io/) (mmt_mi)
  - MMT Multiple Image Validation (mmt_mi_val)
  - MMT Multiple Image Test (mmt_mi_test)
- [MuirBench](https://muirbench.github.io/) (muirbench)
- [MP-DocVQA](https://github.com/rubenpt91/MP-DocVQA-Framework) (multidocvqa)
  - MP-DocVQA Validation (multidocvqa_val)
  - MP-DocVQA Test (multidocvqa_test)
- [OlympiadBench](https://github.com/OpenBMB/OlympiadBench) (olympiadbench)
  - OlympiadBench Test English (olympiadbench_test_en)
  - OlympiadBench Test Chinese (olympiadbench_test_cn)
- [Q-Bench](https://q-future.github.io/Q-Bench/) (qbenchs_dev)
  - Q-Bench2-HF (qbench2_dev)
  - Q-Bench-HF (qbench_dev)
  - A-Bench-HF (abench_dev)

## 3. Video tasks:

- [ActivityNet-QA](https://github.com/MILVLG/activitynet-qa) (activitynetqa_generation)
- [SeedBench](https://github.com/AILab-CVC/SEED-Bench) (seedbench)
- [SeedBench 2](https://github.com/AILab-CVC/SEED-Bench) (seedbench_2)
- [CVRR-ES](https://github.com/mbzuai-oryx/CVRR-Evaluation-Suite) (cvrr)
  - cvrr_continuity_and_object_instance_count
  - cvrr_fine_grained_action_understanding
  - cvrr_interpretation_of_social_context
  - cvrr_interpretation_of_visual_context
  - cvrr_multiple_actions_in_a_single_video
  - cvrr_non_existent_actions_with_existent_scene_depictions
  - cvrr_non_existent_actions_with_non_existent_scene_depictions
  - cvrr_partial_actions
  - cvrr_time_order_understanding
  - cvrr_understanding_emotional_context
  - cvrr_unusual_and_physically_anomalous_activities
- [EgoSchema](https://github.com/egoschema/EgoSchema) (egoschema)
  - egoschema_mcppl
  - egoschema_subset_mcppl
  - egoschema_subset
- [LongVideoBench](https://github.com/longvideobench/LongVideoBench)
- [MLVU](https://github.com/JUNJIE99/MLVU) (mlvu)
- [MMT-Bench](https://mmt-bench.github.io/) (mmt)
  - MMT Validation (mmt_val)
  - MMT Test (mmt_test)
- [MVBench](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/MVBENCH.md) (mvbench)

  - mvbench_action_sequence
  - mvbench_moving_count
  - mvbench_moving_attribute
  - mvbench_egocentric_navigation

- [NExT-QA](https://github.com/doc-doc/NExT-QA) (nextqa)

  - NExT-QA Multiple Choice Test (nextqa_mc_test)
  - NExT-QA Open Ended Validation (nextqa_oe_val)
  - NExT-QA Open Ended Test (nextqa_oe_test)

- [PerceptionTest](https://github.com/google-deepmind/perception_test)

  - PerceptionTest Test
    - perceptiontest_test_mc
    - perceptiontest_test_mcppl
  - PerceptionTest Validation
    - perceptiontest_val_mc
    - perceptiontest_val_mcppl

- [TempCompass](https://github.com/llyx97/TempCompass) (tempcompass)

  - tempcompass_multi_choice
  - tempcompass_yes_no
  - tempcompass_caption_matching
  - tempcompass_captioning

- [Vatex](https://eric-xw.github.io/vatex-website/index.html) (vatex)

  - Vatex Chinese (vatex_val_zh)
  - Vatex Test (vatex_test)

- [VideoDetailDescription](https://huggingface.co/datasets/lmms-lab/VideoDetailCaption) (video_dc499)
- [Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT) (videochatgpt)
  - Video-ChatGPT Generic (videochatgpt_gen)
  - Video-ChatGPT Temporal (videochatgpt_temporal)
  - Video-ChatGPT Consistency (videochatgpt_consistency)
- [Video-MME](https://video-mme.github.io/) (videomme)
- [VITATECS](https://github.com/lscpku/VITATECS) (vitatecs)

  - VITATECS Direction (vitatecs_direction)
  - VITATECS Intensity (vitatecs_intensity)
  - VITATECS Sequence (vitatecs_sequence)
  - VITATECS Compositionality (vitatecs_compositionality)
  - VITATECS Localization (vitatecs_localization)
  - VITATECS Type (vitatecs_type)

- [WorldQA](https://zhangyuanhan-ai.github.io/WorldQA/) (worldqa)

  - WorldQA Generation (worldqa_gen)
  - WorldQA Multiple Choice (worldqa_mc)

- [YouCook2](http://youcook2.eecs.umich.edu/) (youcook2_val)

## 4. Text tasks:

- [GSM8K](https://github.com/openai/grade-school-math) (gsm8k)
- [HellaSwag](https://rowanzellers.com/hellaswag/) (hellaswag)
- [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval) (ifeval)
- [MMLU](https://github.com/hendrycks/test) (mmlu)
- [MMLU-Pro](https://github.com/TIGER-AI-Lab/MMLU-Pro) (mmlu_pro)
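
Because this page tracks a moving target, the identifiers listed here can drift from what a given installation actually registers. A small cross-checking sketch, assuming the `--tasks list` convention documented by the lm-evaluation-harness family of CLIs:

```python
# Hypothetical sketch: print the tasks registered by the installed lmms-eval and
# flag identifiers from this page that it does not know about. `--tasks list`
# is an assumption; fall back to `python -m lmms_eval --help` if it differs.
import subprocess

listing = subprocess.run(
    ["python", "-m", "lmms_eval", "--tasks", "list"],
    capture_output=True, text=True, check=True,
).stdout

expected = ["mme", "mmbench_en_dev", "videomme", "gsm8k"]  # sample identifiers from this page
missing = [name for name in expected if name not in listing]
print("not registered locally:", missing or "none")
```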