Commit c25d2d5: Update current tasks.md (#272)

pbcong authored and KairuiHu committed Oct 24, 2024 (1 parent: a80414b)

* update and sort tasks
* Update current_tasks.md
* minor changes

Showing 1 changed file with 94 additions and 81 deletions: docs/current_tasks.md

## 1. Image tasks:

- [AI2D](https://arxiv.org/abs/1603.07396) (ai2d)
- [ChartQA](https://github.com/vis-nlp/ChartQA) (chartqa)
- [COCO Caption](https://github.com/tylin/coco-caption) (coco_cap)
- COCO 2014 Caption (coco2014_cap)
- COCO 2014 Caption Validation (coco2014_cap_val)
- COCO 2014 Caption Test (coco2014_cap_test)
- COCO 2017 Caption (coco2017_cap)
- COCO 2017 Caption MiniVal (coco2017_cap_val)
- COCO 2017 Caption MiniTest (coco2017_cap_test)
- [ConBench](https://github.com/foundation-multimodal-models/ConBench) (conbench)
- [DetailCaps-4870](https://github.com/foundation-multimodal-models/CAPTURE) (detailcaps)
- [DOCVQA](https://github.com/anisha2102/docvqa) (docvqa)
- DOCVQA Validation (docvqa_val)
- DOCVQA Test (docvqa_test)
- [Ferret](https://github.com/apple/ml-ferret) (ferret)
- [Flickr30K](https://github.com/BryanPlummer/flickr30k_entities) (flickr30k)
- Flickr30K Test (flickr30k_test)
- [GQA](https://cs.stanford.edu/people/dorarad/gqa/index.html) (gqa)
- [GQA-ru](https://huggingface.co/datasets/deepvk/GQA-ru) (gqa_ru)
- [II-Bench](https://github.com/II-Bench/II-Bench) (ii_bench)
- [Infographic VQA](https://www.docvqa.org/datasets/infographicvqa) (infovqa)
- Infographic VQA Validation (infovqa_val)
- Infographic VQA Test (infovqa_test)
- [LiveBench](https://huggingface.co/datasets/lmms-lab/LiveBench) (live_bench)
- LiveBench 06/2024 (live_bench_2406)
- LiveBench 07/2024 (live_bench_2407)
- [LLaVA-Bench-Wilder](https://huggingface.co/datasets/lmms-lab/LLaVA-Bench-Wilder) (llava_wilder_small)
- [LLaVA-Bench-COCO](https://llava-vl.github.io/) (llava_bench_coco)
- [LLaVA-Bench](https://llava-vl.github.io/) (llava_in_the_wild)
- [MathVerse](https://github.com/ZrrSkywalker/MathVerse) (mathverse)
- MathVerse Text Dominant (mathverse_testmini_text_dominant)
- MathVerse Text Only (mathverse_testmini_text_only)
- MathVerse Text Lite (mathverse_testmini_text_lite)
- MathVerse Vision Dominant (mathverse_testmini_vision_dominant)
- MathVerse Vision Intensive (mathverse_testmini_vision_intensive)
- MathVerse Vision Only (mathverse_testmini_vision_only)
- [MathVista](https://mathvista.github.io/) (mathvista)
- MathVista Validation (mathvista_testmini)
- MathVista Test (mathvista_test)
- [MMBench](https://github.com/open-compass/MMBench) (mmbench)
- MMBench English (mmbench_en)
- MMBench English Dev (mmbench_en_dev)
- MMBench English Test (mmbench_en_test)
- MMBench Chinese (mmbench_cn)
- MMBench Chinese Dev (mmbench_cn_dev)
- MMBench Chinese Test (mmbench_cn_test)
- [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation) (mme)
- [MMStar](https://github.com/MMStar-Benchmark/MMStar) (mmstar)
- [MMUPD](https://huggingface.co/datasets/MM-UPD/MM-UPD) (mmupd)
- MMUPD Base (mmupd_base)
- MMAAD Base (mmaad_base)
- MMIASD Base (mmiasd_base)
- MMAAD Instruction (mmaad_instruction)
- MMIASD Instruction (mmiasd_instruction)
- MMIVQD Instruction (mmivqd_instruction)
- [MMVet](https://github.com/yuweihao/MM-Vet) (mmvet)
- [Multilingual LLaVA Bench](https://huggingface.co/datasets/gagan3012/multilingual-llava-bench)
- llava_in_the_wild_arabic
- llava_in_the_wild_bengali
- llava_in_the_wild_chinese
- llava_in_the_wild_russian
- llava_in_the_wild_spanish
- llava_in_the_wild_urdu
- [NoCaps](https://nocaps.org/) (nocaps)
- NoCaps Validation (nocaps_val)
- NoCaps Test (nocaps_test)
- [OCRBench](https://github.com/Yuliang-Liu/MultimodalOCR) (ocrbench)
- [OKVQA](https://okvqa.allenai.org/) (ok_vqa)
- OKVQA Validation 2014 (ok_vqa_val2014)
- [POPE](https://github.com/RUCAIBox/POPE) (pope)
- [RefCOCO](https://github.com/lichengunc/refer) (refcoco)
- refcoco_seg_test
- refcoco_seg_val
- refcoco_seg_testA
- refcoco_bbox_val
- refcoco_bbox_testA
- refcoco_bbox_testB
- [RefCOCO+](https://github.com/lichengunc/refer) (refcoco+)
- refcoco+\_seg
- refcoco+\_seg_val
- refcoco+\_seg_testA
- refcoco+\_bbox_val
- refcoco+\_bbox_testA
- refcoco+\_bbox_testB
- [RefCOCOg](https://github.com/lichengunc/refer) (refcocog)
- refcocog_seg_test
- refcocog_seg_val
- refcocog_bbox_test
- refcocog_bbox_val
- [ScienceQA](https://scienceqa.github.io/) (scienceqa_full)
- ScienceQA Full (scienceqa)
- ScienceQA IMG (scienceqa_img)
- [ScreenSpot](https://github.com/njucckevin/SeeClick) (screenspot)
- ScreenSpot REC / Grounding (screenspot_rec)
- ScreenSpot REG / Instruction Generation (screenspot_reg)
- [ST-VQA](https://rrc.cvc.uab.es/?ch=11) (stvqa)
- [synthdog](https://github.com/clovaai/donut) (synthdog)
- synthdog English (synthdog_en)
- synthdog Chinese (synthdog_zh)
- [TextCaps](https://textvqa.org/textcaps/) (textcaps)
- TextCaps Validation (textcaps_val)
- TextCaps Test (textcaps_test)
- [TextVQA](https://textvqa.org/) (textvqa)
- TextVQA Validation (textvqa_val)
- TextVQA Test (textvqa_test)
- [VCR-Wiki](https://github.com/tianyu-z/VCR)
- VCR-Wiki English
- VCR-Wiki English easy 100 (vcr_wiki_en_easy_100)
- VCR-Wiki English easy 500 (vcr_wiki_en_easy_500)
- VCR-Wiki Chinese hard 100 (vcr_wiki_zh_hard_100)
- VCR-Wiki Chinese hard 500 (vcr_wiki_zh_hard_500)
- VCR-Wiki Chinese hard (vcr_wiki_zh_hard)
- [VibeEval](https://github.com/reka-ai/reka-vibe-eval) (vibe_eval)
- [VizWizVQA](https://vizwiz.org/tasks-and-datasets/vqa/) (vizwiz_vqa)
- VizWizVQA Validation (vizwiz_vqa_val)
- VizWizVQA Test (vizwiz_vqa_test)
- [VQAv2](https://visualqa.org/) (vqav2)
- VQAv2 Validation (vqav2_val)
- VQAv2 Test (vqav2_test)
- [WebSRC](https://x-lance.github.io/WebSRC/) (websrc)
- WebSRC Validation (websrc_val)
- WebSRC Test (websrc_test)
- [WildVision-Bench](https://github.com/WildVision-AI/WildVision-Bench) (wildvision)
- WildVision 0617 (wildvision_0617)
- WildVision 0630 (wildvision_0630)
- [SeedBench 2 Plus](https://huggingface.co/datasets/AILab-CVC/SEED-Bench-2-plus) (seedbench_2_plus)

## 2. Multi-image tasks:

- [CMMMU](https://cmmmu-benchmark.github.io/) (cmmmu)
- CMMMU Validation (cmmmu_val)
- CMMMU Test (cmmmu_test)
- [HallusionBench](https://github.com/tianyi-lab/HallusionBench) (hallusion_bench_image)
- [ICON-QA](https://iconqa.github.io/) (iconqa)
- ICON-QA Validation (iconqa_val)
- ICON-QA Test (iconqa_test)
- [LLaVA-NeXT-Interleave-Bench](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Interleave-Bench) (llava_interleave_bench)
- llava_interleave_bench_in_domain
- llava_interleave_bench_out_domain
- llava_interleave_bench_multi_view
- [MIRB](https://github.com/ys-zong/MIRB) (mirb)
- [MMMU](https://mmmu-benchmark.github.io/) (mmmu)
- MMMU Validation (mmmu_val)
- MMMU Test (mmmu_test)
- [MMMU_Pro](https://huggingface.co/datasets/MMMU/MMMU_Pro)
- MMMU Pro (mmmu_pro)
- MMMU Pro Original (mmmu_pro_original)
- MMMU Pro Vision (mmmu_pro_vision)
- MMMU Pro COT (mmmu_pro_cot)
- MMMU Pro Original COT (mmmu_pro_original_cot)
- MMMU Pro Vision COT (mmmu_pro_vision_cot)
- MMMU Pro Composite COT (mmmu_pro_composite_cot)
- [MMT Multiple Image](https://mmt-bench.github.io/) (mmt_mi)
- MMT Multiple Image Validation (mmt_mi_val)
- MMT Multiple Image Test (mmt_mi_test)
- [MuirBench](https://muirbench.github.io/) (muirbench)
- [MP-DocVQA](https://github.com/rubenpt91/MP-DocVQA-Framework) (multidocvqa)
- MP-DocVQA Validation (multidocvqa_val)
- MP-DocVQA Test (multidocvqa_test)
- [OlympiadBench](https://github.com/OpenBMB/OlympiadBench) (olympiadbench)
- OlympiadBench Test English (olympiadbench_test_en)
- OlympiadBench Test Chinese (olympiadbench_test_cn)
- [Q-Bench](https://q-future.github.io/Q-Bench/) (qbenchs_dev)
- Q-Bench2-HF (qbench2_dev)
- Q-Bench-HF (qbench_dev)
- A-Bench-HF (abench_dev)

## 3. Video tasks:

- [ActivityNet-QA](https://github.com/MILVLG/activitynet-qa) (activitynetqa_generation)
- [SeedBench](https://github.com/AILab-CVC/SEED-Bench) (seedbench)
- [SeedBench 2](https://github.com/AILab-CVC/SEED-Bench) (seedbench_2)
- [CVRR-ES](https://github.com/mbzuai-oryx/CVRR-Evaluation-Suite) (cvrr)
- cvrr_continuity_and_object_instance_count
- cvrr_fine_grained_action_understanding
- cvrr_interpretation_of_social_context
- cvrr_time_order_understanding
- cvrr_understanding_emotional_context
- cvrr_unusual_and_physically_anomalous_activities
- [EgoSchema](https://github.com/egoschema/EgoSchema) (egoschema)
- egoschema_mcppl
- egoschema_subset_mcppl
- egoschema_subset
- [LongVideoBench](https://github.com/longvideobench/LongVideoBench)
- [MLVU](https://github.com/JUNJIE99/MLVU) (mlvu)
- [MMT-Bench](https://mmt-bench.github.io/) (mmt)
- MMT Validation (mmt_val)
- MMT Test (mmt_test)
- [MVBench](https://github.com/OpenGVLab/Ask-Anything/blob/main/video_chat2/MVBENCH.md) (mvbench)

- mvbench_action_sequence
- mvbench_moving_count
- mvbench_moving_attribute
- mvbench_egocentric_navigation

- [NExT-QA](https://github.com/doc-doc/NExT-QA) (nextqa)

- NExT-QA Multiple Choice Test (nextqa_mc_test)
- NExT-QA Open Ended Validation (nextqa_oe_val)
- NExT-QA Open Ended Test (nextqa_oe_test)

- [PerceptionTest](https://github.com/google-deepmind/perception_test)

- PerceptionTest Test
- perceptiontest_test_mc
- perceptiontest_val_mc
- perceptiontest_val_mcppl

- [TempCompass](https://github.com/llyx97/TempCompass) (tempcompass)

- tempcompass_multi_choice
- tempcompass_yes_no
- tempcompass_caption_matching
- tempcompass_captioning

- [Vatex](https://eric-xw.github.io/vatex-website/index.html) (vatex)

- Vatex Chinese (vatex_val_zh)
- Vatex Test (vatex_test)

- [VideoDetailDescription](https://huggingface.co/datasets/lmms-lab/VideoDetailCaption) (video_dc499)
- [Video-ChatGPT](https://github.com/mbzuai-oryx/Video-ChatGPT) (videochatgpt)
- Video-ChatGPT Generic (videochatgpt_gen)
- Video-ChatGPT Temporal (videochatgpt_temporal)
- Video-ChatGPT Consistency (videochatgpt_consistency)
- [Video-MME](https://video-mme.github.io/) (videomme)
- [VITATECS](https://github.com/lscpku/VITATECS) (vitatecs)

- VITATECS Direction (vitatecs_direction)
- VITATECS Intensity (vitatecs_intensity)
- VITATECS Localization (vitatecs_localization)
- VITATECS Type (vitatecs_type)

- [WorldQA](https://zhangyuanhan-ai.github.io/WorldQA/) (worldqa)

- WorldQA Generation (worldqa_gen)
- WorldQA Multiple Choice (worldqa_mc)

- [YouCook2](http://youcook2.eecs.umich.edu/) (youcook2_val)

## 4. Text tasks:

- [GSM8K](https://github.com/openai/grade-school-math) (gsm8k)
- [HellaSwag](https://rowanzellers.com/hellaswag/) (hellaswag)
- [IFEval](https://github.com/google-research/google-research/tree/master/instruction_following_eval) (ifeval)
- [MMLU](https://github.com/hendrycks/test) (mmlu)
- [MMLU_pro](https://github.com/TIGER-AI-Lab/MMLU-Pro) (mmlu_pro)
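
Throughout this list, the parenthesized identifiers are the task names used to select benchmarks at evaluation time, and a group name (e.g. `textvqa`) stands for the subtasks listed beneath it (`textvqa_val`, `textvqa_test`). A minimal sketch of that convention, with a hypothetical mapping that is not the evaluator's actual registry:

```python
# Hypothetical sketch (not the real task registry): a group name expands to
# the concrete subtask names listed under it in this document.
TASK_GROUPS = {
    "textvqa": ["textvqa_val", "textvqa_test"],
    "mmbench_en": ["mmbench_en_dev", "mmbench_en_test"],
}

def expand(names):
    """Expand group names into leaf task names; leaf names pass through."""
    expanded = []
    for name in names:
        expanded.extend(TASK_GROUPS.get(name, [name]))
    return expanded

print(expand(["textvqa", "mme"]))  # ['textvqa_val', 'textvqa_test', 'mme']
```

In the evaluation harness itself these names are passed to the launcher (typically as a comma-separated `--tasks` argument); treat the exact flag names as an assumption and confirm them against the repository README.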
