diff --git a/.DS_Store b/.DS_Store index 69379217..9f867ab4 100644 Binary files a/.DS_Store and b/.DS_Store differ diff --git a/mmevol/README.md b/mmevol/README.md index 9202f1af..f7c8b1a6 100644 --- a/mmevol/README.md +++ b/mmevol/README.md @@ -1,6 +1,9 @@ # MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct +

+ +


@@ -23,8 +26,6 @@ Jingkuan Song4🌟,
- - \* Equal contribution 🌟 Corresponding author 1 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
@@ -38,12 +39,9 @@
-

- -

[[📖 arXiv Paper](https://arxiv.org/pdf/2409.05840)] [[📊 Dataset](https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol)] [[🏆 Models](https://huggingface.co/models/Tongyi-ConvAI/MMEvol)]
-MMEvol is the first method that successfully introduces Evol-Instruct into multimodal domain to improve the diversity and complexity of multimodal instruction data. Compared with previous methods like vila2, MIMIC-IT, and MMInstruct, it can perform iterative evolution in a very elegant and simple way in a fully automatic way, breaking through human imagination of data complexity and diversity. It has no restrictions on the form of data, the type of task, or complex processing. It can quickly perform self-iterative evolution on limited image instruction data to obtain ultra-high-quality multimodal data, thereby giving multimodal models more powerful capabilities. At the same time, it can be orthogonally combined with other data flow-driven methods such as vila2, MIMIC-IT, and MMInstruct to obtain more powerful data construction effects. Everyone is welcome to experience it now! +MMEvol is the first method to successfully introduce Evol-Instruct into the multimodal domain, improving the diversity and complexity of multimodal instruction data. Compared with previous methods such as VILA2, MIMIC-IT, and MMInstruct, it performs iterative evolution in an elegant, simple, and fully automatic way, going beyond human intuition about data complexity and diversity. It places no restrictions on data format or task type and requires no complex processing. It can quickly self-evolve limited image instruction data into ultra-high-quality multimodal data, giving multimodal models more powerful capabilities. It can also be combined orthogonally with other data-flow-driven methods such as VILA2, MIMIC-IT, and MMInstruct for even stronger data construction. Everyone is welcome to try it now! 
## 🔥 Update @@ -103,8 +101,8 @@ Here are the pretrained weights and instruction tuning weights | Model | Pretrained Projector | Base LLM | PT Data | IT Data | Download | | ---------------- | -------------------- | --------- | ------------------------------------------------------------ | ------- | -------- | -| MMEvol-Qwen2-7B | [mm_projector]() | Qwen2-7B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt]() | -| MMEvol-LLaMA3-8B | [mm_projector]() | LLaMA3-8B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt]() | +| MMEvol-Qwen2-7B | [mm_projector](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) | Qwen2-7B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) | +| MMEvol-LLaMA3-8B | [mm_projector](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) | LLaMA3-8B | [LLaVA-Pretrain](https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain) | MMEvol | [ckpt](https://huggingface.co/models/Tongyi-ConvAI/MMEvol) | ### Performance @@ -255,9 +253,10 @@ bash scripts/v1_6/train/llama3/finetune.sh bash scripts/v1_6/train/qwen2/finetune.sh ``` - ## 📈 Evaluation +#### Ensure that your `api_base` and `key` are correctly configured before evaluation. + ## opencompass First, enter the `vlmevalkit` directory and install all dependencies: @@ -313,6 +312,8 @@ While scoring on each benchmark directly, set `MODE=all`. If only inference resu ./script/run_inference.sh MMEvol-Llama3-V-1_6 MathVista_MINI all ..... +# NOTE: use llava/eval/blink_eval.py to run the BLINK evaluation separately. +python llava/eval/blink_eval.py ```
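The evaluation section above asks you to configure `api_base` and `key` before running; the patched `vlmevalkit/vlmeval/api/gpt.py` near the end of this diff enforces this with an assert. A minimal sketch of that guard (the function name and placeholder values here are illustrative, not part of the repo):

```python
def check_api_config(api_base: str, key: str) -> None:
    # Mirrors the assert in vlmeval/api/gpt.py: refuse to run until
    # both the endpoint and the token have been filled in.
    assert len(api_base) > 0 and len(key) > 0, \
        "make sure that both api_base and key are configured correctly"

# Placeholder values; substitute your real endpoint and token.
check_api_config("https://example.com/api/ask", "sk-your-token")
```

Failing fast here is much cheaper than discovering an empty key halfway through a long inference run.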
@@ -335,22 +336,24 @@ python llava/eval/mminst_eval.py
+ + ## 👀 Visualization The Tongyi-ConvAI generates this dataset for multi-modal supervised fine-tuning. This dataset was used to train **Evol-Llama3-8B-Instruct** and **Evol-Qwen2-7B** reported in [our paper](https://arxiv.org/pdf/2409.05840). To create this dataset, we first selected a 163K seed instruction-tuning dataset for Evol-Instruct, then enhanced data quality through an iterative process involving a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. This process results in a more complex and diverse image-text instruction dataset, which in turn empowers MLLMs with enhanced capabilities. Below we showcase the detailed data distribution of SEED-163K, which is prepared for the multi-round evolution mentioned above. More details can be found in our paper.
- +
Click to expand more examples

- - - - + + + +

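The data-engine pipeline added below splits the seed JSON into one file per sample keyed by a `hash_id` (see `dataengine/datasets/process.py`), because raw `id` and `image` values can repeat. A minimal sketch of that indexing scheme, using made-up records rather than the real seed data:

```python
import json
import os
import os.path as osp
import tempfile

def build_hash_ids(samples):
    # index + flattened image path yields a collision-free key even
    # when `id` or `image` repeats across samples.
    for index, sample in enumerate(samples):
        sample["hash_id"] = f'{index}_{sample["image"].replace("/", "_")}'
    return samples

def split_to_meta_data(samples, datasets_path):
    # One JSON file per sample, named after its hash_id.
    meta_dir = osp.join(datasets_path, "meta_data")
    os.makedirs(meta_dir, exist_ok=True)
    for sample in samples:
        with open(osp.join(meta_dir, f'{sample["hash_id"]}.json'), "w") as f:
            json.dump(sample, f, indent=4)

# Made-up seed records; the real ones come from mm_seed_no_evo_163k.json.
seed = build_hash_ids([
    {"id": "1", "image": "coco/train/0001.jpg"},
    {"id": "1", "image": "coco/train/0001.jpg"},  # duplicate on purpose
])
with tempfile.TemporaryDirectory() as root:
    split_to_meta_data(seed, root)
    print(sorted(os.listdir(osp.join(root, "meta_data"))))
    # → ['0_coco_train_0001.jpg.json', '1_coco_train_0001.jpg.json']
```

Prefixing the enumeration index guarantees uniqueness even when two samples share the same image path, which is exactly the collision the pipeline's comment warns about.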
diff --git a/mmevol/dataengine/README.md new file mode 100644 index 00000000..0aeb4246 --- /dev/null +++ b/mmevol/dataengine/README.md @@ -0,0 +1,79 @@ +# Data construction pipeline for MMEvol-480k +

+ +

+ +
+
+
Run Luo1,2*, +Haonan Zhang3*, +Longze Chen1,2*, +Ting-En Lin3*, +Xiong Liu3, +Yuchuan Wu3, +Min Yang1,2🌟, +Yongbin Li3🌟, +
+Minzheng Wang2, +Pengpeng Zeng4, +Lianli Gao5, +Heng Tao Shen4, +Yunshui Li1,2, +Xiaobo Xia6, +Fei Huang3, +Jingkuan Song4🌟, +
+ +\* Equal contribution 🌟 Corresponding author + +1 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
+2 University of Chinese Academy of Sciences
+3 Alibaba Group +4 Tongji University +5 Independent Researcher +6 The University of Sydney
+ +![Multi-Modal](https://img.shields.io/badge/Task-Multi--Modal-red)
+ +
+ + +
[[📖 arXiv Paper](https://arxiv.org/pdf/2409.05840)] [[📊 Dataset](https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol)] [[🏆 Models](https://huggingface.co/models/Tongyi-ConvAI/MMEvol)]
+ +Follow the instructions below to generate MMEvol-480k. + +1. Download the SEED-163k JSON file (`mm_seed_no_evo_163k.json`) from [🤗 Hugging Face](https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol/tree/main/jsons) and place it under the `./dataengine/datasets` path. +2. Run the preprocessing script under the `dataengine/datasets` path to extract each sample into the `meta_data` folder: +```python +python dataengine/datasets/process.py +``` +3. Prepare the data storage folder by following the format of `./dataengine/evolution/folder_template`; you can simply copy `folder_template` and name it after your data, _e.g._, mmevol_1k_evo.json. +4. Ensure that your `api_base` and `key` are correctly configured before starting generation. Set your `key` and `api_base` in both: + +- lines 129-130 in dataengine/multi_round.py +- lines 126-127 in dataengine/score_process/difficulty_scoring_v123.py +5. Run the following code to begin the three-round data evolution: +```python +python dataengine/multi_round.py +``` +Three rounds of evolution are performed on SEED-163k, with data filtering at the end of each round. The final evolved data is stored under the `./datasets` path. + +**License**: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and [Gemma License](https://www.kaggle.com/models/google/gemma/license/). 
+ +## 📚 Citation + +```bibtex +@article{luo2024mmevol, + title={Mmevol: Empowering multimodal large language models with evol-instruct}, + author={Luo, Run and Zhang, Haonan and Chen, Longze and Lin, Ting-En and Liu, Xiong and Wu, Yuchuan and Yang, Min and Wang, Minzheng and Zeng, Pengpeng and Gao, Lianli and others}, + journal={arXiv preprint arXiv:2409.05840}, + year={2024} +} +``` + +**Contact**: + +- Run Luo - r.luo@siat.ac.cn + +- Haonan Zhang - zchiowal@gmail.com diff --git a/mmevol/dataengine/assets/mmevol_dis_cam.png b/mmevol/dataengine/assets/mmevol_dis_cam.png new file mode 100644 index 00000000..d63f4277 Binary files /dev/null and b/mmevol/dataengine/assets/mmevol_dis_cam.png differ diff --git a/mmevol/dataengine/assets/mmevol_logo.png b/mmevol/dataengine/assets/mmevol_logo.png new file mode 100644 index 00000000..76ec2126 Binary files /dev/null and b/mmevol/dataengine/assets/mmevol_logo.png differ diff --git a/mmevol/dataengine/assets/mmevol_long_tail.png b/mmevol/dataengine/assets/mmevol_long_tail.png new file mode 100644 index 00000000..30e96b2e Binary files /dev/null and b/mmevol/dataengine/assets/mmevol_long_tail.png differ diff --git a/mmevol/dataengine/assets/mmevol_pai.png b/mmevol/dataengine/assets/mmevol_pai.png new file mode 100644 index 00000000..e1070bd6 Binary files /dev/null and b/mmevol/dataengine/assets/mmevol_pai.png differ diff --git a/mmevol/dataengine/assets/mmevol_performance.png b/mmevol/dataengine/assets/mmevol_performance.png new file mode 100644 index 00000000..a2795f93 Binary files /dev/null and b/mmevol/dataengine/assets/mmevol_performance.png differ diff --git a/mmevol/mmevol_sft_data/assets/seed_dis.jpg b/mmevol/dataengine/assets/mmevol_seed_dis.jpg similarity index 100% rename from mmevol/mmevol_sft_data/assets/seed_dis.jpg rename to mmevol/dataengine/assets/mmevol_seed_dis.jpg diff --git a/mmevol/mmevol_sft_data/base.py b/mmevol/dataengine/base.py similarity index 100% rename from mmevol/mmevol_sft_data/base.py
rename to mmevol/dataengine/base.py diff --git a/mmevol/dataengine/datasets/process.py b/mmevol/dataengine/datasets/process.py new file mode 100644 index 00000000..866583af --- /dev/null +++ b/mmevol/dataengine/datasets/process.py @@ -0,0 +1,24 @@ +import json +import os +import os.path as osp +from tqdm import tqdm +import shutil + +# Construct hash_id to create a unique index, because both id and image key values have duplicate values +datasets_path = "/mnt/data/haonan/code/dataengine/datasets" + +a = json.load(open(osp.join(datasets_path, "seed_data_1k_demo.json"), "r")) +for index, i in enumerate(a): + i["hash_id"] = str(index) + "_" + i["image"].replace("/", "_") + +json.dump(a, open(osp.join(datasets_path, "seed_data_1k_demo.json"), "w"), indent=4) + +# Recreate meta_data and store each sample in its own file +if os.path.exists(osp.join(datasets_path, "meta_data")): + shutil.rmtree(osp.join(datasets_path, "meta_data")) +os.mkdir(osp.join(datasets_path, "meta_data")) + +data = json.load(open(osp.join(datasets_path, "seed_data_1k_demo.json"), "r")) + +for index, d in enumerate(tqdm(data)): + json.dump(d, open(osp.join(datasets_path, "meta_data", "{}.json".format(d["hash_id"])), "w"), indent=4) \ No newline at end of file diff --git a/mmevol/mmevol_sft_data/multi_round.py b/mmevol/dataengine/multi_round.py similarity index 98% rename from mmevol/mmevol_sft_data/multi_round.py rename to mmevol/dataengine/multi_round.py index bde792a4..793f62c1 100644 --- a/mmevol/mmevol_sft_data/multi_round.py +++ b/mmevol/dataengine/multi_round.py @@ -1,6 +1,6 @@ import os import sys -sys.path.append("/mnt/data/haonan/code/mmevol_sft_data") +sys.path.append("/mnt/data/haonan/code/dataengine") from base import BaseAPI import numpy as np from tqdm import tqdm @@ -466,13 +466,13 @@ def filter_round3(meta_data, conversation_v3_path): if __name__=='__main__': - final_save_path =
"/mnt/data/haonan/code/mmevol_sft_data/datasets/seed_data_1k_demo_evo.json" - root_path = '/mnt/data/haonan/code/mmevol_sft_data/evolution/multi_round_single_imgs_1k_mini' + final_save_path = "/mnt/data/haonan/code/dataengine/datasets/seed_data_1k_demo_evo.json" + root_path = '/mnt/data/haonan/code/dataengine/evolution/multi_round_single_imgs_1k_mini' img_path = '/mnt/workspace/lr/datasets' for round_n in [1,2,3]: if round_n == 1: - seed_data_path = "/mnt/data/haonan/code/mmevol_sft_data/datasets/meta_data" + seed_data_path = "/mnt/data/haonan/code/dataengine/datasets/meta_data" else: seed_data_path = osp.join(root_path, "round{}".format(round_n-1), "filtered_qa") @@ -534,4 +534,4 @@ def filter_round3(meta_data, conversation_v3_path): merged_data.append(data) json.dump(merged_data, open(final_save_path, "w"), indent=4) - print("Saveing file to {}".format(final_save_path)) + print("Saving file to {}".format(final_save_path)) \ No newline at end of file diff --git a/mmevol/mmevol_sft_data/prompt.py b/mmevol/dataengine/prompt.py similarity index 100% rename from mmevol/mmevol_sft_data/prompt.py rename to mmevol/dataengine/prompt.py diff --git a/mmevol/mmevol_sft_data/score_process/base.py b/mmevol/dataengine/score_process/base.py similarity index 100% rename from mmevol/mmevol_sft_data/score_process/base.py rename to mmevol/dataengine/score_process/base.py diff --git a/mmevol/mmevol_sft_data/score_process/difficulty_scoring_v0.py b/mmevol/dataengine/score_process/difficulty_scoring_v0.py similarity index 92% rename from mmevol/mmevol_sft_data/score_process/difficulty_scoring_v0.py rename to mmevol/dataengine/score_process/difficulty_scoring_v0.py index 98e47f04..e38cefc6 100644 --- a/mmevol/mmevol_sft_data/score_process/difficulty_scoring_v0.py +++ b/mmevol/dataengine/score_process/difficulty_scoring_v0.py @@ -124,12 +124,9 @@ def __init__(self, print('Unknown API Base. 
') sys.exit(-1) - self.api_base="http://47.88.8.18:8088/api/ask" - # self.api_base = "http://47.88.8.18:8088/api/ask?tenant=gpt-4o-mini" - # self.key = "eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6IjI1ODczMCIsInBhc3N3b3JkIjoiMjU4NzMwMTIzIiwiZXhwIjoyMDE5NTUwNzAxfQ.JuqnTa7yauGkSzWkBiEig1K_rxvfAYTXS9F9_m-h4q8" - # self.key = "eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6IjI3NDM2OCIsInBhc3N3b3JkIjoiMjc0MzY4MTIzIiwiZXhwIjoyMDEyNjEzNjA4fQ.7OUpHs-AFPaFHuUy_p7XxXyNYhca2_-7F5GBtaahfe4" - self.key = "eyJhbGciOiJIUzI1NiIsInR5cCI6Imp3dCJ9.eyJ1c2VybmFtZSI6IjQ0MzQ1NSIsInBhc3N3b3JkIjoiNDQzNDU1MTIzIiwiZXhwIjoyMDMxNzA1NTA3fQ.7g4a6t9dKcRXVRa7MwQb5m2oirFu1OxjXhWbNM0w50s" - # self.key = "eyJhbGciOiJIUzI1NiIsInR5cCI6Imp3dCJ9.eyJ1c2VybmFtZSI6IjQzOTg2OSIsInBhc3N3b3JkIjoiNDM5ODY5MTIzIiwiZXhwIjoyMDMxNzA3NjkzfQ.ly9XNzVW7pEeW_bTZxzaqB3jt2kRr14XQIpT0DbCTto" + self.api_base = "" + self.key = "" + # self.model = "gpt-4o-2024-08-06" self.model = "gpt-4o-mini" diff --git a/mmevol/mmevol_sft_data/score_process/difficulty_scoring_v123.py b/mmevol/dataengine/score_process/difficulty_scoring_v123.py similarity index 95% rename from mmevol/mmevol_sft_data/score_process/difficulty_scoring_v123.py rename to mmevol/dataengine/score_process/difficulty_scoring_v123.py index 75536e29..09fdb208 100644 --- a/mmevol/mmevol_sft_data/score_process/difficulty_scoring_v123.py +++ b/mmevol/dataengine/score_process/difficulty_scoring_v123.py @@ -123,10 +123,9 @@ def __init__(self, print('Unknown API Base. 
') sys.exit(-1) - self.api_base="http://47.88.8.18:8088/api/ask" - # self.api_base = "http://47.88.8.18:8088/api/ask?tenant=gpt-4o-mini" - # self.key = "eyJ0eXAiOiJqd3QiLCJhbGciOiJIUzI1NiJ9.eyJ1c2VybmFtZSI6IjI1ODczMCIsInBhc3N3b3JkIjoiMjU4NzMwMTIzIiwiZXhwIjoyMDE5NTUwNzAxfQ.JuqnTa7yauGkSzWkBiEig1K_rxvfAYTXS9F9_m-h4q8" - self.key = "eyJhbGciOiJIUzI1NiIsInR5cCI6Imp3dCJ9.eyJ1c2VybmFtZSI6IjQ0MzQ1NSIsInBhc3N3b3JkIjoiNDQzNDU1MTIzIiwiZXhwIjoyMDMxNzA1NTA3fQ.7g4a6t9dKcRXVRa7MwQb5m2oirFu1OxjXhWbNM0w50s" + self.api_base = "" + self.key = "" + # self.model="gpt-4o-2024-05-13" self.model = "gpt-4o-mini" diff --git a/mmevol/mmevol_sft_data/score_process/prompt_score.py b/mmevol/dataengine/score_process/prompt_score.py similarity index 100% rename from mmevol/mmevol_sft_data/score_process/prompt_score.py rename to mmevol/dataengine/score_process/prompt_score.py diff --git a/mmevol/mmevol_sft_data/utils/a.ipynb b/mmevol/dataengine/utils/a.ipynb similarity index 100% rename from mmevol/mmevol_sft_data/utils/a.ipynb rename to mmevol/dataengine/utils/a.ipynb diff --git a/mmevol/mmevol_sft_data/utils/bertopic.ipynb b/mmevol/dataengine/utils/bertopic.ipynb similarity index 100% rename from mmevol/mmevol_sft_data/utils/bertopic.ipynb rename to mmevol/dataengine/utils/bertopic.ipynb diff --git a/mmevol/mmevol_sft_data/utils/coco_80_labels.txt b/mmevol/dataengine/utils/coco_80_labels.txt similarity index 100% rename from mmevol/mmevol_sft_data/utils/coco_80_labels.txt rename to mmevol/dataengine/utils/coco_80_labels.txt diff --git a/mmevol/mmevol_sft_data/utils/data_process.py b/mmevol/dataengine/utils/data_process.py similarity index 100% rename from mmevol/mmevol_sft_data/utils/data_process.py rename to mmevol/dataengine/utils/data_process.py diff --git a/mmevol/mmevol_sft_data/utils/object_count.json b/mmevol/dataengine/utils/object_count.json similarity index 100% rename from mmevol/mmevol_sft_data/utils/object_count.json rename to mmevol/dataengine/utils/object_count.json diff --git 
a/mmevol/mmevol_sft_data/utils/small_obj.txt b/mmevol/dataengine/utils/small_obj.txt similarity index 100% rename from mmevol/mmevol_sft_data/utils/small_obj.txt rename to mmevol/dataengine/utils/small_obj.txt diff --git a/mmevol/mmevol_sft_data/utils/small_obj_process.txt b/mmevol/dataengine/utils/small_obj_process.txt similarity index 100% rename from mmevol/mmevol_sft_data/utils/small_obj_process.txt rename to mmevol/dataengine/utils/small_obj_process.txt diff --git a/mmevol/llava/eval/mmvp_eval.py b/mmevol/llava/eval/mmvp_eval.py index 7b9967a1..734ce24f 100644 --- a/mmevol/llava/eval/mmvp_eval.py +++ b/mmevol/llava/eval/mmvp_eval.py @@ -109,11 +109,12 @@ def make_request(meta): with Pool(processes=50) as pool: output = list(tqdm(pool.imap(make_request, data), total=len(data))) -print(output) -for i in set(all_types): +# print(output) +# for i in set(all_types): - for j in data: - if j['type']==i +# for j in data: +# if j['type']==i + num_correct, num_total = 0, 0 # Continue with the processing of the JSONL file index=0 diff --git a/mmevol/mmevol_sft_data/README.md b/mmevol/mmevol_sft_data/README.md deleted file mode 100644 index c5dd04e5..00000000 --- a/mmevol/mmevol_sft_data/README.md +++ /dev/null @@ -1,51 +0,0 @@ -

-
- -

- -# MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct - -This is the official data collection of the paper "MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct", the dataset and checkpoint will be released soon. - -We are continuously refactoring our code, be patient and wait for the latest updates! - -## 🔗 Links -- Project Web: https://mmevol.github.io/ - -- Arxiv Paper: https://arxiv.org/pdf/2409.05840 - -- Code: Coming soon - -## 🧪 Dataset Details - -The Tongyi-ConvAI generates this dataset for multi-modal supervised fine-tuning. This dataset was used to train **Evol-Llama3-8B-Instruct** and **Evol-Qwen2-7B** reported in [our paper](https://arxiv.org/pdf/2409.05840). - -To create this dataset, we first selected 163K Seed Instruction Tuning Dataset for Evol-Instruct, then we enhance data quality through an iterative process that involves a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. This process results in the generation of a more complex and diverse image-text instruction dataset, which in turn empowers MLLMs with enhanced capabilities. - -Below we showcase the detailed data distribution of the SEED-163K, which is prepared for multi-round evolution mentioned above: -

-
- Fig. 2. SEED-163K: 163K Curated Seed Instruction Tuning Dataset for Evol-Instruct -

- - - -**License**: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and [Gemma License](https://www.kaggle.com/models/google/gemma/license/). - -## ๐Ÿ“š Citation - -```bibtex -@article{luo2024mmevol, - title={Mmevol: Empowering multimodal large language models with evol-instruct}, - author={Luo, Run and Zhang, Haonan and Chen, Longze and Lin, Ting-En and Liu, Xiong and Wu, Yuchuan and Yang, Min and Wang, Minzheng and Zeng, Pengpeng and Gao, Lianli and others}, - journal={arXiv preprint arXiv:2409.05840}, - year={2024} -} -``` - -**Contact**: - -- Run Luo โ€” r.luo@siat.ac.cn - -- Haonan Zhang โ€” zchiowal@gmail.com diff --git a/mmevol/mmevol_sft_data/assets/mmevol.jpg b/mmevol/mmevol_sft_data/assets/mmevol.jpg deleted file mode 100644 index d280d886..00000000 Binary files a/mmevol/mmevol_sft_data/assets/mmevol.jpg and /dev/null differ diff --git a/mmevol/mmevol_sft_data/datasets/process.ipynb b/mmevol/mmevol_sft_data/datasets/process.ipynb deleted file mode 100644 index e1236f26..00000000 --- a/mmevol/mmevol_sft_data/datasets/process.ipynb +++ /dev/null @@ -1,65 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 1612/1612 [00:15<00:00, 103.12it/s]\n" - ] - } - ], - "source": [ - "import json\n", - "import os\n", - "import os.path as osp\n", - "from tqdm import tqdm\n", - "import shutil\n", - "\n", - "# Construct hash_id to create a unique index, because both id and image key values โ€‹โ€‹have duplicate values\n", - "datasets_path = \"/mnt/data/haonan/code/mmevol_sft_data/datasets\"\n", - "\n", - "a = json.load(open(osp.join(datasets_path, \"seed_data_1k_demo.json\"), \"r\"))\n", - "for index, i in enumerate(a):\n", - " i[\"hash_id\"] = str(index) + \"_\" + i[\"image\"].replace(\"/\", \"_\")\n", - "\n", - "json.dump(a, 
open(\"/mnt/data/haonan/code/mmevol_sft_data/datasets/seed_data_1k_demo.json\", \"w\"), indent=4)\n", - "\n", - "# If the data format is already well organized, store it separately in meta data\n", - "if os.path.exists(osp.join(datasets_path, \"meta_data\")):\n", - " shutil.rmtree(osp.join(datasets_path, \"meta_data\"))\n", - " os.mkdir(osp.join(datasets_path, \"meta_data\"))\n", - "\n", - "data = json.load(open(osp.join(datasets_path, \"seed_data_1k_demo.json\"), \"r\"))\n", - "\n", - "for index, d in enumerate(tqdm(data)):\n", - " json.dump(d, open(osp.join(datasets_path, \"meta_data\", \"{}.json\".format(d[\"hash_id\"])), \"w\"), indent=4)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.14" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/mmevol/vlmevalkit/.DS_Store b/mmevol/vlmevalkit/.DS_Store deleted file mode 100644 index f9863cd7..00000000 Binary files a/mmevol/vlmevalkit/.DS_Store and /dev/null differ diff --git a/mmevol/vlmevalkit/vlmeval/.DS_Store b/mmevol/vlmevalkit/vlmeval/.DS_Store deleted file mode 100644 index 171a3172..00000000 Binary files a/mmevol/vlmevalkit/vlmeval/.DS_Store and /dev/null differ diff --git a/mmevol/vlmevalkit/vlmeval/api/gpt.py b/mmevol/vlmevalkit/vlmeval/api/gpt.py index 14f67c09..a88e8820 100644 --- a/mmevol/vlmevalkit/vlmeval/api/gpt.py +++ b/mmevol/vlmevalkit/vlmeval/api/gpt.py @@ -91,15 +91,14 @@ def __init__(self, else: self.logger.error('Unknown API Base. 
') sys.exit(-1) + # your api_base - self.api_base="" + self.api_base = "" # your key - self.key="" + self.key = "" assert len(self.api_base)>0 and len(self.key)>0, "make sure that both api_base and key are configured correctly" - - # self.model="gpt-4o-2024-05-13" model = "gpt-4o-mini" self.logger.info(f'Using API Base: {self.api_base}; API Key: {self.key}')
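For reference, the `mmvp_eval.py` hunk earlier in this diff keeps its `Pool(processes=50)` / `pool.imap` fan-out for scoring requests. A minimal sketch of that pattern with a stub in place of the real API call (all names here are illustrative, not the script's actual interface):

```python
from multiprocessing import Pool

def make_request(meta):
    # Stub standing in for the real per-sample API request.
    return {"index": meta["index"], "correct": meta["index"] % 2 == 0}

if __name__ == "__main__":
    data = [{"index": i} for i in range(10)]
    # imap preserves input order while spreading work across workers.
    with Pool(processes=4) as pool:
        output = list(pool.imap(make_request, data))
    accuracy = sum(o["correct"] for o in output) / len(output)
    print(f"accuracy: {accuracy:.2f}")  # → accuracy: 0.50
```

Because `imap` yields results in input order, per-type accuracy can be tallied afterwards without tracking indices through the worker processes.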