From 610a8827dab5db048ca049f24b1e787548e7b676 Mon Sep 17 00:00:00 2001 From: Haonan Zhang Date: Wed, 13 Nov 2024 20:10:43 +0800 Subject: [PATCH] Update README.md --- mmevol/README.md | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/mmevol/README.md b/mmevol/README.md index 8b137891..6e18b7f4 100644 --- a/mmevol/README.md +++ b/mmevol/README.md @@ -1 +1,56 @@ +

+
+ +

+ +# MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct + +This is the official data collection of the paper "MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct", the dataset and checkpoint will be released soon. + +We are continuously refactoring our code, be patient and wait for the latest updates! + +## ๐Ÿ”— Links +- Project Web: https://mmevol.github.io/ +- Arxiv: https://arxiv.org/pdf/2409.05840 +- Dataset: https://huggingface.co/datasets/Tongyi-ConvAI/MMEvol/tree/main + +## ๐Ÿงช Dataset Details + +The Tongyi-ConvAI generates this dataset for multi-modal supervised fine-tuning. This dataset was used to train **Evol-Llama3-8B-Instruct** and **Evol-Qwen2-7B** reported in [our paper](https://arxiv.org/pdf/2409.05840). + +To create this dataset, we first selected 163K Seed Instruction Tuning Dataset for Evol-Instruct, then we enhance data quality through an iterative process that involves a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution. This process results in the generation of a more complex and diverse image-text instruction dataset, which in turn empowers MLLMs with enhanced capabilities. + +Below we showcase the detailed data distribution of the SEED-163K, which is prepared for multi-round evolution mentioned above: + +

+
+ Fig. 2. SEED-163K: 163K Curated Seed Instruction Tuning Dataset for Evol-Instruct +

+ +## Usage +We provide the example data for mmevol, which can be get with following steps: +1. Download example [data.zip](http://alibaba-research.oss-cn-beijing.aliyuncs.com/mmevol/datasets.zip) and unzip it. +2. Place the unzipped data folder in the `mmevol_sft_data` directory of the project. + + +**License**: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) and [Gemma License](https://www.kaggle.com/models/google/gemma/license/). + +## ๐Ÿ“š Citation + +```bibtex +@article{xu2024magpie, + title={MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct}, + author={Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li}, + year={2024}, + eprint={2409.05840}, + archivePrefix={arXiv}, + primaryClass={cs.CL} +} +``` + +**Contact**: + +- Run Luo โ€” r.luo@siat.ac.cn + +- Haonan Zhang โ€” zchiowal@gmail.com