From 4c08f827570e3abcfde1c5487b167b5de1b5fadb Mon Sep 17 00:00:00 2001
From: PromptExpert <32960135+PromptExpert@users.noreply.github.com>
Date: Mon, 30 Sep 2024 20:27:07 +0800
Subject: [PATCH] Update mllm_papers.md

---
 docs/mllm/mllm_papers.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/mllm/mllm_papers.md b/docs/mllm/mllm_papers.md
index c65305f..894730c 100644
--- a/docs/mllm/mllm_papers.md
+++ b/docs/mllm/mllm_papers.md
@@ -5,6 +5,7 @@
 - 2024.09 [Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models](https://www.arxiv.org/abs/2409.17146) From Allen AI; both the models and the data are open-sourced.
 - 2024.09 [MIO: A Foundation Model on Multimodal Tokens](https://arxiv.org/abs/2409.17692)
 - 2024.09 [Phantom of Latent for Large Language and Vision Models](https://arxiv.org/abs/2409.14713)
+- 2024.09 [Qwen2-VL: Enhancing Vision-Language Model’s Perception of the World at Any Resolution](https://arxiv.org/pdf/2409.12191)
 - 2024.09 [Llama 3.2: Revolutionizing edge AI and vision with open, customizable models](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/)
 - 2024.09 [NVLM: Open Frontier-Class Multimodal LLMs](https://arxiv.org/pdf/2409.11402)
 - 2024.09 [Viper: Open Mamba-based Vision-Language Models](https://github.com/EvanZhuang/viper/tree/main) The first Mamba-based VLM series.