Stars
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
Multimodal Whole Slide Foundation Model for Pathology
This repo contains the code for the 1D tokenizer and generator.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
[NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Towards a general-purpose foundation model for computational pathology - Nature Medicine
A vision-language foundation model for computational pathology - Nature Medicine
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Code associated with the publication: Scaling self-supervised learning for histopathology with masked image modeling, A. Filiot et al., MedRxiv (2023). We publicly release Phikon 🚀
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
EVE Series: Encoder-Free Vision-Language Models from BAAI
✨✨Latest Advances on Multimodal Large Language Models
Reasoning with Language Model is Planning with World Model
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
A method to increase the speed and lower the memory footprint of existing vision transformers.
A curated list of awesome papers on dataset distillation and related applications.
Stanford NLP Python library for Representation Finetuning (ReFT)
Fast and memory-efficient exact attention
🚀 Efficient implementations of state-of-the-art linear attention models in Torch and Triton