| Models | MLLM Architecture | GitHub Stars | Huggingface Downloads |
| --- | --- | --- | --- |
| LLaVA-v1.5-13B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 333.7K |
| LVIS-Instruct4v-LLaVA-7B | Pretrained Vision Encoder + Projector + LLM | 122 | 5 |
| MiniGPT-v2 | Pretrained Vision Encoder + Projector + LLM | 24.7K | / |
| LLaVA-v1.5-7B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 703K |
| LLaVA-v1.6-Vicuna-7B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 1.2M |
| LLaVA-v1.6-Vicuna-13B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 100.1K |
| LLaVA-v1.6-34B | Pretrained Vision Encoder + Projector + LLM | 15.4K | 592.8K |
| Yi-VL-6B | Pretrained Vision Encoder + Projector + LLM | 7K | 17.2K |
| ALLaVA | Pretrained Vision Encoder + Projector + LLM | 134 | 93 |
| Kosmos-2 | Pretrained Vision Encoder + Grounded LLM | 18.1K | 29.2K |
| LWM | Pretrained Vision Encoder + Projector + Long-Context LLM | 6.6K | / |
| BLIP2-Flan-T5-XL | Query tokens + LM | 8.5K | 35.4K |
| Qwen-VL-Chat | Query tokens + LLM | 3.4K | 289.9K |
| InstructBLIP-Vicuna-13B | Query tokens + LLM | 8.5K | 5.4K |
| mPLUG-Owl2 | Query tokens + LLM with Modality-Adaptive Module | 1.9K | 9.7K |
| Cheetor | Query tokens + VPG-C + LLM | 308 | / |
| Fuyu-8B | Linear Vision Encoder + LLM | / | 17.9K |
| SEED-LLaMA | VQ-based Vision Encoder + LLM | 445 | / |
| OpenFlamingo | Perceiver Resampler + LLM with Gated Cross-Attention Layers | 3.4K | / |

"/" marks entries for which no star or download count is reported.
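Most models with a Huggingface download count above can be pulled directly from the Hub with the `transformers` library. Below is a minimal sketch for LLaVA-v1.5-7B, illustrating the "Pretrained Vision Encoder + Projector + LLM" pattern from the table; it assumes a recent `transformers` release (v4.36+) with LLaVA support and the community hub id `llava-hf/llava-1.5-7b-hf`, so verify the exact repo id and prompt format against the model card before use.

```python
# Minimal sketch: load an MLLM from the table and run one image-text query.
# Assumes transformers>=4.36 and the hub id "llava-hf/llava-1.5-7b-hf".
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed hub id for LLaVA-v1.5-7B

# The processor wraps the vision encoder's image preprocessing and the
# LLM tokenizer; the model bundles encoder, projector, and LLM.
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Fetch a sample image and build a prompt in LLaVA's chat format, where
# <image> marks where the projected visual tokens are spliced in.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```

Query-token models such as Qwen-VL-Chat or InstructBLIP follow the same load-process-generate flow but use their own processor classes and prompt formats, so the corresponding model card is the authoritative reference in each case.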