"facebook/opt-125m" and other sub-1B-parameter LMs? #11

dcompgriff · 2025-01-16T18:48:29Z

I see that the code has options for using small OPT LMs. Will you release weights for VLMs trained with these smaller LMs?

zhangshaolei1998 · 2025-01-20T09:21:10Z

Thanks for your question.
The "small OPT LM" option appears because of some settings in the original LLaVA code. We did not train any LLaVA-Mini model on smaller LMs.
If needed, you can train a LLaVA-Mini model based on a smaller LM yourself, which will be more efficient.

dcompgriff · 2025-01-21T19:30:32Z

Where are the training and fine-tuning scripts you use? I see scripts/llavamini/train.sh, but no fine-tuning script that covers the instruction-tuning setup the paper mentions: "During instruction tuning, we combine 665K image instruction data from LLaVA (Liu et al., 2023b), 100K video instruction data from Video-ChatGPT". Do you just start with 'scripts/pretrain.sh' as stage 1, with "ICTNLP/llava-mini-llama-3-8b" as the model path? It's not clear, because the --vision_tower parameters in those two files don't match.
