Jun Zhang, Desen Meng, Ji Qi, Zhenpeng Huang, Tao Wu, and Limin Wang.
We present p-MoD, a series of efficient MLLMs featuring:
- ✂️ Mixture-of-Depths mechanism, upgraded with tanh-gated weight normalization (TanhNorm) and symmetric token reweighting (STRing).
- 🎢 Progressive ratio decay (PRD) strategy, which gradually reduces the token retention ratio layer by layer.
p-MoD matches or even surpasses the performance of the baseline models, while using only 55.6% of the TFLOPs and 53.8% of the KV cache storage at inference time, and 77.7% of the GPU hours for training.
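To make these two components concrete, here is a minimal, self-contained PyTorch sketch of a Mixture-of-Depths block with a tanh-gated update and a layer-wise decaying retention ratio. It is an illustrative toy, not the p-MoD implementation: the names (`ToyMoDBlock`, `retention_ratio`), the linear decay, the mean-centered tanh gate, and the plain top-k routing are simplifying assumptions; STRing is omitted. Refer to the paper and the code in this repo for the actual TanhNorm, STRing, and PRD definitions.

```python
import torch
import torch.nn as nn


def retention_ratio(layer_idx: int, num_layers: int,
                    final_ratio: float = 0.3) -> float:
    """Toy PRD schedule: retention decays from 1.0 at the first layer
    towards `final_ratio` at the last layer (linear here for simplicity;
    the paper uses its own decay schedule)."""
    t = layer_idx / max(num_layers - 1, 1)
    return 1.0 - (1.0 - final_ratio) * t


class ToyMoDBlock(nn.Module):
    """A single transformer layer with Mixture-of-Depths routing.

    Only the top-k tokens (k = ratio * seq_len) are processed by the
    layer; the rest are passed through unchanged. The residual update of
    the processed tokens is scaled by a tanh of the mean-centered router
    scores, a rough stand-in for tanh-gated weight normalization.
    """

    def __init__(self, dim: int, layer: nn.Module, ratio: float):
        super().__init__()
        self.router = nn.Linear(dim, 1)
        self.layer = layer          # e.g. an attention + MLP block
        self.ratio = ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        b, n, d = x.shape
        k = max(1, int(n * self.ratio))

        scores = self.router(x).squeeze(-1)          # (b, n)
        topk = scores.topk(k, dim=-1).indices        # (b, k)

        selected = torch.gather(x, 1, topk.unsqueeze(-1).expand(b, k, d))
        updated = self.layer(selected)               # process only k tokens

        # tanh gate on the selected tokens' router scores (mean-centered)
        gate_scores = torch.gather(scores, 1, topk)
        gates = torch.tanh(gate_scores - gate_scores.mean(dim=-1, keepdim=True))

        # write the gated residual update back; unselected tokens are unchanged
        out = x.clone()
        out.scatter_(1, topk.unsqueeze(-1).expand(b, k, d),
                     selected + gates.unsqueeze(-1) * (updated - selected))
        return out


if __name__ == "__main__":
    dim, num_layers, seq_len = 64, 8, 32
    blocks = [
        ToyMoDBlock(dim, nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim)),
                    retention_ratio(i, num_layers))
        for i in range(num_layers)
    ]
    h = torch.randn(2, seq_len, dim)
    for blk in blocks:
        h = blk(h)   # deeper layers keep fewer tokens, all outputs stay full-length
    print(h.shape)   # torch.Size([2, 32, 64])
```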
- Clone this repository and navigate to the folder
git clone https://github.com/MCG-NJU/p-MoD.git
cd p-MoD
- Install packages
conda create -n p-mod python=3.10 -y
conda activate p-mod
pip install --upgrade pip # enable PEP 660 support
pip install -e .
pip install -e lmms-eval
- Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation --no-cache-dir
- Login to huggingface and wandb
huggingface-cli login
wandb login
Model | LLM | Epoch | Pretrain Data | SFT Data |
---|---|---|---|---|
p-MoD-LLaVA-NeXT-7B | Vicuna-7B | 1 | 558K | 779K |
p-MoD-LLaVA-v1.5-7B | Vicuna-7B | 1 | 558K | 665K |
We evaluate our models using lmms-eval. You can use our script `./scripts/lmms-eval/eval.sh`, for example:
bash ./scripts/lmms-eval/eval.sh \
--ckpt MCG-NJU/p-MoD-LLaVA-NeXT-7B \
--eval_tasks ai2d,chartqa \
--project_name pmod \
--run_name pmod-llava-next-7b-ft
We use the pretrained MLP projector provided by LLaVA, which can be downloaded here. Put the downloaded model weights under `./checkpoints/llava-v1.5-7b-pretrain/llava-official-checkpoint`.
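If you prefer to fetch the projector programmatically, a sketch using `huggingface_hub` is shown below. The repo id is an assumption based on the official LLaVA v1.5 release; verify it against the download link above. The target directory is the path expected by the training scripts.

```python
# Sketch: download the LLaVA pretrained projector into the expected folder.
# NOTE: the repo id below is an assumption (the official LLaVA-v1.5-7B
# pretrain checkpoint); double-check it against the link above before use.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="liuhaotian/llava-v1.5-mlp2x-336px-pretrain-vicuna-7b-v1.5",
    local_dir="./checkpoints/llava-v1.5-7b-pretrain/llava-official-checkpoint",
)
```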
First, we provide the script `./util_scripts/download_llava-next_data.py` for data preparation. It downloads the 779K LLaVA-NeXT data, saving the images under `./playground/data/llava_next_images/` and the data json to `./playground/data/llava_next_data.json`.

Then you can start training with `./scripts/train/finetune_eval_7b_pmod_llava_next.sh`.
First, prepare the instruction tuning data following LLaVA-1.5: download the images from the constituent datasets and the annotation json `llava_v1_5_mix665k.json`, and save both under `./playground/data`.
Then, we fix some broken examples in the data json by running the following script:
python util_scripts/clean_data_json.py \
--original_json_path ./playground/data/llava_v1_5_mix665k.json \
--cleaned_json_path ./playground/data/llava_v1_5_mix665k_cleaned.json
Start training with `./scripts/train/finetune_eval_7b_pmod_llava_1_5.sh`.
If you find our work helpful for your research and applications, please cite our paper:
@article{zhang2024pmod,
title={p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay},
author={Zhang, Jun and Meng, Desen and Qi, Ji and Huang, Zhenpeng and Wu, Tao and Wang, Limin},
journal={arXiv preprint arXiv:2412.04449},
year={2024}
}
- LLaVA and LLaVA-NeXT: The codebases we built upon.
- lmms-eval: We use this amazing framework for evaluation.