Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.

EnnengYang/Awesome-Model-Merging-Methods-Theories-Applications

Awesome-Model-Merging-Methods-Theories-Applications

A comprehensive list of papers accompanying the survey 'Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities' (arXiv, 2024).

Important

If you know of a relevant paper that is not included in this list, or need clarification about the content of our paper, please contact us!


💥 News 💥

  • 🔥🔥🔥 We have marked the papers whose experiments use models of size $\geq$ 7B.

Abstract

Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As model merging becomes increasingly prevalent across various fields, it is crucial to understand the available model merging techniques comprehensively. However, there is a significant gap in the literature regarding a systematic and thorough review of these techniques. To address this gap, this survey provides a comprehensive overview of model merging methods and theories, their applications in various domains and settings, and future research directions. Specifically, we first propose a new taxonomic approach that exhaustively discusses existing model merging methods. Secondly, we discuss the application of model merging techniques in large language models, multimodal large language models, and 10+ machine learning subfields, including continual learning, multi-task learning, few-shot learning, etc. Finally, we highlight the remaining challenges of model merging and discuss future research directions.
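
As a rough illustration of what merging without raw training data can look like, the sketch below implements two of the basic operations covered in this list: uniform weight averaging and task arithmetic (adding scaled differences between fine-tuned and pre-trained weights). It is a toy under stated assumptions, not the method of any particular paper: the `StateDict` type, the function names, and the two-element weights are hypothetical, and every model is assumed to share the same parameter names and shapes.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def average_weights(models: List[StateDict]) -> StateDict:
    """Uniformly average parameters that share the same name and length."""
    merged: StateDict = {}
    for name in models[0]:
        # zip(*) groups the i-th value of this parameter across all models.
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


def task_arithmetic(
    pretrained: StateDict, finetuned: List[StateDict], scale: float = 1.0
) -> StateDict:
    """Add the scaled sum of task vectors (finetuned - pretrained) to the base."""
    merged: StateDict = {}
    for name, base in pretrained.items():
        merged[name] = [
            b + scale * sum(m[name][i] - b for m in finetuned)
            for i, b in enumerate(base)
        ]
    return merged


# Assumed toy weights, chosen only to make the arithmetic easy to follow.
pretrained = {"w": [0.0, 1.0]}
model_a = {"w": [1.0, 1.0]}
model_b = {"w": [0.0, 3.0]}

averaged = average_weights([model_a, model_b])
combined = task_arithmetic(pretrained, [model_a, model_b], scale=0.5)
print(averaged)  # {'w': [0.5, 2.0]}
print(combined)  # {'w': [0.5, 2.0]}
```

With two fine-tuned models and `scale=0.5`, task arithmetic reduces to uniform averaging, which is why both results match here; other values of `scale` move the merged weights along the summed task vector.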

Citation

If you find our paper or this resource helpful, please consider citing:

@article{Survery_ModelMerging_2024,
  title={Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities},
  author={Yang, Enneng and Shen, Li and Guo, Guibing and Wang, Xingwei and Cao, Xiaochun and Zhang, Jie and Tao, Dacheng},
  journal={arXiv preprint arXiv:2408.07666},
  year={2024}
}

Thanks!


Framework


Survey

Paper Title Year Conference/Journal
SoK: On Finding Common Ground in Loss Landscapes Using Deep Model Merging Techniques 2024 Arxiv
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities 2024 Arxiv
A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning 2024 Arxiv
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models 2024 Arxiv
Learn From Model Beyond Fine-Tuning: A Survey 2023 Arxiv
Deep Model Fusion: A Survey 2023 Arxiv

Benchmark/Evaluation

Paper Title Year Conference/Journal Remark
Rethinking Weight-Averaged Model-merging 2024 Arxiv
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models 2024 Arxiv LLaMA3-8B-Instruct, Qwen2-7B-Instruct, Mistral-7B-Instruct-v0.3
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild 2024 NeurIPS Track on Datasets and Benchmarks Synthia-7B-v1.2, Llama-2-7b-evolcodealpaca, OpenHermes-7B, pygmalion-2-7b, Llama-2-7b-chat-hf, BeingWell_llama2_7b, MetaMath-7B-V1.0, vicuna-7b-v1.5, Platypus2-7B, GOAT-7B-Community, Llama-2-7b-WikiChat-fused, dolphin-llama2-7b, MetaMath-Llemma-7B, CodeLlama-7b-Instruct-hf, Magicoder-S-CL-7B, CrystalChat
What Matters for Model Merging at Scale? 2024 Arxiv PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B)
Realistic Evaluation of Model Merging for Compositional Generalization 2024 Arxiv
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities 2024 Arxiv Llama-3.1-8B, Mistral-7B-v0.3
FusionBench: A Comprehensive Benchmark of Deep Model Fusion 2024 Arxiv
Arcee's MergeKit: A Toolkit for Merging Large Language Models 2024 Arxiv Llama2-7B-Chat, Meditron-7B

Advanced Methods

Pre-Merging Methods

Linearization Fine-tuning

Paper Title Year Conference/Journal Remark
Fine-Tuning Linear Layers Only Is a Simple yet Effective Way for Task Arithmetic 2024 Arxiv
Tangent Transformers for Composition, Privacy and Removal 2024 ICLR
Parameter Efficient Multi-task Model Fusion with Partial Linearization 2024 ICLR
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models 2023 NeurIPS

Sparse Fine-tuning

Paper Title Year Conference/Journal Remark
Efficient Model Editing with Task-Localized Sparse Fine-tuning 2024

Architecture Transformation

Paper Title Year Conference/Journal Remark
Knowledge fusion of large language models 2024 ICLR Llama-2 7B, OpenLLaMA 7B, MPT 7B
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report 2024 Arxiv NH2-Mixtral-8x7B, NH2-Solar-10.7B, and OpenChat-3.5-7B
On Cross-Layer Alignment for Model Fusion of Heterogeneous Neural Networks 2023 ICASSP
GAN Cocktail: mixing GANs without dataset access 2022 ECCV

Weight Alignment

Paper Title Year Conference/Journal Remark
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse 2024 Arxiv
Equivariant Deep Weight Space Alignment 2024 ICML
Harmony in diversity: Merging neural networks with canonical correlation analysis 2024 ICML
Transformer fusion with optimal transport 2024 ICLR
Layerwise linear mode connectivity 2024 ICLR
ZipIt! Merging Models from Different Tasks without Training 2024 ICLR
Proving linear mode connectivity of neural networks via optimal transport 2024 AISTATS
Training-Free Pretrained Model Merging 2024 CVPR
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering 2024 Arxiv Llama2-7b, Llama2-13b
C2M3: Cycle-Consistent Multi Model Merging 2024 NeurIPS
PLeaS--Merging Models with Permutations and Least Squares 2024 Arxiv
Rethink Model Re-Basin and the Linear Mode Connectivity 2024 Arxiv
Git Re-Basin: Merging Models modulo Permutation Symmetries 2023 ICLR
Re-basin via implicit Sinkhorn differentiation 2023 CVPR
Plateau in Monotonic Linear Interpolation--A "Biased" View of Loss Landscape for Deep Networks 2023 ICLR
Linear Mode Connectivity of Deep Neural Networks via Permutation Invariance and Renormalization 2023 ICLR
REPAIR: REnormalizing Permuted Activations for Interpolation Repair 2023 ICLR
Going beyond linear mode connectivity: The layerwise linear feature connectivity 2023 NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks 2022 ICLR
What can linear interpolation of neural network loss landscapes tell us? 2022 ICML
Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling 2021 ICML
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes 2021 ICML
Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances 2021 ICML
Linear Mode Connectivity and the Lottery Ticket Hypothesis 2020 ICML
Optimizing mode connectivity via neuron alignment 2020 NeurIPS
Model fusion via optimal transport 2020 NeurIPS
Uniform convergence may be unable to explain generalization in deep learning 2019 NeurIPS
Explaining landscape connectivity of low-cost solutions for multilayer nets 2019 NeurIPS
Essentially no barriers in neural network energy landscape 2018 ICML
Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs 2018 NeurIPS

During Merging Methods

Basic Merging Methods

Paper Title Year Conference/Journal Remark
Composing parameter-efficient modules with arithmetic operation 2023 NeurIPS
Editing models with task arithmetic 2023 ICLR
Model fusion via optimal transport 2020 NeurIPS
Weight averaging for neural networks and local resampling schemes 1996 AAAI Workshop

Weighted-based Merging Methods

Paper Title Year Conference/Journal Remark
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging 2024 Arxiv
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation 2024 Arxiv shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling 2024 Arxiv
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic 2024 Arxiv LLaMA-2-7B, Mistral-7B, LLaMA-2-13B
Checkpoint Merging via Bayesian Optimization in LLM Pretraining 2024 Arxiv Baichuan2-7B, Deepseek 7B
Arcee's MergeKit: A Toolkit for Merging Large Language Models 2024 Arxiv Llama2-7B-Chat, Meditron-7B
Evolutionary optimization of model merging recipes 2024 Arxiv shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts 2024 ACL
AdaMerging: Adaptive Model Merging for Multi-Task Learning 2024 ICLR
Model Merging by Uncertainty-Based Gradient Matching 2024 ICLR
Merging by Matching Models in Task Subspaces 2024 TMLR
Fisher Mask Nodes for Language Model Merging 2024 LREC-COLING
Erasure Coded Neural Network Inference via Fisher Averaging 2024 ISIT
Dataless Knowledge Fusion by Merging Weights of Language Models 2023 ICLR
Merging models with fisher-weighted averaging 2022 NeurIPS

Subspace-based Merging Methods

Paper Title Year Conference/Journal Remark
Parameter Competition Balancing for Model Merging 2024 NeurIPS Llama-2-7b
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch 2024 ICML WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B
Localizing Task Information for Improved Model Merging and Compression 2024 ICML
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging 2024 ICLR
NegMerge: Consensual Weight Negation for Strong Machine Unlearning 2024 Arxiv
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic 2024 Arxiv
Activated Parameter Locating via Causal Intervention for Model Merging 2024 Arxiv Llama-2-chat-7B
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning 2024 Arxiv Mistral-7B-v0.1, Llama-3-8B, Neurotic-7B, MoMo-70B
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling 2024 Arxiv Llama-2-13b-code-alpaca, WizardLM, Wizard-Math, WizardCoder-Python
EMR-Merging: Tuning-Free High-Performance Model Merging 2024 NeurIPS
DPPA: Pruning Method for Large Language Model to Model Merging 2024 Arxiv LLaMa 2
Model breadcrumbs: Scaling multi-task model merging with sparse masks 2023 Arxiv
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion 2023 Arxiv
Effective and Parameter-Efficient Reusing Fine-Tuned Models 2023 Openreview
Resolving Interference When Merging Models 2023 NeurIPS
Task-Specific Skill Localization in Fine-tuned Language Model 2023 ICML

Routing-based Merging Methods

Paper Title Year Conference/Journal Remark
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging 2024 Arxiv
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts 2024 ICML
Learning to Route Among Specialized Experts for Zero-Shot Generalization 2024 ICML
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy 2024 ICLR
Soft merging of experts with adaptive routing 2024 TMLR
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models 2024 Arxiv Mistral-7B-v0.1, MetaMath-Mistral-7B, dolphin-2.1-mistral-7b, speechless-code-mistral-7b-v1.0
Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging 2024 NeurIPS Qwen-14B
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts 2024 Arxiv Gemma-7B, LLaMA-2 7B & 13B, Mistral 7B, LLaMA-3 8B
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion 2024 Arxiv
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints 2023 ICLR

Post-calibration based Methods

Paper Title Year Conference/Journal Remark
SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery 2024 Arxiv
Representation Surgery for Multi-Task Model Merging 2024 ICML

Other Merging Methods

Paper Title Year Conference/Journal Remark
ATM: Improving Model Merging by Alternating Tuning and Merging 2024 Arxiv
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models 2024 Arxiv Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B
Weight Scope Alignment: A Frustratingly Easy Method for Model Merging 2024 Arxiv
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization 2024 Arxiv Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling 2023 Arxiv SOLAR 10.7B, SOLAR 10.7B-Instruct

Theories and Analysis of Model Merging

Paper Title Year Conference/Journal Remark
A Unified Analysis for Finite Weight Averaging 2024 Arxiv
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average 2024 Arxiv
On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm 2024 ICML
Diverse weight averaging for out-of-distribution generalization 2022 NeurIPS
Ensemble of averages: Improving model selection and boosting performance in domain generalization 2022 NeurIPS
The role of permutation invariance in linear mode connectivity of neural networks 2022 ICLR
Swad: Domain generalization by seeking flat minima 2021 NeurIPS
Linear Mode Connectivity and the Lottery Ticket Hypothesis 2020 ICML
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well 2020 ICLR
Optimizing mode connectivity via neuron alignment 2020 NeurIPS
Uniform convergence may be unable to explain generalization in deep learning 2019 NeurIPS
Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification 2018 JMLR
Iterate averaging as regularization for stochastic gradient descent 2018 Arxiv
Essentially no barriers in neural network energy landscape 2018 ICML
Averaging weights leads to wider optima and better generalization 2018 UAI
Train faster, generalize better: Stability of stochastic gradient descent 2016 ICML

Application of Model Merging in Foundation Models

Model Merging in Large Language Model

Human Preference Alignment for LLMs

Paper Title Year Conference/Journal Remark
Baichuan Alignment Technical Report 2024 Arxiv Qwen2-Nova-72B, Llama3-PBM-Nova-70B
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning 2024 Arxiv
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging 2024 Arxiv MetaMath-7B, MAmmoTH-7B, LLaMA2-7B
PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning 2024 Arxiv Mistral-7B-v0.1, Llama-3-8B
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch 2024 Arxiv Mistral-0.2-7B-Instruct, LLaMA-3-8B-Instruct, OpenBioLLM-8B, MAmmoTH2-7B, WizardMath-1.1-7B
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations 2024 Arxiv llama2-7b-chat-hf, mistral-7b-instruct-v0.2, WIZARDMATH-7B, Llama Math, Llama-2-7b-evolcodealpaca
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction 2024 Arxiv Llama-2-7b
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment 2024 Arxiv Qwen1.5-7B, LLaMa3-8B
A safety realignment framework via subspace-oriented model fusion for large language models 2024 Arxiv WizardLM-7B
Weak-to-strong extrapolation expedites alignment 2024 Arxiv zephyr-7b, starling-7b, snorkel-7b, llama3-8b, internlm2-7b, internlm2-20b, tulu-2-dpo-7b, tulu-2-dpo-13b, tulu-2-dpo-70b
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic 2024 Arxiv Llama-2-7B-Chat
Rewarded soups: towards pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards 2023 NeurIPS LLaMA-7b
Personalized soups: Personalized large language model alignment via post-hoc parameter merging 2023 Arxiv

Detoxification of LLMs

Paper Title Year Conference/Journal Remark
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation 2024 AAAI LLaMA-7B
Mitigating Social Biases in Language Models through Unlearning 2024 Arxiv LLaMA-2 7B
Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models 2024 Arxiv Llama-2-7B, Llama-2-chat-7B, Vicuna-7B, Llama-2-13B
Composing Parameter-Efficient Modules with Arithmetic Operation 2023 NeurIPS
Editing models with task arithmetic 2023 ICLR
Elastic Weight Removal for Faithful and Abstractive Dialogue Generation 2023 Arxiv

Knowledge Unlearning of LLMs

Paper Title Year Conference/Journal Remark
NegMerge: Consensual Weight Negation for Strong Machine Unlearning 2024 Arxiv
Strong Copyright Protection for Language Models via Adaptive Model Fusion 2024 ICML LLaMa2 7B, StarCoder 7B
Avoiding Copyright Infringement via Machine Unlearning 2024 Arxiv Llama3-8B
Towards Safer Large Language Models through Machine Unlearning 2024 ACL LLAMA2-7B, LLAMA2-13B
Editing models with task arithmetic 2023 ICLR
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Model 2023 Arxiv LLAMA2-7B, LLAMA-7B, BLOOM-7B
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion 2023 Arxiv

Faster Training of LLMs

Paper Title Year Conference/Journal Remark
DEM: Distribution Edited Model for Training with Mixed Data Distributions 2024 Arxiv OpenLLaMA 7B and 13B
Checkpoint Merging via Bayesian Optimization in LLM Pretraining 2024 Arxiv Baichuan2-220B, Baichuan2-440B, Baichuan2-660B, Baichuan2-1540B, Baichuan2-1760B, Baichuan2-1980B, Baichuan2-2200B, Baichuan2-2420B, DeepSeek-1400B, DeepSeek-1600B, DeepSeek-1800B, DeepSeek-2000B
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning 2023 ACL
Early Weight Averaging meets High Learning Rates for LLM Pre-training 2023 NeurIPS Workshop
Stop wasting my time! saving days of imagenet and bert training with latest weight averaging 2022 NeurIPS Workshop
Fusing finetuned models for better pretraining 2022 Arxiv

Combine the Capabilities of Expert LLMs

Paper Title Year Conference/Journal Remark
Agent Skill Acquisition for Large Language Models via CycleQD 2024 Arxiv Llama3-8B-Instruct
Collaboratively adding new knowledge to an LLM 2024 Arxiv Meta-Llama-3-8B
Unconstrained Model Merging for Enhanced LLM Reasoning 2024 Arxiv CodeLlama-7B-Ins, CodeLlama-70B-Ins, Deepseek-Coder-Ins-v1.5, Qwen2.5-Math-7B-Ins, WizardMath-7B-V1.1, OpenMath-Mistral 7B, MetaMath-7B, MetaMath-70B
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks 2024 Arxiv Llama-7b, Llama2-7b-chat
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging 2024 Arxiv Llama 2 7B
Exploring Model Kinship for Merging Large Language Models 2024 Arxiv Mistral-7B, Mistral-7b-instruct-v0.2, MetaMath-mistral-7b, Open-chat-3.5-1210
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation 2024 Arxiv shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models 2024 Arxiv LLAMA 3.1 8B
What Matters for Model Merging at Scale? 2024 Arxiv PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B)
HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models 2024 Arxiv Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging 2024 Arxiv CodeLlama 7B
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization 2024 Arxiv Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B
Knowledge Fusion By Evolving Weights of Language Models 2024 ACL
LLM Merging: Building LLMs Efficiently through Merging 2024 NeurIPS 2024 Competition Track LLaMA-7B, Mistral-7B, Gemma-7B
Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement 2024 Arxiv Qwen1.5-7B, Qwen1.5-Chat-7B, Sailor-7B, Qwen1.5-14B, Qwen1.5-Chat-14B, Sailor-14B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic 2024 Arxiv LLaMA-2-7B, Mistral-7B, LLaMA-2-13B
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models 2024 Arxiv Mistral-Instruct-7B, Mixtral-Instruct-8x7B
Knowledge fusion of large language models 2024 ICLR Llama-2 7B, OpenLLaMA 7B, MPT 7B
Language models are super mario: Absorbing abilities from homologous models as a free lunch 2024 ICML WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B
Controlled Text Generation via Language Model Arithmetic 2024 ICML MPT-7B, Pythia-12B, Llama-2-Chat-13B
Evolutionary optimization of model merging recipes 2024 Arxiv shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM 2024 Arxiv Llama-2 7B
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report 2024 Arxiv NH2-Mixtral-8x7B, NH2-Solar-10.7B, OpenChat-3.5-7B

Model Merging in Multimodal Large Language Models

Model Merging for Multimodal Fusion

Paper Title Year Conference/Journal Remark
Jointly training large autoregressive multimodal models 2024 ICLR
Model Composition for Multimodal Large Language Models 2024 ACL Vicuna-7B-v1.5
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation 2023 ICML
An Empirical Study of Multimodal Model Merging 2023 EMNLP
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks 2023 TMLR

Model Merging for Cross-Modal Knowledge Transfer

Paper Title Year Conference/Journal Remark
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification 2024 ICASSP Workshop

Model Merging in Image Generative Models

Style Mixing in Generative Models

Paper Title Year Conference/Journal Remark
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models 2024 Arxiv
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models 2024 Arxiv
MoLE: Mixture of LoRA Experts 2024 ICLR
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models 2024 Arxiv
Multi-LoRA Composition for Image Generation 2024 Arxiv
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models 2023 NeurIPS
Merging loras 2023 (github)
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs 2023 Arxiv
GAN Cocktail: mixing GANs without dataset access 2022 ECCV

Reducing Training Cost of Generative Models

Paper Title Year Conference/Journal Remark
Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better 2024 Arxiv
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA 2024 Arxiv

Enhancing the Faithfulness (or Generation Quality) of Diffusion Models

Paper Title Year Conference/Journal Remark
Decouple-Then-Merge: Towards Better Training for Diffusion Models 2024 Arxiv
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data 2024 Arxiv

Application of Model Merging in Different Machine Learning Subfields

Model Merging in Continual Learning

Model Merging to Mitigate Catastrophic Forgetting

Paper Title Year Conference/Journal Remark
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging 2024 Arxiv
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models 2024 ICML InstructBLIP (Vicuna-7B), LLaVA-1.5 (Vicuna-7B)
Adaptive Discovering and Merging for Incremental Novel Class Discovery 2024 AAAI
MagMax: Leveraging Model Merging for Seamless Continual Learning 2024 ECCV
Lm-cocktail: Resilient tuning of language models via model merging 2024 ACL Findings Llama-2-chat-7b
Backward Compatibility During Data Updates by Weight Interpolation 2024 EACL
Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models 2024 Arxiv
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging 2024 Arxiv MISTRAL-7B, LLAMA-3-8B
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation 2024 Arxiv Llama3-70B
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs 2024 Arxiv Mistral-7B, Llama-3-8B
WARP: On the Benefits of Weight Averaged Rewarded Policies 2024 Arxiv Gemma-7B
A Second-Order perspective on Compositionality and Incremental Learning 2024 Arxiv
DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images 2024 Arxiv
DAM: Dynamic Adapter Merging for Continual Video QA Learning 2024 Arxiv
Task-Specific Skill Localization in Fine-tuned Language Model 2023 ICML
Tangent model composition for ensembling and continual fine-tuning 2023 ICCV
A Unified Continual Learning Framework with General Parameter-Efficient Tuning 2023 ICCV
Task Arithmetic with LoRA for Continual Learning 2023 NeurIPS Workshop
Mitigating the Alignment Tax of RLHF 2023 Arxiv Mistral-7B
Robust fine-tuning of zero-shot models 2022 CVPR

Model Merging in Multi-Task/Multi-Objective/Multi-Domain/Auxiliary Learning

Model Merging for Knowledge Transfer in Multi-Task Learning

Paper Title Year Conference/Journal Remark
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging 2024 Arxiv
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning 2024 Arxiv Aya 23 8B
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks 2024 Arxiv
Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer 2024 Arxiv
Evolutionary optimization of model merging recipes 2024 Arxiv shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch 2024 ICML WizardLM-13B, WizardMath-13B, and llama-2-13b-codealpaca, Mistral-7B
Representation Surgery for Multi-Task Model Merging 2024 ICML
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts 2024 ICML
ZipIt! Merging Models from Different Tasks without Training 2024 ICLR
AdaMerging: Adaptive Model Merging for Multi-Task Learning 2024 ICLR
Resolving Interference When Merging Models 2023 NeurIPS
Editing models with task arithmetic 2023 ICLR

Model Merging for Knowledge Transfer in Multi-Objective Optimization

Paper Title Year Conference/Journal Remark
You Only Merge Once: Learning the Pareto Set of Preference-Aware Model Merging 2024 Arxiv
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion 2024 Arxiv
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation 2024 Arxiv Llama3-8B

Model Merging for Knowledge Transfer in Multi-Domain Learning

Paper Title Year Conference/Journal Remark
DEM: Distribution Edited Model for Training with Mixed Data Distributions 2024 Arxiv OpenLLaMA-7B, OpenLLaMA-13B
Merging Vision Transformers from Different Tasks and Domains 2023 Arxiv

Model Merging for Knowledge Transfer in Auxiliary Learning

Paper Title Year Conference/Journal Remark
ForkMerge: Mitigating Negative Transfer in Auxiliary-Task Learning 2023 NeurIPS

Model Merging in Out-of-Distribution/Domain Generalization

Model Merging for Better Out-of-Distribution Generalization

Paper Title Year Conference/Journal Remark
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation 2024 NeurIPS 2024 Workshop
Mitigating Training Imbalance in LLM Fine-Tuning via Selective Parameter Merging 2024 Arxiv Llama-2-7b
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models 2024 Arxiv
Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging 2024 ICLR
Warm: On the benefits of weight averaged reward models 2024 ICML
Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy 2024 ECCV
Adaptive Stochastic Weight Averaging 2024 JMLR
Population parameter averaging (papa) 2024 TMLR
WARP: On the Benefits of Weight Averaged Rewarded Policies 2024 Arxiv Mistral 7B, Mixtral 8x7B
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average 2024 Arxiv
Model Stock: All we need is just a few fine-tuned models 2024 Arxiv
Model ratatouille: Recycling diverse models for out-of-distribution generalization 2023 ICML
Trainable Weight Averaging: Efficient Training by Optimizing Historical Solutions 2023 ICLR
Lookaround Optimizer: k steps around, 1 step average 2023 NeurIPS
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models 2023 EACL
Dart: Diversify aggregate-repeat training improves generalization of neural networks 2023 CVPR
When do flat minima optimizers work? 2022 NeurIPS
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time 2022 ICML
Diverse weight averaging for out-of-distribution generalization 2022 NeurIPS
Robust fine-tuning of zero-shot models 2022 CVPR
Neural networks with late-phase weights 2021 ICLR
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well 2020 ICLR
SWALP: Stochastic weight averaging in low precision training 2019 ICML
Averaging weights leads to wider optima and better generalization 2018 UAI
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results 2017 NeurIPS

Model Merging for Better Domain Generalization or Domain Adaptation

Paper Title Year Conference/Journal Remark
Realistic Evaluation of Model Merging for Compositional Generalization 2024 Arxiv
Layer-wise Model Merging for Unsupervised Domain Adaptation in Segmentation Tasks 2024 Arxiv
Training-Free Model Merging for Multi-target Domain Adaptation 2024 Arxiv
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation 2024 Arxiv Llama3-70B
Ensemble of averages: Improving model selection and boosting performance in domain generalization 2022 NeurIPS
Swad: Domain generalization by seeking flat minima 2021 NeurIPS

Model Merging in Federated Learning

Model Merging for Local Knowledge Aggregation

Paper Title Year Conference/Journal Remark
FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion 2024 Arxiv
Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning 2024 Arxiv
Closed-form merging of parameter-efficient modules for Federated Continual Learning 2024 Arxiv
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024 CVPR
FedFisher: Leveraging Fisher Information for One-Shot Federated Learning 2024 AISTATS
lo-fi: distributed fine-tuning without communication 2023 TMLR
Revisiting Weighted Aggregation in Federated Learning with Neural Networks 2023 ICML
Deep neural network fusion via graph matching with applications to model ensemble and federated learning 2022 ICML
Federated Learning with Matched Averaging 2020 ICLR
Tackling the objective inconsistency problem in heterogeneous federated optimization 2020 NeurIPS
Model fusion via optimal transport 2020 NeurIPS
Bayesian nonparametric federated learning of neural networks 2019 ICML
Learning private neural language modeling with attentive aggregation 2019 IJCNN
Communication-Efficient Learning of Deep Networks from Decentralized Data 2017 AISTATS

Model Merging in Zero-shot/Few-shot Learning

Model Merging for Cross-task Generalization in Zero-shot Learning

Paper Title Year Conference/Journal Remark
Retraining-Free Merging of Sparse Mixture-of-Experts via Hierarchical Clustering 2024 Arxiv Qwen 60x2.7B, Qwen 45x2.7B, Qwen 30x2.7B, Mixtral 8x7B, Mixtral 6x7B, Mixtral 4x7B
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models 2024 Arxiv LLAMA 3.1 8B
Learning to Route Among Specialized Experts for Zero-Shot Generalization 2024 ICML
Towards Modular LLMs by Building and Reusing a Library of LoRAs 2024 ICML Mistral-7B
Chat Vector: A Simple Approach to Equip LLMs With New Language Chat Capabilities 2024 ACL LLaMA-2 13B, Chinese-LLaMA-13B, Chinese-Alpaca-13B, Mistral-7B, llama-2-ko-7b
Unlocking the Potential of Model Merging for Low-Resource Languages 2024 Arxiv Llama-2-7B
Diffusion Soup: Model Merging for Text-to-Image Diffusion Models 2024 Arxiv
No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement 2024 Arxiv
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models 2024 Arxiv
AdaMergeX: Cross-Lingual Transfer with Large Language Models via Adaptive Adapter Merging 2024 Arxiv Llama2-7b
Model Composition for Multimodal Large Language Models 2024 Arxiv Vicuna-7B-v1.5
Exploring the Benefits of Training Expert Language Models over Instruction Tuning 2023 ICML
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization 2023 Arxiv Llama-2-7b
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization 2023 Arxiv PaLM 2-S

Model Merging for Cross-task Generalization in Few-shot Learning

Paper Title Year Conference/Journal Remark
LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks 2024 ACL Llama-2-7B
LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition 2024 COLM Llama-2-7B, Llama-2-13B
LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild 2024 ACL
Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? 2024 Arxiv
MerA: Merging pretrained adapters for few-shot learning 2023 Arxiv

Model Merging in Adversarial Learning

Model Merging as an Attack

Paper Title Year Conference/Journal Remark
BadMerging: Backdoor Attacks Against Model Merging 2024 CCS
LoRA-as-an-Attack! Piercing LLM Safety Under The Share-and-Play Scenario 2024 ACL Llama-2-7B

Model Merging as a Defense or Intellectual Property Protection

Paper Title Year Conference/Journal Remark
Hyper Adversarial Tuning for Boosting Adversarial Robustness of Pretrained Large Vision Models 2024 Arxiv
REEF: Representation Encoding Fingerprints for Large Language Models 2024 Arxiv Evollm-jp-7b, Shisa-gamma-7b-v1, Wizardmath-7b-1.1, Abel-7b-002, Llama-2-7b, Openllama-2-7b, Mpt-7b, Internlm2-chat-20b, Mixtral-8x7b-instruct, Qwen-1.5-chat-72b
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace 2024 Arxiv
MergePrint: Robust Fingerprinting against Merging Large Language Models 2024 Arxiv LLaMA-2-7B, WizardMath-7B-V1.0, LLaMA-2-7B-CHAT
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge 2024 ACL
Merging Improves Self-Critique Against Jailbreak Attacks 2024 Arxiv Mistral-7B, Mixtral-8x7B
Have You Merged My Model? On The Robustness of Large Language Model IP Protection Methods Against Model Merging 2024 Arxiv LLaMA-2-7B, LLaMA-2-7B-CHAT, WizardMath-7B-V1.0
Revisiting adapters with adversarial training 2023 ICLR
Seasoning model soups for robustness to adversarial and natural distribution shifts 2023 CVPR

Other Applications

Paper Title Year Conference/Journal Remark
Is Multiple Object Tracking a Matter of Specialization? 2024 NeurIPS
Tracking Universal Features Through Fine-Tuning and Model Merging 2024 Arxiv
HM3: Heterogeneous Multi-Class Model Merging 2024 Arxiv
Emotion Arithmetic: Emotional Speech Synthesis via Weight Space Interpolation 2024 Interspeech
Erasure Coded Neural Network Inference via Fisher Averaging 2024 Arxiv
MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair 2024 Arxiv
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks 2024 Arxiv Llama2-7B, Llama2-13B-chat, Mistral-7B-instruct
Scaling Up Personalized Image Aesthetic Assessment via Task Vector Customization 2024 Arxiv
An Attribute Interpolation Method in Speech Synthesis by Model Merging 2024 Arxiv
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition 2024 Arxiv
MedMerge: Merging Models for Effective Transfer Learning to Medical Imaging Tasks 2024 Arxiv
Experts Weights Averaging: A New General Training Scheme for Vision Transformers 2023 Arxiv
One Student Knows All Experts Know: From Sparse to Dense 2022 Arxiv

Contact

We welcome all researchers to contribute to this repository on model merging in foundation models and machine learning.

If you have a related paper that was not added to the library, please contact us.

Email: [email protected] / [email protected]