🔥 Awesome-LLM-Ensemble
"Harnessing Multiple Large Language Models: A Survey on LLM Ensemble"

Zhijun Chen, Jingzheng Li, Pengpeng Chen, Zhuoran Li, Kai Sun, Yuankai Luo, Qianren Mao, Dingqi Yang, Hailong Sun, Philip S. Yu

If you like our project, please give it a star ⭐ to show your support.
For this emerging topic, we hope this project can serve as a useful reference for researchers, and we look forward to more interesting studies!

📣 News and Notices

🔥🔥🔥 This is a collection of papers on LLM Ensemble.
It's based on our survey paper: Harnessing Multiple Large Language Models: A Survey on LLM Ensemble.

[Always] We will try to update this list frequently. If you find any errors or any missing/new papers, please don't hesitate to contact us.

[2025/02/19] We will release our paper on arXiv in the next few days. Stay tuned.

Contents
- 1 LLM Ensemble and Taxonomy
  - 1.1 LLM Ensemble
  - 1.2 Taxonomy
- 2 Papers
  - 2.1 Ensemble Before Inference
  - 2.2 Ensemble During Inference
  - 2.3 Ensemble After Inference
  - 2.4 Others: Benchmarks and Applications
- 3 Summarization

1. LLM Ensemble and Taxonomy

1.1 LLM Ensemble

Paper Abstract:

LLM Ensemble, which involves the comprehensive use of multiple large language models (LLMs), each aimed at handling user queries during downstream inference, to benefit from their individual strengths, has gained substantial attention recently. The widespread availability of LLMs, coupled with their varying strengths and out-of-the-box usability, has profoundly advanced the field of LLM Ensemble. This paper presents the first systematic review of recent developments in LLM Ensemble. First, we introduce our taxonomy of LLM Ensemble and discuss several related research problems. Then, we provide a more in-depth classification of methods under the broad categories of "ensemble-before-inference", "ensemble-during-inference", and "ensemble-after-inference", and review all relevant methods. Finally, we introduce related benchmarks and applications, summarize existing studies, and suggest several future research directions. A curated list of papers on LLM Ensemble is available at https://github.com/junchenzhi/Awesome-LLM-Ensemble.

1.2 Taxonomy

Figure 1: Illustration of the LLM Ensemble taxonomy. (Note that the (b) ensemble-during-inference paradigm also includes a process-level ensemble approach, which is not shown in the figure, mainly because this approach is instantiated by only a single method.)

Figure 2: Taxonomy of All LLM Ensemble Methods.

(a) Ensemble before inference.
In essence, this approach employs a routing algorithm before LLM inference to assign a given query to the most suitable model, so that the selected model, which is specialized for the query and typically offers more cost-efficient inference, performs the task. Existing methods can be classified into two categories, depending on whether the router requires pre-customized data for pre-training:
- (a,1) Pre-trained router;
- (a,2) Non-pre-trained router.
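To make the routing idea concrete, here is a minimal sketch, not any listed method's implementation. It assumes each candidate model comes with a per-query quality estimate and a relative cost; the `Candidate` class, `route` function, `cost_weight` trade-off, and the hand-written quality functions are all hypothetical. In a pre-trained router (a,1) the quality estimate would be learned from data beforehand; in a non-pre-trained router (a,2) it would be obtained or adapted some other way (e.g., online).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    cost: float                              # hypothetical relative cost per query
    predict_quality: Callable[[str], float]  # hypothetical quality estimate in [0, 1]


def route(query: str, candidates: List[Candidate], cost_weight: float = 0.5) -> str:
    """Return the name of the candidate with the best quality/cost trade-off."""
    if not candidates:
        raise ValueError("need at least one candidate model")

    def utility(c: Candidate) -> float:
        # Higher predicted quality is better; higher cost is penalized.
        return c.predict_quality(query) - cost_weight * c.cost

    # Only the selected model is then called to answer the query.
    return max(candidates, key=utility).name


# Toy quality estimators standing in for a learned router (hypothetical).
candidates = [
    Candidate("small", cost=0.1, predict_quality=lambda q: 0.6),
    Candidate("large", cost=0.5,
              predict_quality=lambda q: 0.9 if "prove" in q else 0.65),
]

print(route("prove that sqrt(2) is irrational", candidates))  # large
print(route("translate 'hello' into French", candidates))     # small
```

Raising `cost_weight` pushes more queries toward the cheaper model, which is the cost/quality knob that routing methods expose in various forms.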
(b) Ensemble during inference.
As the most granular form of ensemble among the three broad categories, this type of approach encompasses:
- (b,1) Token-level ensemble methods, which integrate the token-level outputs of multiple models at the finest granularity of decoding;
- (b,2) Span-level ensemble methods, which conduct ensemble at the level of a sequence fragment (e.g., a span of four words);
- (b,3) Process-level ensemble methods, which select the optimal reasoning process step by step within the reasoning chain for a given complex reasoning task.
Note that for all of these ensemble-during-inference methods, the aggregated text segments are concatenated with the previous text and fed back into the models.
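The token-level case (b,1) can be sketched as a decoding loop that mixes the models' next-token distributions at every step and feeds the chosen token back to all models. This is a minimal illustration under a strong assumption: all models share one vocabulary (handling mismatched vocabularies is exactly what methods such as EVA address, and this sketch ignores it). The function names, the fixed `weights`, greedy decoding, and the toy models are hypothetical choices.

```python
from typing import Callable, Dict, List, Sequence

# A "model" here maps the current token prefix to a next-token distribution.
NextTokenDist = Callable[[List[str]], Dict[str, float]]


def token_level_ensemble(
    models: Sequence[NextTokenDist],
    weights: Sequence[float],
    prompt: List[str],
    max_new_tokens: int = 16,
    eos: str = "<eos>",
) -> List[str]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Weighted sum of the models' next-token probabilities.
        combined: Dict[str, float] = {}
        for model, weight in zip(models, weights):
            for token, prob in model(tokens).items():
                combined[token] = combined.get(token, 0.0) + weight * prob
        next_token = max(combined, key=combined.__getitem__)  # greedy pick
        tokens.append(next_token)  # the chosen token is fed back to every model
        if next_token == eos:
            break
    return tokens[len(prompt):]


def model_a(prefix: List[str]) -> Dict[str, float]:
    return {"cat": 0.6, "dog": 0.4} if prefix[-1] == "the" else {"<eos>": 1.0}


def model_b(prefix: List[str]) -> Dict[str, float]:
    return {"cat": 0.1, "dog": 0.9} if prefix[-1] == "the" else {"<eos>": 1.0}


print(token_level_ensemble([model_a, model_b], [0.5, 0.5], ["the"]))
# ['dog', '<eos>']
```

Model A alone would pick "cat" (0.6), but the equally weighted mixture gives "dog" 0.65 versus "cat" 0.35.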
(c) Ensemble after inference.
These methods can be classified into two categories:
- (c,1) Non-cascade methods, which perform ensemble using multiple complete responses contributed by all LLM candidates;
- (c,2) Cascade methods, which consider both performance and inference costs, progressively reasoning through a chain of LLM candidates, largely sorted by model size, to find the most suitable inference response.
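A cascade (c,2) can be sketched as a loop over models ordered from cheapest to most expensive that stops at the first sufficiently confident answer. This minimal sketch assumes each model returns an answer together with a confidence score in [0, 1]; the `Model` interface, the `threshold` value, and the toy models are hypothetical, and real methods differ in how they estimate confidence and decide to defer.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface: returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(query: str, models: Sequence[Tuple[str, Model]],
            threshold: float = 0.8) -> Tuple[str, str]:
    """Query models from cheapest to most expensive; stop at the first confident one."""
    if not models:
        raise ValueError("need at least one model")
    for i, (name, model) in enumerate(models):
        answer, confidence = model(query)
        is_last = i == len(models) - 1
        if confidence >= threshold or is_last:  # the last model always answers
            return name, answer
    raise AssertionError("unreachable")


def small_model(query: str) -> Tuple[str, float]:
    return ("4", 0.95) if query == "2+2" else ("unsure", 0.3)


def large_model(query: str) -> Tuple[str, float]:
    return ("a longer, carefully reasoned answer", 0.9)


models = [("small", small_model), ("large", large_model)]
print(cascade("2+2", models))  # ('small', '4')
print(cascade("explain quantum tunnelling", models))
# ('large', 'a longer, carefully reasoned answer')
```

Easy queries stop at the cheap model, so the expensive model is only paid for when the cheaper one defers.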

2. Papers

2.1 Ensemble Before Inference

Figure 3: Summary analysis of the key attributes of ensemble-before-inference methods.

2.1.1 (a,1) Pre-Trained Router

Year	Title	Name	Code
2023	LLM Routing with Benchmark Datasets	-	-
2024	RouteLLM: Learning to Route LLMs with Preference Data	RouteLLM	Official
2024	Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing	Hybrid-LLM	Official
2025	LLM Bandit: Cost-Efficient LLM Generation via Preference-Conditioned Dynamic Routing	-	Official
2024	Harnessing the Power of Multiple Minds: Lessons Learned from LLM Routing	-	Official
2024	MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs	MetaLLM	Official
2024	SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models	SelectLLM	-
2024	Bench-CoE: a Framework for Collaboration of Experts from Benchmark	Bench-CoE	Official
2024	Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models	ZOOTER	-
2024	TensorOpera Router: A Multi-Model Router for Efficient LLM Inference	TO-Router	-
2024	Query Routing for Homogeneous Tools: An Instantiation in the RAG Scenario	HomoRouter	-
2023	Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling	FORC	Official
2024	Routoo: Learning to Route to Large Language Models Effectively	Routoo	-

2.1.2 (a,2) Non-Pre-Trained Router

Year	Title	Name	Code
2024	PickLLM: Context-Aware RL-Assisted Large Language Model Routing	PickLLM	-
2024	Eagle: Efficient Training-Free Router for Multi-LLM Inference	Eagle	-
2024	Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM	Blending	-

2.2 Ensemble During Inference

Figure 4: Summary analysis of the key attributes of ensemble-during-inference methods.

2.2.1 (b,1) Token-Level Ensemble

Year	Title	Name	Code
2024	Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling	GaC	Official
2024	Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration	DeePEn	Official
2024	Bridging the Gap between Different Vocabularies for LLM Ensemble	EVA	Official
2024	Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling	UniTe	-
2024	Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization	PackLLM	Official
2025	CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing	CITER	Official

2.2.2 (b,2) Span-Level Ensemble

Year	Title	Name	Code
2024	Cool-Fusion: Fuse Large Language Models without Training	Cool-Fusion	-
2024	Hit the Sweet Spot! Span-Level Ensemble for Large Language Models	SweetSpan	-
2024	SpecFuse: Ensembling Large Language Models via Next-Segment Prediction	SpecFuse	-
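The span-level category (b,2) can be illustrated with a generate-then-select loop: every model proposes a short continuation, every model scores every proposal, and the best-scoring span is appended to the shared context. This is a minimal sketch, not SweetSpan, Cool-Fusion, or SpecFuse. The `Generate` and `Score` interfaces and the toy functions are hypothetical; a real scorer might use, for example, each model's perplexity on the span, and a real generator would stop after a fixed span length (e.g., four words).

```python
from statistics import mean
from typing import Callable, Sequence

Generate = Callable[[str], str]      # hypothetical: context -> proposed next span
Score = Callable[[str, str], float]  # hypothetical: (context, span) -> score, higher is better


def span_level_ensemble(
    generators: Sequence[Generate],
    scorers: Sequence[Score],
    prompt: str,
    max_spans: int = 8,
    stop: str = ".",
) -> str:
    text = prompt
    for _ in range(max_spans):
        proposals = [generate(text) for generate in generators]
        # Every model scores every proposal; keep the best average score.
        best = max(proposals, key=lambda span: mean(score(text, span) for score in scorers))
        text += best  # the chosen span becomes context for the next round
        if not best or best.endswith(stop):
            break
    return text[len(prompt):]


def gen_a(context: str) -> str:
    return " span from model A."


def gen_b(context: str) -> str:
    return " span from model B."


def score_a(context: str, span: str) -> float:
    return 0.9 if "model A" in span else 0.5


def score_b(context: str, span: str) -> float:
    return 0.8 if "model B" in span else 0.1


print(span_level_ensemble([gen_a, gen_b], [score_a, score_b], "Q:"))
# " span from model B."
```

Model A rates its own span highest, but B's span wins on average (0.65 versus 0.5), which is the point of cross-model scoring.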

2.2.3 (b,3) Process-Level Ensemble

Year	Title	Name	Code
2024	Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning	LE-MCTS	-
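A greatly simplified view of process-level ensemble (b,3): at each reasoning step, every model proposes a next step, a process reward function scores the candidates, and the best step is appended to the shared chain. LE-MCTS uses process reward-guided tree search; the sketch below is a plain greedy loop instead, not LE-MCTS. The `ProposeStep`/`ProcessReward` interfaces, the `"Answer:"` stopping convention, the scripted toy models, and the reward table are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

ProposeStep = Callable[[str, List[str]], str]           # (question, steps so far) -> next step
ProcessReward = Callable[[str, List[str], str], float]  # hypothetical process reward model


def greedy_process_ensemble(
    question: str,
    proposers: Sequence[ProposeStep],
    reward: ProcessReward,
    max_steps: int = 10,
    final_prefix: str = "Answer:",
) -> List[str]:
    steps: List[str] = []
    for _ in range(max_steps):
        candidates = [propose(question, steps) for propose in proposers]
        # Keep the candidate step that the reward function rates highest.
        best = max(candidates, key=lambda step: reward(question, steps, step))
        steps.append(best)
        if best.startswith(final_prefix):
            break
    return steps


def scripted(script: List[str]) -> ProposeStep:
    """Toy proposer that replays a fixed script, one step per call."""
    return lambda question, steps: script[min(len(steps), len(script) - 1)]


# Hypothetical reward scores; a real system would use a trained process reward model.
TOY_REWARDS: Dict[str, float] = {
    "3*4=12": 0.9, "2+3=5": 0.3,
    "2+12=15": 0.1, "2+12=14": 0.9,
    "Answer: 15": 0.1, "Answer: 14": 0.9,
}


def toy_reward(question: str, steps: List[str], step: str) -> float:
    return TOY_REWARDS.get(step, 0.0)


model_a = scripted(["3*4=12", "2+12=15", "Answer: 15"])
model_b = scripted(["2+3=5", "2+12=14", "Answer: 14"])
print(greedy_process_ensemble("2+3*4", [model_a, model_b], toy_reward))
# ['3*4=12', '2+12=14', 'Answer: 14']
```

Neither scripted model produces the fully correct chain on its own; step-wise selection combines model A's first step with model B's later steps.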

2.3 Ensemble After Inference

Figure 5: Summary analysis of the key attributes of ensemble-after-inference methods.

2.3.1 (c,1) Non-Cascade

Year	Title	Name	Code
2024	More Agents Is All You Need	Agent-Forest	Official
2024	Smoothie: Label Free Language Model Routing	SMOOTHIE	Official
2023	Getting MoRE out of Mixture of Language Model Reasoning Experts	MoRE	Official
2023	LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion	LLM-Blender	Official
2024	LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	LLM-TOPLA	Official
2024	URG: A Unified Ranking and Generation Method for Ensembling Language Models	URG	-
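Perhaps the simplest non-cascade (c,1) strategy is majority voting over the complete responses returned by all candidate LLMs. This minimal sketch assumes short answers that can be compared after light normalization (the default lowercase-and-strip `normalize` is a hypothetical choice). Methods in the table above go further; LLM-Blender, for example, ranks responses pairwise and fuses them generatively rather than voting.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(
    responses: Sequence[str],
    normalize: Callable[[str], str] = lambda s: s.strip().lower(),
) -> str:
    """Return the response whose normalized form occurs most often."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(normalize(r) for r in responses)
    winner, _ = counts.most_common(1)[0]  # ties go to the answer seen first
    # Return an original (un-normalized) response that matches the winner.
    return next(r for r in responses if normalize(r) == winner)


responses = ["Paris", "paris ", "Lyon"]  # e.g., one answer from each of three LLMs
print(majority_vote(responses))  # Paris
```

Free-form responses rarely match exactly, which is one reason ranking- and fusion-based methods exist.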

2.3.2 (c,2) Cascade

Year	Title	Name	Code
2023	EcoAssistant: Using LLM Assistant More Affordably and Accurately	EcoAssistant	Official
2024	Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning	-	Official
2022	Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems	Model Cascading	-
2023	Cache & Distil: Optimising API Calls to Large Language Models	neural caching	Official
2023	A Unified Approach to Routing and Cascading for LLMs	Cascade Routing	Official
2023	When Does Confidence-Based Cascade Deferral Suffice?	-	-
2023	FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance	FrugalGPT	-
2024	Language Model Cascades: Token-level uncertainty and beyond	-	-
2023	AutoMix: Automatically Mixing Language Models	AutoMix	-
2024	Dynamic Ensemble Reasoning for LLM Experts	DER	-

2.4 Others: Benchmarks and Applications

2.4.1 Benchmarks

Year	Title	Benchmark Name	Evaluation Goal	Code
2023	LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion	MixInstruct	Performance	Official
2024	RouterBench: A Benchmark for Multi-LLM Routing System	RouterBench	Performance and cost	Official

2.4.2 Applications

Beyond the methods presented above, the concept of LLM Ensemble has found applications in a variety of more specialized tasks and domains. Here are some examples:

Year	Title	Name	Task	Code
2023	Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs	Ensemble-Instruct	Instruction-Tuning Data Generation	Official
2024	Bayesian Calibration of Win Rate Estimation with LLM Evaluators	BWRS, Bayesian Dawid-Skene	Win Rate Estimation	Official
2024	PromptMind Team at MEDIQA-CORR 2024: Improving Clinical Text Correction with Error Categorization and LLM Ensembles	-	Clinical Text Correction	-

3 Summarization

Figure 6: Summary analysis of the key attributes of LLM Ensemble approaches.
