# speculative-decoding

Here are 34 public repositories matching this topic...

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

retrieval chatbot rag habana large-language-model chatpdf llm-inference 4-bits speculative-decoding llm-cpu streamingllm intel-optimized-llamacpp neural-chat neural-chat-7b autoround gaudi3

Updated Oct 8, 2024
Python

aphrodite-engine / aphrodite-engine

Large-scale LLM inference engine

machine-learning cuda intel api-rest lora rocm inference-engine tpu inferentia speculative-decoding

Updated Apr 25, 2025
C++

SafeAILab / EAGLE

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

large-language-models llm-inference speculative-decoding

Updated Apr 19, 2025
Python

Infini-AI-Lab / Sequoia

Scalable and robust tree-based speculative decoding algorithm

efficiency inference llm speculative-decoding

Updated Jan 28, 2025
Python

facebookresearch / LayerSkip

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

optimization transformers early-exit llm speculative-decoding layer-drop

Updated Apr 15, 2025
Python

Infini-AI-Lab / TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

acceleration efficiency inference llm long-context llm-inference speculative-decoding

Updated Aug 31, 2024
Python

FasterDecoding / REST

REST: Retrieval-Based Speculative Decoding, NAACL 2024

retrieval llm-inference speculative-decoding

Updated Dec 2, 2024
C
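REST drafts tokens by retrieving continuations from a datastore instead of running a separate draft model. The sketch below is a simplified, hypothetical illustration of that idea, not REST's code. It scans a flat token list for the longest suffix of the current context (up to `max_suffix` tokens) and proposes the tokens that followed it. It then checks the draft with a greedy verification rule. The names `retrieve_draft`, `verify_greedy`, `max_suffix` and `num_draft` are assumptions. REST's own implementation indexes its datastore rather than scanning it linearly, and it assembles drafts differently.

```python
from typing import List, Sequence


def retrieve_draft(
    context: Sequence[int],
    datastore: Sequence[int],
    max_suffix: int = 4,
    num_draft: int = 3,
) -> List[int]:
    """Hypothetical helper: propose draft tokens by matching the context suffix.

    Tries the longest suffix first and falls back to shorter ones; returns the
    tokens that followed the first (earliest) match, or [] if nothing matches.
    """
    for n in range(min(max_suffix, len(context)), 0, -1):
        suffix = list(context[-n:])
        # Linear scan for illustration only; stop early enough that at least
        # one token follows the match.
        for start in range(len(datastore) - n):
            if list(datastore[start:start + n]) == suffix:
                return list(datastore[start + n:start + n + num_draft])
    return []


def verify_greedy(draft: Sequence[int], target_next: Sequence[int]) -> List[int]:
    """Hypothetical greedy check of a draft against the target model.

    target_next[i] stands for the target model's greedy token after
    context + draft[:i], so it has len(draft) + 1 entries.
    """
    assert len(target_next) == len(draft) + 1
    accepted: List[int] = []
    for d, t in zip(draft, target_next):
        if d != t:
            break
        accepted.append(d)
    # Always emit one target token: the correction at the first mismatch,
    # or a bonus token when the whole draft was accepted.
    accepted.append(target_next[len(accepted)])
    return accepted
```

For example, with the datastore `[1, 2, 3, 4, 5, 1, 2, 9]` and the context `[7, 1, 2]`, the three-token suffix has no match. The suffix `[1, 2]` does match, so the draft is `[3, 4, 5]`. Under greedy decoding, the verified output is identical to what the target model alone would produce.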

Infini-AI-Lab / UMbreLLa

LLM Inference on consumer devices

offloading llm-inference speculative-decoding

Updated Mar 17, 2025
Python

kssteven418 / BigLittleDecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

decoding efficient-inference speculative-execution fast-inference llm speculative-decoding

Updated Feb 6, 2024
Python

bigai-nlco / TokenSwift

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation

inference transformer llms llm-serving llm-inference qwen speculative-decoding deepseek

Updated Mar 19, 2025
Python

romsto / Speculative-Decoding

Implementation of the paper Fast Inference from Transformers via Speculative Decoding, Leviathan et al. 2023.

fast-inference llm llm-inference speculative-decoding llm-optimization

Updated Dec 2, 2024
Python
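The core of the Leviathan et al. method is a modified rejection-sampling rule, which keeps the output distribution identical to the target model's. The block below is a minimal pure-Python sketch of one verification step. It assumes the per-position draft distributions `q` and target distributions `p` have already been computed; in practice the target model scores all draft positions in a single forward pass. The function names are hypothetical and are not taken from the romsto/Speculative-Decoding repository.

```python
import random
from typing import List, Sequence


def sample_from(weights: Sequence[float], rng: random.Random) -> int:
    """Draw an index from non-negative weights (normalized by rng.choices)."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def speculative_step(
    draft_tokens: Sequence[int],
    q: Sequence[Sequence[float]],  # q[i]: draft distribution that produced draft_tokens[i]
    p: Sequence[Sequence[float]],  # p[i]: target distribution at position i; len(p) == len(draft_tokens) + 1
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: accept/reject a block of draft tokens."""
    out: List[int] = []
    for i, x in enumerate(draft_tokens):
        # Accept draft token x with probability min(1, p(x) / q(x)).
        # q[i][x] > 0 because x was sampled from q[i].
        if rng.random() < min(1.0, p[i][x] / q[i][x]):
            out.append(x)
            continue
        # Rejected: resample from the residual max(0, p - q); rng.choices
        # handles the renormalization. Later draft tokens are discarded.
        residual = [max(0.0, pv - qv) for pv, qv in zip(p[i], q[i])]
        out.append(sample_from(residual, rng))
        return out
    # Every draft token was accepted: take one bonus token from the target.
    out.append(sample_from(p[len(draft_tokens)], rng))
    return out
```

Each call emits between one and `len(draft_tokens) + 1` tokens. When the draft distribution matches the target exactly, every draft token is accepted. When the draft puts mass where the target puts none, those tokens are always rejected and replaced from the residual.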

hemingkx / SWIFT

[ICLR 2025] SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

speculative-decoding

Updated Feb 21, 2025
Python

hemingkx / SpecDec

Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)

non-autoregressive speculative-decoding

Updated Dec 9, 2023
Python

AutonomicPerfectionist / PipeInfer

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

inference llm llamacpp speculative-decoding

Updated Nov 16, 2024
C++

BaohaoLiao / RSD

Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.

efficiency reasoning decoding-algorithm large-language-models speculative-decoding process-reward-model

Updated Mar 21, 2025
Python

hyx1999 / SAM-Decoding

Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton

speculative-decoding

Updated Feb 13, 2025
Python

mscheong01 / speculative_decoding.c

Minimal C implementation of speculative decoding based on llama2.c

c artificial-intelligence llm llama2 speculative-decoding

Updated Jul 15, 2024
C

jadohu / LANTERN

Official Implementation of LANTERN (ICLR'25) and LANTERN++ (ICLRW-SCOPE'25)

ar-model speculative-decoding

Updated Mar 5, 2025
Python

ccs96307 / fast-llm-inference

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

acceleration inference-optimization large-language-models speculative-decoding

Updated Mar 24, 2025
Python

hsj576 / GRIFFIN

Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"

large-language-models llm-inference speculative-decoding

Updated Feb 25, 2025
Python
