# Token Recycling ♻️

(Unofficial) implementation of the self-speculative LLM decoding method described in *Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling*.

🚀 Fast: ~2x speedup over the autoregressive baseline on Spec-Bench (A100), with ~2.5 mean accepted tokens (MAT).

🎮 Plug n Play: no training and no architecture changes.

🔮 Self-Speculative: no draft model needed.

## Installation

```shell
pip install -r requirements.txt
```

## Usage

```shell
python -m src.cli
```

or

```python
from src.token_recycling import TokenRecycling

model = TokenRecycling.from_pretrained("HuggingFaceTB/SmolLM2-135M")
output = model.generate("Your prompt here")
```
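
For intuition, here is a minimal sketch of the token-recycling idea, not this repo's internals: keep a `(vocab_size, K)` adjacency matrix whose row `t` holds the model's most recent top-K candidates for the token following `t`, draft from it, verify the draft in one forward pass, and refresh the matrix with the logits that pass produces anyway. The sketch assumes a Hugging Face causal LM and drafts a linear top-1 chain for brevity (the paper drafts a static tree over the top-K candidates); all names below are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "HuggingFaceTB/SmolLM2-135M"
model = AutoModelForCausalLM.from_pretrained(name)
tok = AutoTokenizer.from_pretrained(name)

K, DEPTH = 8, 6  # candidates kept per token, draft length
# Adjacency matrix: row t = top-K guesses for the token following t.
# All-zero rows make early drafts useless; the paper hot-starts the
# matrix and reuses it across prompts ("Cold Start" resets it instead).
adj = torch.zeros(model.config.vocab_size, K, dtype=torch.long)

ids = tok("Your prompt here", return_tensors="pt").input_ids[0]
for _ in range(32):
    # 1. Draft: walk the adjacency matrix from the last accepted token.
    draft, t = [], int(ids[-1])
    for _ in range(DEPTH):
        t = int(adj[t, 0])  # top-1 chain; the paper uses a top-K tree
        draft.append(t)

    # 2. Verify prompt + draft in a single forward pass.
    cand = torch.cat([ids, torch.tensor(draft)])
    with torch.no_grad():
        logits = model(cand.unsqueeze(0)).logits[0]

    # 3. Recycle: refresh the row of every token we just scored with
    #    the model's current top-K next-token candidates.
    adj[cand] = logits.topk(K, dim=-1).indices

    # 4. Accept the longest draft prefix that matches greedy decoding,
    #    plus the one token the model predicts after it (always >= 1).
    greedy = logits.argmax(dim=-1)  # greedy continuation at each position
    n = 0
    while n < DEPTH and int(greedy[len(ids) - 1 + n]) == draft[n]:
        n += 1
    new = draft[:n] + [int(greedy[len(ids) - 1 + n])]
    ids = torch.cat([ids, torch.tensor(new, dtype=torch.long)])

print(tok.decode(ids))
```

An end-of-sequence check, KV caching, and the paper's tree-structured draft with tree attention are omitted here to keep the sketch short.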

## Benchmarks

- Benchmark: Spec-Bench
- Device: a single NVIDIA A100 GPU (40GB) with 30 CPU cores
- Testing environment: PyTorch 2.5.1 under CUDA 12.4
- Experimental settings: greedy decoding, FP16 precision, batch size = 1
- Single run (not the average of 3 runs used by the official leaderboard)
- Cold Start means the Token Recycling adjacency matrix was reset for each prompt.
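
In terms of the sketch above, a cold start corresponds to zeroing the adjacency matrix before each prompt (e.g. `adj.zero_()`), whereas the warm variant carries over rows accumulated from earlier prompts.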

### Vicuna-7B-v1.3

> [!NOTE]
> This table only includes methods that don't require extra parameters. Methods that do, such as EAGLE and Hydra, score higher (+0.01-0.21x); refer to the official Leaderboard.

| Models | Multi-turn Conversation | Translation | Summarization | Question Answering | Mathematical Reasoning | Retrieval-aug. Generation | #Mean Accepted Tokens | Overall |
|---|---|---|---|---|---|---|---|---|
| Recycling | 2.24x | 1.87x | 2.08x | 1.99x | 2.50x | 1.80x | 2.67 | 2.08x |
| Recycling Cold Start | 2.07x | 1.30x | 2.23x | 1.70x | 2.30x | 1.95x | 2.55 | 1.93x |
| PLD | 1.56x | 1.00x | 2.54x | 1.13x | 1.55x | 1.80x | 1.75 | 1.60x |
| Lookahead | 1.45x | 1.13x | 1.31x | 1.20x | 1.50x | 1.16x | 1.64 | 1.30x |
