[Token Recycling] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling (#38)
DefTruth authored Aug 19, 2024
1 parent 7994773 commit d26c24a
Showing 1 changed file (README.md) with 1 addition and 0 deletions.
@@ -319,6 +319,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2024.05|🔥[S3D] S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs (@lge.com)|[[pdf]](https://arxiv.org/pdf/2405.20314)| ⚠️ |⭐️|
|2024.06|🔥[**Parallel Decoding**] Exploring and Improving Drafts in Blockwise Parallel Decoding(@KAIST&Google Research)| [[pdf]](https://arxiv.org/pdf/2404.09221)|⚠️ |⭐️⭐️ |
|2024.07| [Multi-Token Speculative Decoding] Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference(@University of California, etc)| [[pdf]](https://arxiv.org/pdf/2404.09221)|⚠️ |⭐️⭐️ |
|2024.08| [Token Recycling] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling(@ir.hit.edu.cn etc) | [[pdf]](https://arxiv.org/pdf/2408.08696)|⚠️ |⭐️⭐️ |
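The Token Recycling row added above rests on a simple idea: candidate tokens the model already produced (and discarded) are stored in a lookup table keyed by the current token, recycled as free drafts, and then verified by the target model as in standard speculative decoding. The following is a minimal sketch of that idea under stated assumptions; `TokenRecycler`, `update`, `draft`, and `verify` are illustrative names, not the authors' implementation (which builds a draft *tree* rather than the single greedy chain shown here).

```python
from collections import defaultdict


class TokenRecycler:
    """Toy sketch of the Token Recycling idea (arXiv:2408.08696).

    An adjacency table maps each token to the candidate successor
    tokens the model most recently proposed after it. Drafting from
    this table costs no extra model calls. Names are illustrative.
    """

    def __init__(self, k: int = 4):
        self.k = k
        self.table = defaultdict(list)  # token -> top-k likely successors

    def update(self, token, candidates):
        # Recycle the model's top candidate successors for `token`
        # instead of discarding them after decoding.
        self.table[token] = list(candidates)[: self.k]

    def draft(self, token, depth):
        # Simplified greedy chain: repeatedly follow the top stored
        # successor. (The paper drafts a tree from the same table.)
        out, cur = [], token
        for _ in range(depth):
            successors = self.table.get(cur)
            if not successors:
                break
            cur = successors[0]
            out.append(cur)
        return out


def verify(target_next, prefix, draft):
    """Standard speculative-decoding acceptance rule: keep the longest
    draft prefix the target model agrees with, then append the target's
    own next token. `target_next(seq)` stands in for a greedy LLM step.
    """
    accepted, cur = [], list(prefix)
    for tok in draft:
        t = target_next(cur)
        if t != tok:
            accepted.append(t)  # replace first mismatch with target's token
            return accepted
        accepted.append(tok)
        cur.append(tok)
    accepted.append(target_next(cur))  # bonus token after a full match
    return accepted
```

With a toy deterministic "model" that always emits `last_token + 1`, a fully correct draft of length 2 yields 3 tokens per target call sequence, while a wrong draft is truncated at the first mismatch:

```python
rec = TokenRecycler(k=2)
rec.update(1, [2, 5])
rec.update(2, [3])
rec.draft(1, 3)                         # [2, 3]

target = lambda seq: seq[-1] + 1
verify(target, [1], rec.draft(1, 3))    # [2, 3, 4]
verify(target, [1], [2, 9])             # [2, 3]
```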


### 📖Structured Prune/KD/Weight Sparse ([©️back👆🏻](#paperlist))
