[Token Recycling] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling (#38)
DefTruth authored Aug 19, 2024
1 parent 7994773 commit d26c24a
Showing 1 changed file (README.md) with 1 addition and 0 deletions.
@@ -319,6 +319,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2024.05|🔥[S3D] S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs (@lge.com)|[[pdf]](https://arxiv.org/pdf/2405.20314)| ⚠️ |⭐️|
|2024.06|🔥[**Parallel Decoding**] Exploring and Improving Drafts in Blockwise Parallel Decoding(@KAIST&Google Research)| [[pdf]](https://arxiv.org/pdf/2404.09221)|⚠️ |⭐️⭐️ |
|2024.07| [Multi-Token Speculative Decoding] Multi-Token Joint Speculative Decoding for Accelerating Large Language Model Inference(@University of California, etc)| [[pdf]](https://arxiv.org/pdf/2404.09221)|⚠️ |⭐️⭐️ |
|2024.08| [Token Recycling] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling(@ir.hit.edu.cn etc) | [[pdf]](https://arxiv.org/pdf/2408.08696)|⚠️ |⭐️⭐️ |
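The Token Recycling row added above rests on a simple idea: candidate tokens the model already produced (and discarded) are stored in a lookup table keyed by the current token, recycled as free drafts, and then verified by the target model as in standard speculative decoding. The following is a minimal sketch of that idea under stated assumptions; `TokenRecycler`, `update`, `draft`, and `verify` are illustrative names, not the authors' implementation (which builds a draft *tree* rather than the single greedy chain shown here).

```python
from collections import defaultdict


class TokenRecycler:
    """Toy sketch of the Token Recycling idea (arXiv:2408.08696).

    An adjacency table maps each token to the candidate successor
    tokens the model most recently proposed after it. Drafting from
    this table costs no extra model calls. Names are illustrative.
    """

    def __init__(self, k: int = 4):
        self.k = k
        self.table = defaultdict(list)  # token -> top-k likely successors

    def update(self, token, candidates):
        # Recycle the model's top candidate successors for `token`
        # instead of discarding them after decoding.
        self.table[token] = list(candidates)[: self.k]

    def draft(self, token, depth):
        # Simplified greedy chain: repeatedly follow the top stored
        # successor. (The paper drafts a tree from the same table.)
        out, cur = [], token
        for _ in range(depth):
            successors = self.table.get(cur)
            if not successors:
                break
            cur = successors[0]
            out.append(cur)
        return out


def verify(target_next, prefix, draft):
    """Standard speculative-decoding acceptance rule: keep the longest
    draft prefix the target model agrees with, then append the target's
    own next token. `target_next(seq)` stands in for a greedy LLM step.
    """
    accepted, cur = [], list(prefix)
    for tok in draft:
        t = target_next(cur)
        if t != tok:
            accepted.append(t)  # replace first mismatch with target's token
            return accepted
        accepted.append(tok)
        cur.append(tok)
    accepted.append(target_next(cur))  # bonus token after a full match
    return accepted
```

With a toy deterministic "model" that always emits `last_token + 1`, a fully correct draft of length 2 yields 3 tokens per target call sequence, while a wrong draft is truncated at the first mismatch:

```python
rec = TokenRecycler(k=2)
rec.update(1, [2, 5])
rec.update(2, [3])
rec.draft(1, 3)                         # [2, 3]

target = lambda seq: seq[-1] + 1
verify(target, [1], rec.draft(1, 3))    # [2, 3, 4]
verify(target, [1], [2, 9])             # [2, 3]
```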


### 📖Structured Prune/KD/Weight Sparse ([©️back👆🏻](#paperlist))
