v2.6.1
What's Changed
- [From Author] Link CacheGen and CacheBlend to LMCache by @KuntaiDu in #80
- 🔥[LORC] Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy by @DefTruth in #81
- Large Language Model Performance Benchmarking on Mobile Platforms: A Thorough Evaluation by @DefTruth in #82
- [LLM Inference] Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective by @DefTruth in #83
- 🔥[ParallelSpec] ParallelSpec: Parallel Drafter for Efficient Speculative Decoding by @DefTruth in #84
New Contributors
Full Changelog: v2.6...v2.6.1