# v2.0

## What's Changed
- 🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in #33
- 🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in #34
- KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in #35
- Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in #36
- 🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in #37
- [Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in #38
- Bump up to v2.0 by @DefTruth in #39
**Full Changelog**: v1.9...v2.0