v2.0

@DefTruth DefTruth released this 19 Aug 01:22
· 96 commits to main since this release
8c0b51d

What's Changed

  • 🔥🔥[LUT TENSOR CORE] Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration by @DefTruth in #33
  • 🔥🔥[Eigen Attention] Attention in Low-Rank Space for KV Cache Compression by @DefTruth in #34
  • KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning by @DefTruth in #35
  • Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference by @DefTruth in #36
  • 🔥[ABQ-LLM] Arbitrary-Bit Quantized Inference Acceleration for Large Language Models by @DefTruth in #37
  • [Token Recycling] Turning Trash into Treasure: Accelerating Inference… by @DefTruth in #38
  • Bump up to v2.0 by @DefTruth in #39

Full Changelog: v1.9...v2.0