v1.9
What's Changed
- 🔥[DynamoLLM] DynamoLLM: Designing LLM Inference Clusters for Performa… by @DefTruth in #28
- 🔥[Zero-Delay QKV Compression] Zero-Delay QKV Compression for Mitigati… by @DefTruth in #29
- 🔥[Automatic Inference Engine Tuning] Towards SLO-Optimized LLM Servin… by @DefTruth in #30
- 🔥🔥[500xCompressor] 500xCompressor: Generalized Prompt Compression for… by @DefTruth in #31
- Bump up to v1.9 by @DefTruth in #32
Full Changelog: v1.8...v1.9