v0.4.6
What's Changed
- Change default top_k to 50 everywhere for consistency by @rasbt in #1592
- Fix kv-cache clearing in Python API and Serve by @rasbt in #1596
- Dynamic KV cache batching by @aniketmaurya in #1600
- Remove non-used eos_id in Python API by @rasbt in #1594
- Add quantization test and revert lightning version by @rasbt in #1605
- Dynamically set kv-cache size in serve by @rasbt in #1602
- Update LitData version and restore previous LitData assertions in tests by @awaelchli in #1609
- Gemma 2: 9b and 27b versions by @Andrei-Aksionov in #1545
- Update config hub table qlora sections by @rasbt in #1611
- max_returned_tokens -> max_new_tokens by @rasbt in #1612
- Add warning about pretrain preprocessing by @rasbt in #1618
- Print warning about unsupported repo_ids by @rasbt in #1617
- Restore capability to load alternative weights by @rasbt in #1620
- Enable unbalanced number of layers in sequential generation by @awaelchli in #1623
- Llama 3.1 8B and 70B checkpoints by @rasbt in #1619
- Add Llama 3.1 405B config by @awaelchli in #1622
- Bump version to 0.4.6 for next release (Gemma 2 and Llama 3.1) by @rasbt in #1626
Full Changelog: v0.4.5...v0.4.6