
v0.4.6

@rasbt released this on 24 Jul 15:35
3142b89

What's Changed

  • Change default top_k to 50 everywhere for consistency by @rasbt in #1592
  • Fix kv-cache clearing in Python API and Serve by @rasbt in #1596
  • Dynamic KV cache batching by @aniketmaurya in #1600
  • Remove unused eos_id in Python API by @rasbt in #1594
  • Add quantization test and revert Lightning version by @rasbt in #1605
  • Dynamically set kv-cache size in serve by @rasbt in #1602
  • Update LitData version and restore previous LitData assertions in tests by @awaelchli in #1609
  • Gemma 2: 9b and 27b versions by @Andrei-Aksionov in #1545
  • Update QLoRA sections of the config hub table by @rasbt in #1611
  • Replace max_returned_tokens with max_new_tokens by @rasbt in #1612
  • Add warning about pretrain preprocessing by @rasbt in #1618
  • Print warning about unsupported repo_ids by @rasbt in #1617
  • Restore capability to load alternative weights by @rasbt in #1620
  • Enable unbalanced number of layers in sequential generation by @awaelchli in #1623
  • Llama 3.1 8B and 70B checkpoints by @rasbt in #1619
  • Add Llama 3.1 405B config by @awaelchli in #1622
  • Bump version to 0.4.6 for the next release (Gemma 2 and Llama 3.1) by @rasbt in #1626
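For readers unfamiliar with the top_k setting changed in #1592, the sketch below shows generic top-k sampling: only the top_k highest-scoring token ids are kept, and the next token is drawn from a softmax over those. This is an illustrative, dependency-free sketch and not LitGPT's own sampling code. The function name sample_top_k and its parameters are hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k=50, temperature=1.0, rng=random):
    """Hypothetical helper: sample a token id from `logits`, restricted to the top_k best ids."""
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Rank token ids by logit (highest first) and keep at most top_k of them
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:top_k]
    # Softmax over the kept logits only; subtracting the max keeps exp() from overflowing
    scaled = [logits[i] / temperature for i in kept]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # Draw one id in proportion to its (unnormalized) probability
    return rng.choices(kept, weights=weights, k=1)[0]


# top_k=1 reduces to greedy decoding: the highest-logit id is always chosen
print(sample_top_k([0.1, 2.0, -1.0], top_k=1))
```

With top_k=50, every id outside the 50 highest logits has zero chance of being sampled, which trims the low-probability tail while still allowing some variety.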
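The switch from max_returned_tokens to max_new_tokens (#1612) and the dynamic KV cache sizing in Serve (#1602) both concern a single token budget. The sketch below assumes the relationship max_returned_tokens = prompt_length + max_new_tokens, which also gives the number of positions a per-request KV cache must hold. The helper name kv_cache_size is hypothetical. This sketch raises an error when the budget exceeds the model's block_size; LitGPT itself may handle that case differently.

```python
def kv_cache_size(prompt_length: int, max_new_tokens: int, block_size: int) -> int:
    """Hypothetical helper: positions a KV cache needs for one generation request."""
    if prompt_length < 1 or max_new_tokens < 0:
        raise ValueError("prompt_length must be >= 1 and max_new_tokens >= 0")
    # Assumed relation: the old-style budget counts prompt tokens plus generated tokens
    max_returned_tokens = prompt_length + max_new_tokens
    if max_returned_tokens > block_size:
        raise ValueError(
            f"prompt ({prompt_length}) + max_new_tokens ({max_new_tokens}) "
            f"exceeds block_size ({block_size})"
        )
    # Sizing the cache to the request, not to block_size, saves memory on short requests
    return max_returned_tokens


print(kv_cache_size(prompt_length=12, max_new_tokens=50, block_size=8192))
```

Expressing the limit as max_new_tokens makes the setting independent of prompt length: a user asks for "up to 50 new tokens" no matter how long the prompt is.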
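Entry #1623 allows sequential generation to split a model's layers across devices even when the layer count is not divisible by the device count. The sketch below shows one way to produce such an uneven split, assuming the leftover layers go to the first devices. The function name assign_layers is hypothetical, and LitGPT's actual placement strategy may assign the extra layers differently.

```python
def assign_layers(n_layers: int, n_devices: int) -> list[int]:
    """Hypothetical helper: map each layer index to a device index, allowing an uneven split."""
    if n_devices < 1 or n_layers < n_devices:
        raise ValueError("need at least one device and at least one layer per device")
    # Every device gets `base` layers; the first `extra` devices get one more
    base, extra = divmod(n_layers, n_devices)
    mapping = []
    for device in range(n_devices):
        count = base + (1 if device < extra else 0)
        mapping.extend([device] * count)  # contiguous block keeps activations flowing forward
    return mapping


# 42 layers over 4 devices -> 11, 11, 10, 10 layers per device
layer_to_device = assign_layers(42, 4)
print([layer_to_device.count(d) for d in range(4)])
```

Keeping each device's layers contiguous means activations move to the next device only at block boundaries, which is what sequential (pipeline-style) generation requires.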

Full Changelog: v0.4.5...v0.4.6