Changelog

[0.0.4] - 2023-04-29

  • Fixed an interaction between the fused QKV projection and the key-value cache that caused excessive memory usage.

[0.0.3] - 2023-04-28

  • Disabled the cache in ppl.py; it isn't used there, and disabling it saves memory.
  • Added more benchmarks to README.
  • Fixed a bug in generate.py: the generated sequence length was not calculated correctly.

[0.0.3] - 2023-04-19

  • Added support for groupsize.
    • Note: fuse_mlp is not recommended for groupsize != -1, because the current fused kernel is slower than the naive implementation in that case. It is now disabled automatically during loading if the model has grouping, unless fuse_mlp is explicitly set to True.
  • Added a warning if act_order and groupsize are used together. They are not compatible.
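
The loading behavior described in the two entries above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `resolve_fuse_mlp` and `warn_if_incompatible` are hypothetical, and it assumes that an unset fuse_mlp is represented as `None` and that fusion is otherwise enabled by default.

```python
import warnings
from typing import Optional


def resolve_fuse_mlp(fuse_mlp: Optional[bool], groupsize: int) -> bool:
    """Hypothetical helper: decide whether to fuse the MLP at load time.

    Assumes fuse_mlp=None means "not set by the user" and that
    groupsize == -1 means the model has no grouping.
    """
    if fuse_mlp is None:
        # Unset: fuse only when the model has no grouping (assumed default).
        return groupsize == -1
    # An explicit True or False from the user is always respected.
    return fuse_mlp


def warn_if_incompatible(act_order: bool, groupsize: int) -> None:
    """Hypothetical helper: warn when act_order is combined with grouping."""
    if act_order and groupsize != -1:
        warnings.warn(
            "act_order and groupsize are not compatible", UserWarning
        )
```

With this sketch, `resolve_fuse_mlp(None, 128)` disables fusion for a grouped model, while `resolve_fuse_mlp(True, 128)` keeps it on because the user asked for it explicitly.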