Skip to content

GPTQModel v1.6.1

Latest
Compare
Choose a tag to compare
@Qubitium Qubitium released this 09 Jan 03:40
· 12 commits to main since this release
0c6452b

What's Changed

πŸŽ‰ New OpenAI api compatible end-point via model.serve(host, port).
⚑ Auto-enable flash-attention2 for inference.
πŸ› Fixed sym=False loading regression.

Full Changelog: v1.6.0...v1.6.1