
v1.4.0

@zhudotexe zhudotexe released this 21 Feb 16:09

This release mainly contains improvements to the llama.cpp engine.

Improvements

  • Update the LlamaCppEngine to not use the Llama 2 prompt pipeline by default. Prompt pipelines must now be explicitly passed.
  • The LlamaCppEngine will now automatically download additional GGUF shards when a sharded model is given.
  • Added ChatTemplatePromptPipeline.from_pretrained to create a prompt pipeline from the chat template of any model on the HF Hub, by ID.
  • Added examples and documentation for using DeepSeek-R1 (quantized).
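For context on the `ChatTemplatePromptPipeline.from_pretrained` addition: a model's chat template (shipped with its tokenizer config on the HF Hub) defines how a list of role-tagged messages is flattened into a single prompt string. The sketch below mimics that rendering step in plain Python with a made-up ChatML-style format; it is an illustration of the concept, not kani's implementation.

```python
# Sketch of what a chat-template prompt pipeline does conceptually:
# turn role-tagged messages into one prompt string. Real chat templates
# are Jinja templates bundled with the model's tokenizer; this toy
# ChatML-style format is a stand-in for illustration only.

def render_prompt(messages: list[dict]) -> str:
    """Flatten chat messages into a single prompt string."""
    parts = [f"<|{m['role']}|>{m['content']}</s>" for m in messages]
    # End with the assistant tag so the model generates the reply next.
    return "".join(parts) + "<|assistant|>"

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
]
print(render_prompt(messages))
# <|system|>You are helpful.</s><|user|>Hi!</s><|assistant|>
```

`from_pretrained` spares you from writing this mapping by hand by reusing the template the model author already published.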

Fixes

  • chat_in_terminal_async no longer blocks the asyncio event loop when waiting for input from the terminal.
  • Fixed the LlamaCppEngine not passing functions to the provided prompt pipeline.
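The `chat_in_terminal_async` fix uses the standard asyncio technique for blocking I/O: run the blocking call (here, the terminal read) on a worker thread via `asyncio.to_thread`, so the event loop keeps servicing other tasks. A minimal sketch of the pattern, with `time.sleep` standing in for a blocking terminal read (the `ainput` helper name is hypothetical, not kani's API):

```python
import asyncio
import time

async def ainput(prompt: str = "") -> str:
    """Read a line without blocking the event loop (hypothetical helper)."""
    # asyncio.to_thread runs the blocking input() call in a worker thread.
    return await asyncio.to_thread(input, prompt)

async def main() -> int:
    # Show the loop stays responsive while a blocking call runs in a thread
    # (time.sleep stands in for a blocking terminal read).
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    await asyncio.to_thread(time.sleep, 0.1)  # blocking call, off the loop
    task.cancel()
    return ticks

ticks = asyncio.run(main())
print(ticks > 0)  # True: the loop kept ticking during the blocking call
```

Calling `input()` directly inside a coroutine would freeze every other task until the user presses Enter, which is exactly the behavior this release fixes.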