Mainly improvements to the llama.cpp engine in this release.
## Improvements
- Updated the `LlamaCppEngine` to no longer use the Llama 2 prompt pipeline by default. Prompt pipelines must now be passed explicitly (see the sketch after this list).
- The `LlamaCppEngine` will now automatically download additional GGUF shards when given a sharded model.
- Added `ChatTemplatePromptPipeline.from_pretrained` to create a prompt pipeline from the chat template of any model on the HF Hub, by ID.
- Added examples and documentation for using DeepSeek-R1 (quantized).
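A minimal sketch of how these changes fit together, assuming the import paths, the constructor parameters (`repo_id`, `filename`, `prompt_pipeline`), and the model/repo IDs shown below; all of these are illustrative placeholders rather than values taken from this release:

```python
from kani import Kani, chat_in_terminal
from kani.engines.llamacpp import LlamaCppEngine
from kani.prompts.impl import ChatTemplatePromptPipeline  # import path assumed

# Build a prompt pipeline from a model's chat template on the HF Hub, by ID.
pipeline = ChatTemplatePromptPipeline.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # illustrative model ID
)

engine = LlamaCppEngine(
    repo_id="bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF",  # illustrative repo
    filename="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",    # illustrative file
    # The engine no longer falls back to the Llama 2 pipeline by default;
    # a pipeline must now be passed explicitly.
    prompt_pipeline=pipeline,
)
# If the GGUF file above were split into shards (e.g. ...-00001-of-00002.gguf),
# the engine would now download the remaining shards automatically.

ai = Kani(engine)
chat_in_terminal(ai)
```

Passing the pipeline explicitly avoids silently applying Llama 2 formatting to models that use a different chat template.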
## Fixes
- `chat_in_terminal_async` no longer blocks the asyncio event loop while waiting for input from the terminal (see the sketch below).
- Fixed the `LlamaCppEngine` not passing functions to the provided prompt pipeline.
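A minimal sketch of the now non-blocking terminal chat inside an asyncio program; as above, the engine arguments and model IDs are illustrative assumptions:

```python
import asyncio

from kani import Kani, chat_in_terminal_async
from kani.engines.llamacpp import LlamaCppEngine
from kani.prompts.impl import ChatTemplatePromptPipeline  # import path assumed

async def main():
    # Illustrative model/repo IDs; any kani engine works here.
    pipeline = ChatTemplatePromptPipeline.from_pretrained(
        "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
    )
    engine = LlamaCppEngine(
        repo_id="bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF",
        filename="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",
        prompt_pipeline=pipeline,
    )
    ai = Kani(engine)
    # Other tasks on this event loop keep running while the chat
    # waits for terminal input.
    await chat_in_terminal_async(ai)

asyncio.run(main())
```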