Mainly improvements to the llama.cpp engine in this release.
## Improvements
- Updated the `LlamaCppEngine` to no longer use the Llama 2 prompt pipeline by default. Prompt pipelines must now be passed explicitly (see the sketch after this list).
- The `LlamaCppEngine` will now automatically download additional GGUF shards when given a sharded model.
- Added `ChatTemplatePromptPipeline.from_pretrained` to create a prompt pipeline from the chat template of any model on the HF Hub, by ID.
- Added examples and documentation for using DeepSeek-R1 (quantized).
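A minimal sketch of how these changes fit together, assuming the import paths, the constructor parameters (`repo_id`, `filename`, `prompt_pipeline`), and the model/repo IDs shown below; all of these are illustrative placeholders rather than values taken from this release:

```python
from kani import Kani, chat_in_terminal
from kani.engines.llamacpp import LlamaCppEngine
from kani.prompts.impl import ChatTemplatePromptPipeline  # import path assumed

# Build a prompt pipeline from a model's chat template on the HF Hub, by ID.
pipeline = ChatTemplatePromptPipeline.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # illustrative model ID
)

engine = LlamaCppEngine(
    repo_id="bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF",  # illustrative repo
    filename="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",    # illustrative file
    # The engine no longer falls back to the Llama 2 pipeline by default;
    # a pipeline must now be passed explicitly.
    prompt_pipeline=pipeline,
)
# If the GGUF file above were split into shards (e.g. ...-00001-of-00002.gguf),
# the engine would now download the remaining shards automatically.

ai = Kani(engine)
chat_in_terminal(ai)
```

Passing the pipeline explicitly avoids silently applying Llama 2 formatting to models that use a different chat template.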
## Fixes
- `chat_in_terminal_async` no longer blocks the asyncio event loop while waiting for input from the terminal (see the sketch below).
- Fixed the `LlamaCppEngine` not passing functions to the provided prompt pipeline.
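A minimal sketch of the now non-blocking terminal chat inside an asyncio program; as above, the engine arguments and model IDs are illustrative assumptions:

```python
import asyncio

from kani import Kani, chat_in_terminal_async
from kani.engines.llamacpp import LlamaCppEngine
from kani.prompts.impl import ChatTemplatePromptPipeline  # import path assumed

async def main():
    # Illustrative model/repo IDs; any kani engine works here.
    pipeline = ChatTemplatePromptPipeline.from_pretrained(
        "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
    )
    engine = LlamaCppEngine(
        repo_id="bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF",
        filename="DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",
        prompt_pipeline=pipeline,
    )
    ai = Kani(engine)
    # Other tasks on this event loop keep running while the chat
    # waits for terminal input.
    await chat_in_terminal_async(ai)

asyncio.run(main())
```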