
Support llama.cpp chat generation #722

Closed
lbux opened this issue May 8, 2024 · 1 comment
Labels: feature request (Ideas to improve an integration), topic:streaming

Comments

lbux (Contributor) commented on May 8, 2024

Is your feature request related to a problem? Please describe.
Currently, the llama.cpp integration only supports create_completion() and not create_chat_completion(). Incorporating chat completion would allow for the following (a sketch of the underlying API is included after this list):

  • streaming
  • function calling
  • JSON constraining
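
For reference, here is a minimal sketch of what create_chat_completion() looks like in the underlying llama-cpp-python library (not the integration's API); the model path is a placeholder, and the chunk handling assumes the OpenAI-style streaming format the library returns:

```python
from llama_cpp import Llama

# Placeholder path; point this at any local GGUF model.
llm = Llama(model_path="models/model.gguf", n_ctx=2048)

# create_chat_completion() takes OpenAI-style messages and can stream
# the response back as incremental "delta" chunks.
stream = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a GGUF file is in one sentence."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
```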

Describe the solution you'd like
A proper implementation of create_chat_completion() in the llama.cpp integration.

Describe alternatives you've considered
Ollama is built on top of llama.cpp and supports chat completion, but there are reasons not to use Ollama if you already use llama.cpp directly.

Additional context
The biggest reason for wanting this is JSON constraining. By constraining the output to JSON, the LLM's response can be structured so that it can serve as input for evaluators that support (or can be modified to support) local LLMs through llama.cpp. Without constrained output, parsing becomes difficult for models that are less reliable at following instructions.
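
Here is a rough sketch of that JSON-constraining use case, again calling llama-cpp-python directly rather than the integration; the schema, prompts, and model path are purely illustrative, and the response_format/schema usage reflects my understanding of the library's JSON schema mode:

```python
from llama_cpp import Llama

llm = Llama(model_path="models/model.gguf", n_ctx=2048)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Score the answer and reply in JSON."},
        {"role": "user", "content": "Question: What is 2 + 2? Answer: 4"},
    ],
    # Constrain generation to JSON matching this (illustrative) schema;
    # llama.cpp converts the schema into a grammar the sampler must follow.
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "score": {"type": "integer"},
                "reasoning": {"type": "string"},
            },
            "required": ["score", "reasoning"],
        },
    },
)

# The content should now parse cleanly as JSON for downstream evaluators.
print(result["choices"][0]["message"]["content"])
```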

lbux added the feature request label on May 8, 2024
anakin87 (Member) commented

done in #723
