Is your feature request related to a problem? Please describe.
Currently, the llama.cpp integration only supports create_completion() and not create_chat_completion(). Incorporating chat completion would allow for:

- streaming (sketched below)
- function calling
- JSON constraining
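As an illustration of the streaming case, here is a minimal sketch that calls llama-cpp-python's create_chat_completion() directly; the model path and prompt are placeholders, and any chat-capable GGUF model should work:

```python
from llama_cpp import Llama

# Placeholder path to a local GGUF model.
llm = Llama(model_path="./models/model.gguf", n_ctx=2048)

# With stream=True, create_chat_completion() yields OpenAI-style chunks.
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
```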
Describe the solution you'd like
A proper implementation of create_chat_completion(), mirroring the existing create_completion() support.
Describe alternatives you've considered
Ollama is built on top of llama.cpp and supports chat completion, but there are reasons not to switch to Ollama if you already use llama.cpp directly.
Additional context
The biggest reason for wanting this is JSON constraining. With constrained JSON output, an LLM's responses can be structured so that they can be fed directly into evaluators that support (or can be modified to support) local LLMs through llama.cpp. Without constraining the output, parsing becomes unreliable for models that are less capable of following formatting instructions.
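For reference, llama-cpp-python itself already exposes this through the response_format parameter of create_chat_completion(), which constrains decoding to match a JSON schema. A minimal sketch follows; the evaluator schema and prompt here are hypothetical examples, not part of this project:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf", n_ctx=2048)

# response_format constrains generation so the output is valid JSON
# conforming to the (hypothetical) evaluator schema below.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an evaluator. Respond in JSON."},
        {"role": "user", "content": "Score the following answer: ..."},
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "score": {"type": "integer"},
                "reasoning": {"type": "string"},
            },
            "required": ["score", "reasoning"],
        },
    },
)
print(response["choices"][0]["message"]["content"])
```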